Minutes of EDRN Data Sharing and Informatics Call 2025/03/17

EDRN Data Sharing and Informatics Call

Monday, March 17, 2025

Present (in BOLD):

NASA/JPL: Dan Crichton, Sean Kelly, Heather Kincaid, Ashish Mahabal
Arizona State University: Ji Qiu
Boston University: Jennifer Beane
EVMS: Julius Nyalwidhe
DMCC: Jackie Dahlgren, Royce Malnik
Johns Hopkins: Zhen Zhang
Moffitt: Yoga Balagurunathan
NCI: Amanda Skarlupka, Guillermo Marquez, Christos Patriotis, Juan Miguel Villanueva
PNNL: Tao Liu
University of California: William Hsu
University of North Carolina: Kristen Anton

Current Action Items:

DONE: JPL will send updated FAIR guidance for review and feedback from the group. Update: This information will be published soon.
JPL to 1) populate the grid of Roles and Responsibilities for FAIR-based data presented on the call to the Public Portal, 2) document more, and 3) promote investigator trainings to ensure that NCI policies are followed. Update: Heather will send links.
PIs are asked to review their data in LabCAS and let JPL know of any issues.
DONE: Discuss a roadmap for additional hackathons and workshops. Update: The group wants to hold these every other year and have groups bring in AI tools and capabilities. The plan is to have one this year and feed it into the next EDRN Workshop.
DONE: JPL to review EDRN FAIR Data Guidance Page and Training for each Collaborative Group on next call.

Agenda/Discussion:

Posting of guidelines for data preparation and submission:

This information can be found on the EDRN website under EDRN/Data and Resources/Informatics/LabCAS Help/ (page title: "LabCAS Help", Early Detection Research Network). The page has a lot of information on the steps to submit data, along with FAQs. JPL wants to make the process as simple as possible and is happy to have a call with sites to walk them through uploading data.

JPL is working with NCI to make sure that the FAIR (Findable, Accessible, Interoperable, Reusable) principles are followed. JPL wants to apply good metadata and ensure that the data is captured in structures that people can find and use. Please send questions/comments about this to JPL.

Recommendations for creating AI Ready Data:

Ashish Mahabal noted that EDRN has many datasets, but not all of them are uniform and not all of them have metadata and/or required fields. The goal is to make more datasets ready to use, so that they can be used with additional datasets to do even more. When the data is reviewed, holes (missing data) are often found. Documenting the missing data is very important, and the documentation should include well-organized metadata. The publications related to the study should also be listed.
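As an illustration of the point about documenting missing data, the following is a minimal sketch and not an EDRN or LabCAS tool. The field names in REQUIRED_FIELDS and the sample records are hypothetical, since the minutes do not list the actual required fields. It reports which records lack each required metadata field.

```python
# Hypothetical required fields; the actual EDRN required metadata
# is not listed in these minutes.
REQUIRED_FIELDS = ("dataset_id", "organ", "protocol", "publications")


def missing_data_report(records, required=REQUIRED_FIELDS):
    """Return {field: [indices of records where the field is absent or empty]}."""
    report = {}
    for index, record in enumerate(records):
        for field in required:
            value = record.get(field)
            # Treat absent keys, empty strings, and empty lists as missing.
            if value is None or value == "" or value == []:
                report.setdefault(field, []).append(index)
    return report


# Made-up example records, for illustration only.
records = [
    {"dataset_id": "A", "organ": "lung", "protocol": "P1",
     "publications": ["placeholder-citation"]},
    {"dataset_id": "B", "organ": "", "protocol": "P2", "publications": []},
]

print(missing_data_report(records))
```

A report like this could be stored alongside a dataset so that the holes are documented rather than silently dropped.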

Future hackathons and opportunities for working with EDRN data:

Ashish Mahabal said that the August 2024 hackathon was more of a forum for ideas and resulted in three projects, which are still being worked on. In the future, he would like to see more datasets. Once datasets exist for more organs, the methodology used for one organ can be applied to the others, which would extend the work beyond the initial ideas and provide more ways to use the data. However, “hallucinations” can occur, and these need to be avoided at all costs. One approach is to create agentic workflows for people who do not have in-depth knowledge of how to query data, such as prompt-based FAQs in which the setup recognizes certain questions and knows what kind of answers should be given. This will allow for basic analysis.

The next steps are to have another Hackathon with a full AI workshop in 2026. JPL wants to organize a team that can discuss hackathons and where to go. The goal is to incorporate lessons learned and to have new AI-ready data sets for EDRN. Dan Crichton asked the group to let him know if they have datasets that could be transformed into an AI-ready dataset. JPL will work with you on this.

William Hsu asked about the hackathon and recommended being intentional and strategic, linking back to EDRN data: use the data and develop new science questions to pursue.

DICOM Header Standards Working Group Call Update:

The group is working towards defining a minimal set of metadata (such as tags) to retain in headers after the de-identification process, using TCIA guidelines. It will refer to the SOPs for each study. Actions:

Gather feedback on the EDRN de-identification process and Core DICOM Tags.
Identify a test use case to evaluate the process and collect feedback.
Next meeting is on the 4th Thursday of the month at 12pm PT/3 pm ET. Let JPL know at
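The working group's goal, a minimal set of header metadata that survives de-identification, could be checked along the lines of the sketch below. This is only an illustration under stated assumptions: the tag list in MINIMAL_TAGS is hypothetical and is not the EDRN Core DICOM Tags, and the header is modeled as a plain dictionary keyed by standard DICOM attribute keywords rather than read from a real file.

```python
# Hypothetical minimal tag set, chosen for illustration only.
# These are standard DICOM attribute keywords, not the EDRN core list.
MINIMAL_TAGS = (
    "Modality",
    "StudyInstanceUID",
    "SeriesInstanceUID",
    "PatientIdentityRemoved",
)


def check_header(header, minimal_tags=MINIMAL_TAGS):
    """Report which minimal tags are missing or empty, and whether the
    header is flagged as de-identified."""
    missing = [tag for tag in minimal_tags if not header.get(tag)]
    # PatientIdentityRemoved (0012,0062) takes the values "YES" or "NO".
    deidentified = header.get("PatientIdentityRemoved") == "YES"
    return {"missing": missing, "deidentified": deidentified}


# Made-up header for illustration.
example_header = {
    "Modality": "CT",
    "StudyInstanceUID": "1.2.3",
    "PatientIdentityRemoved": "YES",
}

print(check_header(example_header))
```

A check of this kind could be run against the test use case mentioned in the actions to see which tags a given de-identification step removes.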

May EDRN Scientific Workshop Plans - 4 Posters:

One on FAIR Guidance, how it relates to LabCAS data submissions.
EDRN Data Resources Available (EDRN Knowledge Environment)
LabCAS – background and support as a cancer biomarker data commons with AI support
AI/ML Workshop/Hackathon Progress Updates

There will also be a forum/discussion on AI readiness and related topics. Ziding Feng and Dan Crichton will lead the discussion.

Other:

Non-imaging data: Yoga Balagurunathan asked about being able to use this for AI. Dan Crichton said that there might be an opportunity for this. Christos Patriotis said that the data is very limited. If the data is being used for discovery, there is room for this, but for validation studies it may not be an option because of unblinding issues. Zhen Zhang also raised bias in the data as something that should be considered. Ashish Mahabal said that there are models that can help with this. William Hsu said that any harmonization that has occurred should be documented. There are also tags that indicate how the images were acquired, which could be helpful.

Next Call: Monday, April 21st at 1pm Eastern/10am Pacific.