Minutes of the Data Sharing and Informatics Subcommittee Meeting, 2025/07/21
Present (in BOLD):
- NASA/JPL: Dan Crichton, Sean Kelly, Heather Kincaid, Ashish Mahabal
- Arizona State University: Ji Qiu
- Boston University: Jennifer Beane
- EVMS: Julius Nyalwidhe
- DMCC: Jackie Dahlgren, Royce Malnik
- Johns Hopkins: Zhen Zhang
- Moffitt: Yoga Balagurunathan
- NCI: Amanda Skarlupka, Guillermo Marquez, Christos Patriotis, Juan Miguel Villanueva
- PNNL: Tao Liu
- University of California: William Hsu
- University of North Carolina: Kristen Anton
Current Action Items:
- ONGOING: PIs are asked to review their data in LabCAS and let JPL know of any issues.
- Discuss roadmap for additional hackathons and workshops.
Agenda/Discussion:
- Update on DICOM Headers WG
- Goal: Define a minimal set of required DICOM header fields that makes cancer biomarker imaging data reusable and AI-ready while minimizing the burden on participating sites. This will support future standardization and help guide SOP development for EDRN studies.
- Status:
- Working with a subgroup of EDRN DICOM researchers that meets regularly; JPL will schedule another meeting.
- Drafted a core set of DICOM header fields that will span modalities
- JPL to compare the proposed core tags against current LabCAS image tags to assess alignment and gaps.
- Working drafts for review: the documents below are ready for review by subcommittee members and will be posted to the EDRN website once they are approved.
- Draft EDRN DICOM de-identification Process https://docs.google.com/document/d/17-NupQmnCPbsK030qGdLM_VJ3OS5GWZ-pXaB5U_y4Zw/edit?tab=t.0#heading=h.x0xliewwlt6w
- Draft EDRN DICOM Header Tags for Review https://docs.google.com/spreadsheets/d/1Q56vKzK0nB4UAkfLJnBOy6C-7wtHccvZkWYGQHTMpBw/edit?gid=151094300#gid=151094300
- JPL is currently testing to identify gaps before the drafts are approved. P-MRI is one of the case studies being used to create these SOPs. Jackie Dahlgren said date de-identification is a hot topic. She noted that for P-MRI and other studies there is IRB approval to collect these dates but not to share them publicly. A script could be developed to de-identify the dates when images are shared publicly. JPL could raise this with the DICOM working group. These rules should apply across all consortia once interagency agreements are in place.
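The date de-identification script mentioned above was not specified in the meeting. As one possible illustration, the sketch below shifts DICOM date values (DA format, YYYYMMDD) by a fixed per-participant offset, so that intervals between a participant's dates are preserved while the real dates are not exposed. Date shifting itself, the function names, the `secret` value, and the 365-day window are all assumptions for illustration, not a decision of the subcommittee or the DICOM working group.

```python
import hashlib
from datetime import datetime, timedelta


def patient_offset_days(patient_id: str, secret: str, max_days: int = 365) -> int:
    """Derive a stable offset in [1, max_days] for one participant.

    Hypothetical helper: the offset comes from a hash of a private secret
    plus the participant ID, so the same participant always gets the same shift.
    """
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % max_days + 1


def shift_dicom_date(da_value: str, offset_days: int) -> str:
    """Shift a DICOM DA value (YYYYMMDD) back by offset_days.

    Empty values are returned unchanged.
    """
    if not da_value:
        return da_value
    shifted = datetime.strptime(da_value, "%Y%m%d") - timedelta(days=offset_days)
    return shifted.strftime("%Y%m%d")


# Example with a fixed offset of 30 days (illustrative values only).
print(shift_dicom_date("20250721", 30))  # 20250621
```

In practice the same offset would need to be applied to every date-bearing tag for a participant, and the secret would have to stay private; which tags are in scope is a question for the draft de-identification process document.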
- Update on the LabCAS FAIR Data Holdings and Priorities
- Completed training with the EDRN collaborative groups on FAIR data practices.
- Posted FAIR submission guidance, help pages, and documentation templates to the EDRN public portal.
- FAIR upgrades are underway for existing datasets, starting with the BBD and DCIS image collections, using standardized templates to update metadata, README files, and supporting documentation.
- Discussed LabCAS holdings: Heather Kincaid showed a slide that outlined LabCAS data holdings.
- Caltech Summer Intern Projects on Biomarker Standards and AI
- Ashish Mahabal discussed this 10-week program, in which students worked on the EDRN Biomarker Database. The items covered in the program are listed below.
- Verbose metadata
- Example-DCP
- Field mapping
- AI readiness of datasets, with the breast cancer dataset from BRSI as an example.
- AI Readiness Standards: EDRN has datasets on various organs that are supposed to be complete. One of our aims is to gauge their AI-readiness, that is, whether one or more of these datasets can be used to extend existing projects or to start new projects altogether. Ashish Mahabal gave an overview of the breast dataset collections from Dr. John Heine and Dr. Erin Fowler, which feature 2D and 3D imaging collections. The lessons learned so far include:
- AI-ready data are datasets that can be used for purposes beyond the original intent.
- Data types could be image cubes (e.g., CT, MRI), images (e.g., tissue), tables (e.g., clinical data), or a combination. The data should be complete in some fashion.
- The data should also be without holes, or the missing data should be well-documented.
- Documentation should also include well-organized metadata.
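The "without holes" lesson above can be checked mechanically for tabular data. The sketch below is a minimal illustration, assuming clinical records are available as a list of dictionaries; the field names, the set of missing-value markers, and the function name are hypothetical and do not come from the EDRN datasets.

```python
from collections import Counter

# Strings treated as "missing" in this sketch (an assumption, adjust per dataset).
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def missing_value_report(rows, required_fields):
    """Count, per required field, how many rows lack a usable value."""
    counts = Counter()
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            if value is None or str(value).strip().lower() in MISSING_MARKERS:
                counts[field] += 1
    return {field: counts[field] for field in required_fields}


# Illustrative records with hypothetical field names.
records = [
    {"participant_id": "P1", "age": "54"},
    {"participant_id": "P2", "age": ""},
    {"participant_id": "P3"},
]
print(missing_value_report(records, ["participant_id", "age"]))
# {'participant_id': 0, 'age': 2}
```

A report like this could be stored alongside the dataset's README so that any gaps are documented rather than discovered later.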
Next Call: Monday, August 18th at 1pm Eastern/10am Pacific.