Decoding the Data Ecosystem Virtual Symposium 2025


August 27th 12:30 p.m. to 5:00 p.m. Eastern Time

 

The Decoding the Data Ecosystem Virtual Symposium invites faculty, students, and researchers to explore how the Common Fund Data Ecosystem (CFDE) can support research, education, and discovery in omics-related research. Over the course of an afternoon, explore emerging tools and resources, hear directly from CFDE experts, and discover innovative projects demonstrating how CFDE data can drive discoveries across disciplines. Whether you’re looking to integrate real-world data into your curriculum or discover new research directions, this symposium offers practical insights and community connections to help you get started. The CFDE includes a wide range of diverse and valuable datasets for omics-related research including genomics, transcriptomics, proteomics, metabolomics, and beyond.

Register now

Additional Information

Schedule and Session Descriptions

START TIME SESSION INFORMATION
12:30 PM ET

Welcome to the Ecosystem: Symposium Kickoff
Jennifer Burnette, CFDE Training Center

Opening Remarks
George Papanicolaou, National Institutes of Health (NIH)

12:45 PM ET

Navigating the Common Fund Data Ecosystem (CFDE)
Noël Burtt, CFDE Knowledge Center

1:15 PM ET

Building Bridges: How the CFDE Training Center Powers Discovery
LaFrancis Gibson, CFDE Training Center

1:45 PM ET

Break

1:55 PM ET

Decoding the Data Ecosystem Collaborative Mentees Showcase
Celebrate the achievements of our mentoring program participants with presentations highlighting their journey, projects, and the impact of mentorship on their professional development.

Multimodal Analysis of DNA repair genes in the placenta
Molly Huang

Study of Brain-Related Traits using predicted brain tissue-specific gene expression in a Mexican American Cohort
Luis Peña-Marquez (Pre-Recorded)

Mediation Modeling of Pro-Inflammatory Cytokines, Vaginal Microbiome, and Preterm Birth
Emily Oppold

3:00 PM ET

Empowering the Next Generation: Leveraging Common Fund Data for Student Engagement and Professional Growth
Panel Discussion, with Allissa Dillman, Moderator

3:45 PM ET

Break

3:50 PM ET

Rapid Fire Innovations: Lightning Talks from CFDE Interns & Fellows

Extracting biomarker fields from verbose text
Shubham Agrawal

Estimating the Hidden Proportion of Ligand-Binding
Manjil Man Pradhan

Deep Learning for Link Prediction in Glycan Images
Campbell Ross

Mining Glycan Biomarkers from Publications With AI
Cyrus Chun Hong Au Yeung

Biocurating Machine-Readable Glycan Datasets
Yuxin Zou

4:25 PM ET

Break

4:30 PM ET

Interacting with the Broader NIH Ecosystem
Sahana Kukke, NIH

4:50 PM ET

Closing Remarks
Jennifer Burnette, CFDE Training Center

 

Speakers


Jennifer Burnette, CFDE Training Center - Jennifer Burnette, MPH, CFDE Training Center Director and project manager and director for ORAU, expertly steers complex, interdisciplinary programs that bridge local, state, and federal stakeholders. Her career is a testament to her ability to navigate complex networks, having successfully coordinated projects for over 20 federal agency customers including the Centers for Disease Control and Prevention (CDC), Department of Energy (DOE), Department of Homeland Security (DHS), Federal Emergency Management Agency (FEMA), National Institutes of Health (NIH), National Science Foundation (NSF), and 13 DOE and other federal national laboratories.


Noël Burtt, CFDE Knowledge Center - Noël Burtt is the director of operations and development of the Diabetes Research & Knowledge Portals in the Medical and Population Genetics Program and Metabolism Program at the Broad Institute. Her work centers on the operational and organizational leadership of large-scale, international genetics consortia and public/private partnerships for human genetics studies. She has worked to facilitate data deposition and shape the ethos of the Knowledge Portals for the consortium and the larger community of common disease researchers. She has also managed two public-private partnerships with Novartis and Pfizer as part of the effort to use human genetics to develop better therapies for diabetes.

Allissa Dillman - Allissa Dillman, PhD, co-PI and Training and Engagement Director for the CFDE Training Center, is the founder and CEO of BioData Sage LLC, a company focused on providing a holistic approach to data science integration in the biomedical and biological science fields. She works with clients in industry, academia, government, and the nonprofit sector to create and support training programs on bioinformatics, cloud computing, and the tools and standards for reproducible data science practices for scientific and lay communities. She also creates community events, such as hackathons, where broad communities work towards solving real biomedical data challenges.

LaFrancis Gibson, CFDE Training Center - LaFrancis Gibson, MBA, MPH, CHES, is the Contact PI for the CFDE Training Center and the Manager for Health Promotion at ORAU. She has developed a unique skill set and understanding of community-based programs and how to manage them to maximize success. With more than 15 years of experience in managing outreach initiatives, training development, and program evaluation, she ensures project success through her expertise in budget control, risk mitigation, and strategic communication with stakeholders from government agencies such as the National Institutes of Health (NIH), National Library of Medicine (NLM), and Centers for Disease Control and Prevention (CDC).


Shubham Agrawal - Shubham Agrawal is a senior undergraduate Computer Science and Engineering student at IIT Gandhinagar, India. His research interests include Artificial Intelligence, Machine Learning, and Data Science, with particular emphasis on natural language processing and intelligent agent architectures for automated data analysis. As a 2025 Summer Undergraduate Research Fellow at Caltech, he worked on developing an agentic LLM-based framework for extracting structured biomarker metadata from biomedical publications, in collaboration with the EDRN-CFDE team.


Manjil Man Pradhan - Manjil Man Pradhan is a Ph.D. student in Computer Science at the University of New Mexico. His research focuses on applying artificial intelligence to predict protein-ligand binding sites using computational methods. This summer, he worked as a research intern in the Translational Informatics Division under the guidance and mentorship of Dr. Jeremy Yang and Dr. Praveen Kumar, on projects in computational biology and drug discovery. He is passionate about developing scalable methods to support biomedical research and advance our understanding of proteins and their functions.


Campbell Ross - Originally from Washington, DC, Campbell Ross attended the University of Virginia for both undergrad (B.A. Commerce / Economics, ‘15) and grad school (M.Ed, Exercise Physiology, ‘17). After some years pursuing competitive marathon running, he entered the field of Bioinformatics. His time at Georgetown University’s M.S. in Bioinformatics program culminated with a CFDE / GlyGen internship. His specialization is the application of machine learning frameworks to biological problems. Moving forward, he hopes to continue to leverage machine learning to tackle complex biological problems and advance health science.


Cyrus Chun Hong Au Yeung - Cyrus Chun Hong Au Yeung is a bioinformatics researcher with a foundation in molecular biochemistry and chemistry. He recently completed his M.S. in Bioinformatics & Molecular Biochemistry at George Washington University, where he developed natural language processing workflows using large language models to extract and standardize glycan biomarker data from biomedical literature. He previously earned a B.Sc. from The University of Hong Kong, with a double major in Biochemistry and Chemistry. His earlier research includes work on aptamer-based diagnostics for infectious diseases and the design of polymer-based drug delivery systems.


Yuxin Zou - Yuxin Zou is a rising senior at Cornell University studying Human Biology, Health, and Society. This past summer, she participated in the CFDE-GlyGen internship, where she deepened her knowledge of glycobiology and developed an appreciation for the role of structured data in uncovering novel biomedical insights. This experience strengthened her interest in pursuing a career at the intersection of biology, information science, and innovation to improve health systems and patient outcomes.


Molly Huang - Molly Huang is a Bioinformatics and Systems Biology PhD student at UC San Diego advised by Kathleen Fisch, PhD. She previously earned her BS in Molecular and Cell Biology and MS in Biology from UC San Diego. Her dissertation research focuses on understanding the molecular mechanisms behind placental trophoblast differentiation and function.


Luis Peña-Marquez - Luis Peña-Marquez is a Ph.D. student and research associate at the University of Texas Rio Grande Valley specializing in bioinformatics, statistical genetics, and computational biology. His work focuses on integrating genomic, transcriptomic, proteomic, and other multi-omic data to uncover genetic and molecular mechanisms underlying complex diseases, with an emphasis on Hispanic populations. He has conducted research on Alzheimer’s disease biomarkers, neurodegenerative disorders, Epstein-Barr virus interactions, and developmental protein networks in the laboratory opossum brain. Luis is experienced in large-scale data analysis, high-performance computing, and multi-omic integration, aiming to advance precision medicine in underrepresented communities.


Emily Oppold - Emily Oppold is a PhD student in Statistics at Rice University as well as a Graduate Student Trainee at the University of Texas MD Anderson Cancer Center. Her research focuses on developing innovative mediation modeling methodologies to better understand the complex pathways between exposures and outcomes in health and disease. With a passion for interdisciplinary collaboration, Emily aims to make statistical methods more accessible and impactful across fields such as psychology, genomics, and public health. She also serves as the Graduate Student Association President at Rice, where she advocates for student engagement, equity, and research support.


Sahana Kukke, NIH - Sahana Kukke, PhD, is a Program Leader in the Catalytic Data Resources Team at the NIH Office of Strategic Coordination – The Common Fund. She leads initiatives in multiple Common Fund programs, including Precision Medicine with AI: Integrating Imaging with Multimodal Data (PRIMED-AI), Replication to Enhance Research Impact Initiative (Replication Initiative), the Common Fund Data Ecosystem (CFDE), and the Stimulating Peripheral Activity to Relieve Conditions (SPARC) programs. In support of CFDE, Dr. Kukke is the Program Official for the Training Center.

About the Common Fund Data Ecosystem

The NIH Common Fund programs generate datasets from a variety of projects ranging from genomics to phenotypes. The Common Fund Data Ecosystem (CFDE) aims to facilitate improved discovery, reuse, integration, and analyses of these datasets to form novel hypotheses for accelerating discoveries in biomedical research. The CFDE is organized around five Centers that integrate data, resources, and knowledge from many Common Fund Programs. The resources created by the Centers empower the research community to use Common Fund datasets for novel scientific research that was impossible before.

About the CFDE Training Center

In coordination with other CFDE Centers and funded programs and projects, the Training Center (TC) acts as a central hub to provide a comprehensive approach to support current and potential CFDE users on their learning journey. It aims to expand the CFDE data userbase and enhance the confidence and complexity of dataset usage through community building and engagement activities.

The TC will provide training in basic and advanced computational and data analytic skills so that data science learners and users can engage meaningfully with CFDE data and tools in research. It will also increase awareness and attract new users from the bioinformatics, data science, and research communities through a variety of initiatives internal and external to the CFDE, including attendance at conferences and other activities and opportunities.

 

 

The Common Fund Data Ecosystem (CFDE) Training Center (TC) is supported in whole by the National Institutes of Health (NIH) Common Fund under award 1OT2OD037922-01. The contents of the training activities are solely the responsibility of the CFDE TC and do not necessarily represent the official views of the National Institutes of Health.