DepMap Cancer Dependency Analysis
Integrate gene expression and dependency data to investigate PKMYT1, a cell-cycle kinase implicated as a potential cancer target
Project Overview
In this capstone project, you'll work with real cancer genomics data from the DepMap project to understand which cancers depend on PKMYT1 for survival, characterize its expression patterns, and discover co-dependent genes through correlation analysis.
What You'll Learn
- ★ Conduct exploratory data analysis on complex genomic datasets
- ★ Analyze relationships between gene expression and functional dependencies
- ★ Implement correlation analyses across multiple data types
- ★ Integrate multi-omic data to identify therapeutic targets
- ★ Create publication-quality visualizations
Getting Started
This project requires two large datasets from the DepMap project. Follow these steps to set up your environment before starting the analysis.
Download and Cache Data
Run the data import notebook to download and cache the CRISPR dependency data (~350 MB) and gene expression data (~450 MB) to your Google Drive. This needs to be done only once.
📥 Open Data Import Notebook
Note: This notebook will mount your Google Drive and download two CSV files (~800 MB total). The process takes 4-6 minutes but only needs to be run once. Future analyses will load from cache in 10-20 seconds.
Start Your Analysis
Once your data is cached, open the main analysis notebook to begin your investigation of PKMYT1 dependencies across cancer types.
🔬 Open Analysis Notebook
Project Structure
The project is divided into four parts, worth a total of 100 marks. Each part builds on the previous one to create a comprehensive analysis pipeline.
Part 1: Exploratory Data Analysis
20 marks
Load and explore the datasets, perform quality control, and characterize PKMYT1 expression and dependency across different cancer types.
- • Data loading and initial inspection
- • Quality control and missing value analysis
- • PKMYT1 dependency profiling by cancer type
- • Visualization of key patterns
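As a rough illustration of the quality-control and profiling steps above, the sketch below uses a small synthetic table in place of the real DepMap files. The column names, the `cancer_type` annotation, and all values are assumptions made for the example; in the project, the annotation would have to come from whatever sample metadata the notebooks provide. It also assumes the usual DepMap convention that more negative gene effect scores indicate stronger dependency.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real tables: rows are cell lines, columns are genes.
# All identifiers and values here are illustrative, not real DepMap data.
rng = np.random.default_rng(0)
cell_lines = [f"ACH-{i:06d}" for i in range(1, 13)]
dependency = pd.DataFrame(
    {
        "PKMYT1": rng.normal(-0.5, 0.4, size=12),
        "GENE_B": rng.normal(0.0, 0.2, size=12),
    },
    index=cell_lines,
)
dependency.iloc[3, 0] = np.nan  # simulate one missing PKMYT1 score

# Hypothetical annotation mapping each cell line to a cancer type.
cancer_type = pd.Series(
    ["Breast", "Lung", "Skin"] * 4, index=cell_lines, name="cancer_type"
)

# Quality control: fraction of missing values per gene.
missing_fraction = dependency.isna().mean()

# Profile PKMYT1 by cancer type; groups are sorted so that the most
# negative median (assumed strongest dependency) comes first.
profile = (
    dependency["PKMYT1"]
    .groupby(cancer_type)
    .agg(["count", "median", "min"])
    .sort_values("median")
)

print(missing_fraction)
print(profile)
```

Note that `count` excludes missing scores, so a cancer type with few measured cell lines is easy to spot before interpreting its median.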
Part 2: Correlation Analysis
35 marks
Perform genome-wide correlation analysis to identify genes whose profiles correlate with PKMYT1 dependency and expression. Correct for multiple testing by controlling the false discovery rate (FDR).
- • Genome-wide correlation with PKMYT1 dependency
- • Expression correlation analysis
- • Multiple testing correction (FDR)
- • Volcano plots and statistical visualization
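The correlation and FDR steps listed above can be sketched as follows. This is a minimal example on synthetic data, not the notebook's reference solution: the choice of Pearson correlation, the Benjamini-Hochberg procedure as the FDR method, the 0.05 threshold, and the helper names `correlate_with_target` and `benjamini_hochberg` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data: 60 cell lines, 200 genes, with one planted correlate.
rng = np.random.default_rng(1)
n_lines, n_genes = 60, 200
target = pd.Series(rng.normal(size=n_lines), name="PKMYT1")
others = pd.DataFrame(
    rng.normal(size=(n_lines, n_genes)),
    columns=[f"GENE_{i}" for i in range(n_genes)],
)
others["GENE_0"] = target + rng.normal(scale=0.3, size=n_lines)


def correlate_with_target(target, others):
    """Pearson r and p-value of every column in `others` against `target`."""
    rows = []
    for gene, values in others.items():
        mask = target.notna() & values.notna()  # pairwise-complete cell lines
        r, p = stats.pearsonr(target[mask], values[mask])
        rows.append((gene, r, p))
    return pd.DataFrame(rows, columns=["gene", "r", "p_value"]).set_index("gene")


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


results = correlate_with_target(target, others)
results["q_value"] = benjamini_hochberg(results["p_value"])
results["neg_log10_q"] = -np.log10(results["q_value"])  # volcano plot y-axis
significant = results[results["q_value"] < 0.05].sort_values("q_value")
print(significant.head())
```

Plotting `r` against `neg_log10_q` gives the volcano plot; the planted `GENE_0` should stand out in the upper right corner.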
Part 3: Pathway Enrichment
35 marks
Use GSEApy to perform pathway enrichment analysis on your correlated gene sets. Integrate findings across different analyses to identify biological themes.
- • Gene set enrichment analysis using GSEApy
- • Pathway identification for co-dependent genes
- • Integration of expression and dependency results
- • Biological interpretation of enriched pathways
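To build intuition for what an enrichment result means, the sketch below hand-rolls a simple over-representation test with a hypergeometric distribution from scipy. It is not GSEApy's API and does not replace it in the project; the gene names, the two toy gene sets, and the helper `over_representation_p` are hypothetical.

```python
from scipy.stats import hypergeom


def over_representation_p(hits, gene_set, background):
    """Overlap size and hypergeometric p-value for hits vs. one gene set."""
    background = set(background)
    hits = set(hits) & background
    gene_set = set(gene_set) & background
    overlap = len(hits & gene_set)
    # P(X >= overlap) when drawing len(hits) genes without replacement
    # from a background that contains len(gene_set) pathway members.
    p_value = hypergeom.sf(overlap - 1, len(background), len(gene_set), len(hits))
    return overlap, p_value


# Toy universe of 1000 genes and two made-up gene sets of 50 genes each.
background = [f"GENE_{i}" for i in range(1000)]
toy_sets = {
    "TOY_CELL_CYCLE": background[:50],
    "TOY_UNRELATED": background[500:550],
}
# Pretend these 30 genes came out of the correlation analysis.
hits = background[:20] + background[900:910]

enrichment = {
    name: over_representation_p(hits, genes, background)
    for name, genes in toy_sets.items()
}
for name, (overlap, p_value) in enrichment.items():
    print(f"{name}: overlap={overlap}, p={p_value:.3g}")
```

The choice of background matters: restricting it to genes actually measured in the screen avoids inflating significance for pathways that were never testable.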
Part 4: Summary Figure
10 marks
Create a comprehensive multi-panel publication-quality figure that synthesizes all your results into a cohesive visual story.
- • Multi-panel figure design
- • Publication-quality visualization
- • Clear labeling and legends
- • Effective communication of key findings
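One possible layout for such a figure is sketched below with matplotlib. The three panels, their contents, and the random placeholder data are assumptions chosen only to show the mechanics of panel letters, axis labels, and a legend; your own figure should be driven by your results.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data standing in for real analysis results.
rng = np.random.default_rng(2)
effect = rng.normal(-0.5, 0.4, size=100)       # dependency scores
expression = rng.normal(5.0, 1.0, size=100)    # expression values
r = rng.uniform(-0.6, 0.6, size=300)           # correlation coefficients
neg_log10_q = np.abs(r) * 10 + rng.normal(0, 0.3, size=300)

fig, axes = plt.subplots(1, 3, figsize=(12, 4), constrained_layout=True)

axes[0].hist(effect, bins=20, color="steelblue")
axes[0].set(xlabel="PKMYT1 gene effect", ylabel="Cell lines")

axes[1].scatter(expression, effect, s=10, color="gray")
axes[1].set(xlabel="PKMYT1 expression", ylabel="PKMYT1 gene effect")

axes[2].scatter(r, neg_log10_q, s=8, color="black")
axes[2].axhline(-np.log10(0.05), linestyle="--", color="red", label="q = 0.05")
axes[2].set(xlabel="Correlation (r)", ylabel="-log10(q)")
axes[2].legend(frameon=False)

# Bold panel letters in the top-left corner of each panel.
for letter, ax in zip("ABC", axes):
    ax.set_title(letter, loc="left", fontweight="bold")

# fig.savefig("summary_figure.png", dpi=300)  # hypothetical output filename
```

Using `constrained_layout=True` keeps axis labels from overlapping neighbouring panels without manual spacing adjustments.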
Assessment Criteria
Your work will be evaluated on both technical execution and scientific communication.
Code Quality
- ★ Clear, descriptive variable names following PEP 8
- ★ Minimal code duplication using helper functions
- ★ Error-free execution with reproducible outputs
- ★ Efficient data processing and analysis
Scientific Communication
- ★ Comprehensive documentation of analytical reasoning
- ★ Clear interpretation of biological significance
- ★ Well-labeled, publication-ready visualizations
- ★ Thoughtful integration of multi-omic findings
Expected Outputs
Data Files
- • Top 100 dependency-correlated genes (CSV)
- • Top 100 expression-correlated genes (CSV)
- • Integrated results table (CSV)
Visualizations
- • Volcano plots for correlation analysis
- • Heatmaps of gene dependencies
- • Multi-panel summary figure
Resources & Support
Required Python Packages
Core Analysis
- • pandas - Data manipulation
- • numpy - Numerical computing
- • scipy - Statistical analysis
Visualization & Enrichment
- • matplotlib / seaborn - Plotting
- • gseapy - Pathway enrichment
Useful Links
- DepMap Project Homepage - Learn more about the Cancer Dependency Map
- GSEApy Documentation - Guide to pathway enrichment analysis
- Pandas Documentation - Reference for data manipulation
Ready to Begin?
Start by downloading the data, then dive into the analysis notebook to begin your journey into cancer genomics.