🧬

DepMap Cancer Dependency Analysis

Integrate gene expression and dependency data to investigate PKMYT1, a cell-cycle kinase implicated as a potential cancer target

🎯

Project Overview

In this capstone project, you'll work with real cancer genomics data from the DepMap project to understand which cancers depend on PKMYT1 for survival, characterize its expression patterns, and discover co-dependent genes through correlation analysis.

What You'll Learn

  • Conduct exploratory data analysis on complex genomic datasets
  • Analyze relationships between gene expression and functional dependencies
  • Implement correlation analyses across multiple data types
  • Integrate multi-omic data to identify therapeutic targets
  • Create publication-quality visualizations
🚀

Getting Started

This project requires two large datasets from the DepMap project. Follow these steps to set up your environment before starting the analysis.

Step 1: Download and Cache Data

Run the data import notebook to download and cache the CRISPR dependency data (~350 MB) and gene expression data (~450 MB) to your Google Drive. This needs to be done only once.

📥 Open Data Import Notebook

Note: This notebook will mount your Google Drive and download two CSV files (~800 MB total). The process takes 4-6 minutes but only needs to be run once. Future analyses will load from cache in 10-20 seconds.
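Loading from the cache in later sessions can be sketched as below. This is a minimal sketch, not the import notebook's actual code: the helper name `load_cached_csv`, the cache directory, and the file names are hypothetical, and it assumes the first CSV column holds the cell line identifiers.

```python
from pathlib import Path

import pandas as pd


def load_cached_csv(cache_dir, filename):
    """Load a cached CSV, using its first column as the row index.

    Assumption: the first column holds cell line identifiers.
    """
    path = Path(cache_dir) / filename
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; run the data import notebook first."
        )
    return pd.read_csv(path, index_col=0)


# Hypothetical usage once Google Drive is mounted (paths are placeholders):
# dependency = load_cached_csv("/content/drive/MyDrive/depmap_cache", "crispr_dependency.csv")
# expression = load_cached_csv("/content/drive/MyDrive/depmap_cache", "gene_expression.csv")
```

Failing early with a clear message when the cache is missing makes it obvious that the one-time download step has not been run yet.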

Step 2: Start Your Analysis

Once your data is cached, open the main analysis notebook to begin your investigation of PKMYT1 dependencies across cancer types.

🔬 Open Analysis Notebook
📊

Project Structure

The project is divided into four parts, worth a total of 100 marks. Each part builds on the previous one to create a comprehensive analysis pipeline.

Part 1: Exploratory Data Analysis

20 marks

Load and explore the datasets, perform quality control, and characterize PKMYT1 expression and dependency across different cancer types.

  • Data loading and initial inspection
  • Quality control and missing value analysis
  • PKMYT1 dependency profiling by cancer type
  • Visualization of key patterns
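Profiling dependency by cancer type could look roughly like the sketch below. The function name and the toy values are illustrative only; it assumes you already have one PKMYT1 score per cell line and a matching cancer-type label per cell line, and that more negative scores indicate stronger dependency (check this against the dataset you downloaded).

```python
import pandas as pd


def summarize_by_group(scores: pd.Series, groups: pd.Series) -> pd.DataFrame:
    """Summarize per-cell-line scores within each group (e.g. cancer type)."""
    # Align the two Series on their shared index and drop incomplete rows.
    df = pd.concat({"score": scores, "group": groups}, axis=1, join="inner").dropna()
    summary = df.groupby("group")["score"].agg(["count", "median", "mean"])
    # Most negative median first (assumed to mean strongest dependency).
    return summary.sort_values("median")


# Toy data, not real DepMap values.
toy_scores = pd.Series({"A": -1.0, "B": -0.8, "C": 0.1, "D": -0.1})
toy_groups = pd.Series({"A": "Breast", "B": "Breast", "C": "Lung", "D": "Lung"})
toy_summary = summarize_by_group(toy_scores, toy_groups)
```

Reporting the count alongside the median helps flag cancer types with too few cell lines to interpret.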

Part 2: Correlation Analysis

35 marks

Perform genome-wide correlation analysis to identify genes whose dependency or expression profiles correlate with those of PKMYT1. Correct for multiple testing by controlling the false discovery rate (FDR).

  • Genome-wide correlation with PKMYT1 dependency
  • Expression correlation analysis
  • Multiple testing correction (FDR)
  • Volcano plots and statistical visualization
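One way to approach the correlation and FDR steps is sketched below. It assumes a matrix with cell lines as rows and genes as columns, uses Pearson correlation, and applies the Benjamini-Hochberg procedure; the project may expect a different correlation measure, and the function names here are illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


def correlate_with_target(matrix: pd.DataFrame, target: str) -> pd.DataFrame:
    """Pearson-correlate every other column with the target column."""
    rows = []
    for gene in matrix.columns.drop(target):
        pair = matrix[[target, gene]].dropna()
        # Skip genes with too few points or no variation.
        if len(pair) < 3 or pair[gene].nunique() < 2 or pair[target].nunique() < 2:
            continue
        r, p = stats.pearsonr(pair[target], pair[gene])
        rows.append((gene, r, p))
    result = pd.DataFrame(rows, columns=["gene", "r", "p_value"]).set_index("gene")
    result["q_value"] = benjamini_hochberg(result["p_value"])
    return result.sort_values("p_value")


# Toy matrix: GENE_A tracks PKMYT1 closely, GENE_B is independent noise.
rng = np.random.default_rng(0)
pkmyt1 = rng.normal(size=30)
toy = pd.DataFrame({
    "PKMYT1": pkmyt1,
    "GENE_A": pkmyt1 + 0.05 * rng.normal(size=30),
    "GENE_B": rng.normal(size=30),
})
toy_result = correlate_with_target(toy, "PKMYT1")
```

The explicit loop is easy to read but slow across a whole genome; `DataFrame.corrwith` is a faster way to get the correlation coefficients alone if p-values are computed separately.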

Part 3: Pathway Enrichment

35 marks

Use GSEApy to perform pathway enrichment analysis on your correlated gene sets. Integrate findings across different analyses to identify biological themes.

  • Gene set enrichment analysis using GSEApy
  • Pathway identification for co-dependent genes
  • Integration of expression and dependency results
  • Biological interpretation of enriched pathways
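To build intuition for what an enrichment p-value means, the sketch below computes a plain over-representation test with a hypergeometric distribution. This is not the GSEApy API and is not a substitute for it; it only illustrates the kind of statistic that over-representation tools report. The function name and the toy gene names are hypothetical.

```python
from scipy.stats import hypergeom


def overrepresentation_pvalue(hits, gene_set, background):
    """Return (overlap, p) where p = P(overlap >= observed) under a hypergeometric null."""
    background = set(background)
    # Restrict both lists to genes that were actually tested.
    hits = set(hits) & background
    gene_set = set(gene_set) & background
    overlap = len(hits & gene_set)
    # sf(k - 1) gives P(X >= k).
    p = hypergeom.sf(overlap - 1, len(background), len(gene_set), len(hits))
    return overlap, float(p)


# Toy example: 20 background genes, a 5-gene pathway, 5 hits of which 4 are in the pathway.
toy_background = [f"g{i}" for i in range(20)]
toy_pathway = ["g0", "g1", "g2", "g3", "g4"]
toy_hits = ["g0", "g1", "g2", "g3", "g10"]
toy_overlap, toy_p = overrepresentation_pvalue(toy_hits, toy_pathway, toy_background)
```

The choice of background matters: using only the genes present in your correlation results, rather than the whole genome, avoids inflating significance.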

Part 4: Summary Figure

10 marks

Create a comprehensive multi-panel publication-quality figure that synthesizes all your results into a cohesive visual story.

  • Multi-panel figure design
  • Publication-quality visualization
  • Clear labeling and legends
  • Effective communication of key findings
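A multi-panel layout can be started from a skeleton like the one below. It is a minimal two-panel sketch with random placeholder data; the panel contents, axis labels, figure size, and output file name are assumptions to adapt to your own results.

```python
import matplotlib.pyplot as plt
import numpy as np


def make_summary_figure(expression, dependency, scores_by_group):
    """Build a two-panel figure: scatter (A) and per-group box plot (B)."""
    fig, (ax_a, ax_b) = plt.subplots(1, 2, figsize=(10, 4), constrained_layout=True)

    # Panel A: expression against dependency, one point per cell line.
    ax_a.scatter(expression, dependency, s=12, alpha=0.7)
    ax_a.set_xlabel("PKMYT1 expression")
    ax_a.set_ylabel("PKMYT1 dependency score")
    ax_a.set_title("A", loc="left", fontweight="bold")

    # Panel B: distribution of scores within each group.
    labels = list(scores_by_group)
    ax_b.boxplot([scores_by_group[label] for label in labels])
    ax_b.set_xticks(list(range(1, len(labels) + 1)))
    ax_b.set_xticklabels(labels, rotation=45, ha="right")
    ax_b.set_ylabel("PKMYT1 dependency score")
    ax_b.set_title("B", loc="left", fontweight="bold")
    return fig


# Placeholder data only; replace with your own results.
rng = np.random.default_rng(0)
toy_fig = make_summary_figure(
    rng.normal(size=40),
    rng.normal(size=40),
    {"Type 1": rng.normal(size=20), "Type 2": rng.normal(size=20)},
)
# toy_fig.savefig("summary_figure.png", dpi=300)  # hypothetical file name
```

Bold panel letters placed at the top left and a shared labeling style across panels make the figure easier to reference from the text.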

Assessment Criteria

Your work will be evaluated on both technical execution and scientific communication.

Code Quality

  • Clear, descriptive variable names following PEP 8
  • Minimal code duplication using helper functions
  • Error-free execution with reproducible outputs
  • Efficient data processing and analysis

Scientific Communication

  • Comprehensive documentation of analytical reasoning
  • Clear interpretation of biological significance
  • Well-labeled, publication-ready visualizations
  • Thoughtful integration of multi-omic findings

Expected Outputs

Data Files

  • Top 100 dependency-correlated genes (CSV)
  • Top 100 expression-correlated genes (CSV)
  • Integrated results table (CSV)
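Selecting and exporting a top-100 table might be sketched as follows. It assumes "top" means largest absolute correlation, which is one reasonable reading; ranking by adjusted p-value is another. The helper name, the column name `r`, and the output file name are hypothetical.

```python
import pandas as pd


def top_n_by_abs(results: pd.DataFrame, column: str = "r", n: int = 100) -> pd.DataFrame:
    """Return the n rows with the largest absolute value in the given column."""
    return results.sort_values(column, key=lambda s: s.abs(), ascending=False).head(n)


# Toy results table, not real correlations.
toy_results = pd.DataFrame({"r": [0.1, -0.9, 0.5]}, index=["g1", "g2", "g3"])
toy_top = top_n_by_abs(toy_results, n=2)
# toy_top.to_csv("top100_dependency_correlated_genes.csv")  # hypothetical file name
```

Sorting on the absolute value keeps strong negative correlations, which would be lost if you sorted on the raw coefficient.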

Visualizations

  • Volcano plots for correlation analysis
  • Heatmaps of gene dependencies
  • Multi-panel summary figure
📚

Resources & Support

Required Python Packages

Core Analysis

  • pandas - Data manipulation
  • numpy - Numerical computing
  • scipy - Statistical analysis

Visualization & Enrichment

  • matplotlib / seaborn - Plotting
  • gseapy - Pathway enrichment
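Before starting, you can check that the packages listed above are importable in your environment. The sketch below uses only the standard library; the helper name `missing_packages` is illustrative.

```python
import importlib.util

REQUIRED_PACKAGES = ["pandas", "numpy", "scipy", "matplotlib", "seaborn", "gseapy"]


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any package reported as missing can be installed with pip before you open the analysis notebook.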

Ready to Begin?

Start by downloading the data, then dive into the analysis notebook to begin your journey into cancer genomics.