Sample Datasets

Real biological datasets for practicing data analysis and building your Python skills. All datasets are research-grade and used in published studies.

📊 About the Cancer Dependency Map (DepMap)

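The Cancer Dependency Map (DepMap) is built on genome-wide CRISPR knockout screens that measure how strongly each cancer cell line depends on each gene for survival. A strong dependency marks a gene as essential to that cell line. Check which score your file uses before interpreting values: in DepMap's gene effect scores a stronger dependency appears as a more negative value, while dependency probabilities run from 0 to 1.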

🔬

CRISPR Screening

Systematic gene knockouts

🧪

Cancer Cell Lines

Diverse cancer types

📈

Dependency Scores

Gene essentiality data

Learn more about DepMap↗

📁 Available Datasets

🧬

Complete DepMap Dataset

24Q2 · Advanced

Complete gene dependency dataset from the Cancer Dependency Map project

Dataset Features:

•~18,000 genes across cancer cell lines
•CRISPR knockout dependency scores
•Cell line metadata including cancer type
•Comprehensive dataset for advanced analysis

Used In:

Lecture 3

File Size:

~45 MB

Download CSV (~45 MB)

🎯

Filtered DepMap Dataset

24Q2 · Beginner-Friendly

Curated subset focusing on breast and myeloid cancers

Dataset Features:

•Breast and myeloid cancer cell lines only
•Same gene dependency data structure
•Smaller dataset perfect for learning
•Ideal for classroom exercises

Used In:

Lecture 3

File Size:

~12 MB

Download CSV (~12 MB)
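The relationship between the two files can be sketched in a few lines: the filtered dataset described above is, in effect, the complete table restricted to breast and myeloid cell lines. This is only an illustration on a toy table. The column names (`cell_line_id`, `cancer_type`), the gene names, the cancer type labels, and the helper `filter_by_cancer_type` are hypothetical and will likely differ from the real CSV headers.

```python
import pandas as pd

# Toy stand-in for the complete dataset. Column names, labels, and values
# are illustrative assumptions, not the contents of the real file.
complete = pd.DataFrame({
    "cell_line_id": ["LINE_A", "LINE_B", "LINE_C", "LINE_D"],
    "cancer_type": ["Breast", "Myeloid", "Lung", "Breast"],
    "GENE1": [-1.2, -0.1, -0.9, -1.0],
    "GENE2": [0.1, -1.4, 0.0, 0.2],
})

def filter_by_cancer_type(df, cancer_types, column="cancer_type"):
    """Return only the rows whose cancer type is in `cancer_types`."""
    mask = df[column].isin(cancer_types)
    return df.loc[mask].reset_index(drop=True)

# Keep breast and myeloid cell lines only (hypothetical helper).
subset = filter_by_cancer_type(complete, ["Breast", "Myeloid"])
print(subset["cancer_type"].value_counts())
```

With the real file, check the actual cancer type labels first (for example with `df[column].unique()`), since the spelling used here is a guess.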

📋 Dataset Structure

Columns Include:

•Cell Line ID: Unique identifier for each cancer cell line
•Cancer Type: Primary cancer classification
•Gene Symbols: Standard gene nomenclature
•Dependency Scores: CRISPR knockout effects
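Before any analysis, it helps to confirm that a loaded table matches the structure listed above. The sketch below assumes a wide layout (one row per cell line, one column per gene) and uses hypothetical column names `cell_line_id` and `cancer_type`; the real file may be organized differently, so treat the names as placeholders.

```python
import pandas as pd

# Toy table in an assumed wide layout: one row per cell line,
# one column per gene. All names and values are made up.
df = pd.DataFrame({
    "cell_line_id": ["LINE_A", "LINE_B", "LINE_C", "LINE_D"],
    "cancer_type": ["Breast", "Myeloid", "Breast", "Lung"],
    "GENE1": [-1.2, -0.1, None, -1.0],   # None stands in for a missing score
    "GENE2": [0.1, -1.4, 0.0, 0.2],
})

META_COLS = ["cell_line_id", "cancer_type"]  # assumed metadata columns
gene_cols = [c for c in df.columns if c not in META_COLS]

# Summarize the pieces named in the column list above.
summary = {
    "n_cell_lines": df["cell_line_id"].nunique(),
    "n_cancer_types": df["cancer_type"].nunique(),
    "n_genes": len(gene_cols),
    "n_missing_scores": int(df[gene_cols].isna().sum().sum()),
}
print(summary)
```

Counting missing scores early is useful because averages and comparisons later on silently skip them.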

Analysis Ideas:

•Find genes essential across all cancer types
•Identify cancer-specific dependencies
•Compare dependency patterns between cancers
•Discover potential drug targets
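The first two ideas above can be sketched on a toy table. This is one simple approach under stated assumptions, not the method used in the lectures: it assumes a wide layout, hypothetical column names, scores where more negative means stronger dependency, and an arbitrary cutoff of -0.5 on the mean score per cancer type.

```python
import pandas as pd

# Toy data; names, values, and layout are illustrative assumptions.
df = pd.DataFrame({
    "cell_line_id": ["LINE_A", "LINE_B", "LINE_C", "LINE_D"],
    "cancer_type": ["Breast", "Breast", "Myeloid", "Myeloid"],
    "GENE1": [-1.2, -1.0, -1.1, -0.9],   # strongly negative in both types
    "GENE2": [-0.1, 0.0, -1.3, -1.1],    # strongly negative in Myeloid only
    "GENE3": [0.1, 0.0, -0.1, 0.05],     # near zero everywhere
})

THRESHOLD = -0.5  # assumed cutoff; more negative = stronger dependency
gene_cols = [c for c in df.columns if c not in ("cell_line_id", "cancer_type")]

# Mean score of each gene within each cancer type.
mean_by_type = df.groupby("cancer_type")[gene_cols].mean()
below = mean_by_type < THRESHOLD

# Genes below the cutoff in every cancer type.
in_all = below.all(axis=0)
pan_essential = in_all[in_all].index.tolist()

# Genes below the cutoff in exactly one cancer type, mapped to that type.
specific = {
    gene: below.index[below[gene].to_numpy()][0]
    for gene in gene_cols
    if below[gene].sum() == 1
}

print("Essential in all types:", pan_essential)
print("Cancer-specific:", specific)
```

A fixed cutoff on group means is a blunt instrument. With real data, group sizes and score variability matter, so a statistical comparison between cancer types would usually be a better next step.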

💡 How to Use These Datasets

1️⃣

Download

Click download to save the CSV file to your computer

2️⃣

Load in Python

Use pandas.read_csv() to import the data into your notebook

3️⃣

Explore & Analyze

Follow along with lecture notebooks to discover biological insights

📝 Quick Start Code:

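import pandas as pd
# Load the filtered dataset; adjust the path to wherever you saved the download
df = pd.read_csv('combined_model_crispr_data_filtered.csv')
# Preview the first five rows
df.head()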