โ† Back to Home

Sample Datasets

Real biological datasets for practicing data analysis and building your Python skills. All datasets are research-grade and used in published studies.

About the Cancer Dependency Map (DepMap)

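The Cancer Dependency Map is a genome-wide CRISPR screening dataset that measures how dependent cancer cell lines are on each gene for survival. A stronger dependency marks a gene as more essential for cancer cell survival. How that appears numerically depends on the measure: DepMap gene effect scores become more negative as a gene becomes more essential, while dependency probabilities range from 0 to 1, with higher values indicating a likely dependency.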

  • CRISPR Screening: Systematic gene knockouts
  • Cancer Cell Lines: Diverse cancer types
  • Dependency Scores: Gene essentiality data

๐Ÿ“ Available Datasets

Complete DepMap Dataset

24Q2 | Advanced

Complete gene dependency dataset from the Cancer Dependency Map project

Dataset Features:

  • ~18,000 genes across cancer cell lines
  • CRISPR knockout dependency scores
  • Cell line metadata including cancer type
  • Comprehensive dataset for advanced analysis
Used In: Lecture 3
File Size: ~45 MB

Filtered DepMap Dataset

24Q2 | Beginner-Friendly

Curated subset focusing on breast and myeloid cancers

Dataset Features:

  • Breast and myeloid cancer cell lines only
  • Same gene dependency data structure
  • Smaller dataset, well suited to learning
  • Ideal for classroom exercises
Used In: Lecture 3
File Size: ~12 MB

Dataset Structure

Columns Include:

  • Cell Line ID: Unique identifier for each cancer cell line
  • Cancer Type: Primary cancer classification
  • Gene Symbols: Standard gene nomenclature
  • Dependency Scores: CRISPR knockout effects
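
As a minimal sketch of how a table with this structure might be inspected, the snippet below builds a tiny stand-in DataFrame and separates metadata columns from gene columns. The column names (`cell_line_id`, `cancer_type`, `GENE_A`, `GENE_B`), the values, and the wide one-column-per-gene layout are assumptions for illustration only; run `df.columns` on the real file to see its actual layout, which may differ.

```python
import pandas as pd

# Toy stand-in for the real CSV. All names and values here are hypothetical.
toy = pd.DataFrame({
    "cell_line_id": ["LINE-1", "LINE-2", "LINE-3"],
    "cancer_type": ["Breast", "Myeloid", "Breast"],
    "GENE_A": [-1.2, -0.9, -1.1],
    "GENE_B": [0.1, -0.8, 0.0],
})

def split_columns(df, metadata_cols=("cell_line_id", "cancer_type")):
    """Return (metadata column names, gene column names) for a wide table."""
    meta = [c for c in df.columns if c in metadata_cols]
    genes = [c for c in df.columns if c not in metadata_cols]
    return meta, genes

meta_cols, gene_cols = split_columns(toy)
print("Metadata columns:", meta_cols)
print("Gene columns:", gene_cols)

# How many cell lines of each cancer type are present?
print(toy["cancer_type"].value_counts())
```

Keeping the metadata column names in one place makes it easy to adjust the helper once the real column names are known.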

Analysis Ideas:

  • Find genes essential across all cancer types
  • Identify cancer-specific dependencies
  • Compare dependency patterns between cancers
  • Discover potential drug targets
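
The first two ideas could be approached roughly as follows. This is a sketch on made-up data, not the lecture's method: it assumes a wide table with a `cancer_type` column and one column per gene, assumes gene-effect-style scores where more negative means a stronger dependency, and uses an illustrative cutoff of -0.5. All column names, values, and the cutoff are hypothetical and should be adapted to the real file.

```python
import pandas as pd

# Made-up scores for illustration; more negative = stronger dependency (assumed).
toy = pd.DataFrame({
    "cancer_type": ["Breast", "Breast", "Myeloid", "Myeloid"],
    "GENE_A": [-1.2, -1.0, -1.1, -0.9],   # strongly negative in both types
    "GENE_B": [-0.9, -0.7, 0.1, 0.0],     # negative in Breast only
    "GENE_C": [0.0, 0.1, -0.1, 0.05],     # near zero everywhere
})

THRESHOLD = -0.5  # illustrative cutoff, not an official DepMap value

# Mean score per gene within each cancer type (rows: cancer types, columns: genes).
mean_by_type = toy.groupby("cancer_type").mean(numeric_only=True)

# True where a cancer type's mean score falls below the cutoff.
is_dependent = mean_by_type < THRESHOLD

# Genes below the cutoff in every cancer type.
pan_essential = is_dependent.columns[is_dependent.all(axis=0)].tolist()

# Genes below the cutoff in exactly one cancer type.
specific = is_dependent.columns[is_dependent.sum(axis=0) == 1].tolist()

print("Essential in all types:", pan_essential)
print("Specific to one type:", specific)
```

Averaging within each cancer type is only one simple way to summarize; with few cell lines per type, medians or a per-cell-line count may be more robust.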

How to Use These Datasets

1๏ธโƒฃ

Download

Click download to save the CSV file to your computer

2๏ธโƒฃ

Load in Python

Use pandas.read_csv() to import the data into your notebook

3๏ธโƒฃ

Explore & Analyze

Follow along with lecture notebooks to discover biological insights

๐Ÿ“ Quick Start Code:

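import pandas as pd

# Load the filtered dataset; the CSV must be in your working directory
df = pd.read_csv('combined_model_crispr_data_filtered.csv')
df.head()  # preview the first five rows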