DNA Analysis with Python for Biologists

Build an Open Reading Frame finder to translate DNA sequences into proteins

🧬 Today's Focus

✓ String Manipulation

Learn to work with text data, such as DNA and protein sequences

✓ DNA Sequence Analysis

Apply Python skills to real biological problems like GC content calculation

✓ Pattern Recognition

Find motifs, restriction sites, and other important sequence features

🐍 Core Python Skills

→ Python Dictionaries

Store and manipulate biological data efficiently

→ String Methods

Essential tools for sequence processing and analysis

→ Conditional Logic

Make decisions in your code based on biological criteria

🔧 Additional Topics

+ Python Development Tools

IDE setup, debugging, and best practices for scientific computing

+ Open Source Software

Understanding the ecosystem of biological analysis tools

+ Biopython

Introduction to Biopython, a widely used Python library for bioinformatics

Today's Goal: Build an ORF Finder 🧬

Our Python Program Will:

1. Find Start Codons

Locate all ATG positions

2. Find Stop Codons

Scan for TAA, TAG, TGA in-frame

3. Extract ORFs

Get sequences between start & stop

4. Find Longest ORF

Identify the most likely protein-coding region

5. Translate to Protein

Convert DNA codons to amino acids

6. Process Many Files

Automate the analysis for hundreds of sequences

Skills we'll learn: String operations • Dictionaries • Conditionals • File I/O • Biopython

Meet Darren: A Biologist with a Problem 🧬


Darren, PhD student studying gene expression

The Challenge

Darren has sequenced hundreds of mRNA molecules from cancer cells. Each sequence could encode important proteins, but finding them by hand takes about 30 minutes per sequence!

Current situation:

  • 500+ mRNA sequence files
  • Each needs to be checked for ORFs
  • Manual checking takes ~30 min/file
  • That's 250 hours of tedious work!

💡 Solution: Automate with Python!

What takes 30 minutes by hand can be done in milliseconds with code

What are Open Reading Frames (ORFs)?

[Figure: the same sequence read in three reading frames, illustrating the ORF concept]

Key Concepts

  • DNA/RNA can be read in 3 different frames on a given strand
  • Each frame groups the nucleotides into different codons
  • An open frame has no early (premature) stop codons
  • A blocked frame hits a stop codon quickly
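The sketch below makes the three frames concrete. It splits one invented sequence into codons starting at offsets 0, 1, and 2. In this example, Frame 1 reads ATG ... TAA. Frame 2 runs into TGA at its very first codon.

```python
dna = "ATGAAATTTGGGTAA"  # arbitrary example sequence

frames = {}
for offset in range(3):
    # Step through the sequence 3 bases at a time, starting at this offset
    codons = [dna[i:i+3] for i in range(offset, len(dna) - 2, 3)]
    frames[offset + 1] = codons
    print(f"Frame {offset + 1}:", " ".join(codons))
```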

ORF Requirements

Start: ATG (codes for methionine)

Stop: TAA, TAG, or TGA

Valid ORF: ATG → ... → Stop (in the same frame!)

In the example above: Only Frame 1 is "open" - it can produce a full protein. Frames 2 & 3 hit stop codons immediately!

Breaking Down the ORF Problem

To find and translate an Open Reading Frame, we need to complete 3 simple steps:

1. Find First ATG

Scan the DNA string and find the first ATG start codon

CGTAACATGCGTAAATAG
      ↑ position 6

2. Extract ORF

From ATG, collect codons until we hit a STOP codon

ATG-CGT-AAA-TAG
ORF: ATGCGTAAATAG

3. Translate to Protein

Convert each codon to its amino acid using the genetic code

M - R - K - *
Protein sequence

Simple and straightforward! 🧬 → 🔍 → 🧪

💡 Our Learning Path

String Slicing & Conditionals: Find ATG in DNA sequence
Loops & Logic: Extract ORF from start to stop codon
Dictionaries: Use codon table to translate to protein
Debugging: Fix errors and test our ORF finder

The Complete Code - Live Demo!

Here's our complete ORF finder - try it with the example DNA sequence!

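Below is a self-contained sketch of a complete ORF finder along these lines, run on the example sequence from the previous slides. Two assumptions are worth flagging. First, the codon table is built compactly from the standard genetic code (NCBI translation table 1) instead of being typed out entry by entry. Second, `translate` is a hypothetical helper name, and it marks any unrecognized codon as 'X'.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built compactly instead of
# typing all 64 entries: codons are listed in T, C, A, G order for each position,
# and '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def find_atg(dna_sequence):
    """Step 1: return the index of the first ATG, or None if there is none."""
    for i in range(len(dna_sequence) - 2):
        if dna_sequence[i:i+3] == "ATG":
            return i
    return None


def find_orf(dna_sequence, atg_index):
    """Step 2: collect codons from the ATG up to and including the first in-frame stop."""
    orf = ""
    for i in range(atg_index, len(dna_sequence) - 2, 3):  # step by 3 = one codon
        codon = dna_sequence[i:i+3]
        orf += codon
        if CODON_TABLE.get(codon) == "*":
            break
    return orf


def translate(orf):
    """Step 3: look up each codon; unknown codons (e.g. containing N) become 'X'."""
    protein = ""
    for i in range(0, len(orf) - 2, 3):
        protein += CODON_TABLE.get(orf[i:i+3], "X")
    return protein


dna = "CGTAACATGCGTAAATAG"
start = find_atg(dna)
if start is None:
    print("No ATG found")
else:
    orf = find_orf(dna, start)
    print("ATG at position:", start)    # 6
    print("ORF:", orf)                  # ATGCGTAAATAG
    print("Protein:", translate(orf))   # MRK*
```

Note the `3` step in `find_orf`'s `range`. It keeps the scan in the same reading frame as the ATG, so stop codons that straddle two codons are never mistaken for real ones.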
String Slicing
Extract codons
Conditionals
if/else logic
Dictionaries
Codon mapping
Functions
Modular code

Part 1

Python String Fundamentals for Biology

Working with DNA sequences as strings

🎯 Our First Function: Finding the Start Codon

This is what we'll build together in the next few slides:

def find_atg(dna_sequence):
    """Find the first ATG start codon in the sequence."""
    for i in range(len(dna_sequence) - 2):
        if dna_sequence[i:i+3] == 'ATG':
            return i
    return None  # Return None if no ATG found

Don't worry if this looks complex - we'll build it step by step!

Quick Review: Data Types & Strings

Basic Data Types

int → 42, -17, 1000

Whole numbers

float → 3.14, -0.5, 2.7e-8

Decimal numbers

str → "ATCG", 'DNA'

Text sequences

bool → True, False

Logical values

String Operations We Learned

dna = "ATCGATCG"

len(dna) → 8
dna[0] → "A"
dna[0:3] → "ATC"
dna + "TAA" → "ATCGATCGTAA"
dna * 2 → "ATCGATCGATCGATCG"
"AT" in dna → True

Remember: In Python, strings are sequences of characters - perfect for representing DNA, RNA, and protein sequences!

DNA String Slicing

🔪 Understanding String Slicing

String slicing lets you extract parts of a string using [start:end] notation:

  • string[1:4] gets characters at positions 1, 2, and 3
  • Position counting starts from 0
  • The end position is not included
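A few slices of a short, made-up sequence show these rules in action:

```python
dna = "ATGCGTAAA"

print(dna[0:3])  # ATG     (positions 0, 1, 2)
print(dna[1:4])  # TGC     (positions 1, 2, 3; position 4 is excluded)
print(dna[3:])   # CGTAAA  (omit the end to slice to the end)
print(dna[:6])   # ATGCGT  (omit the start to slice from position 0)
```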

🎯 Try It Yourself!

Complete the challenge: Print the last three bases of the DNA sequence using slicing.

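One possible solution uses a negative start index, which counts back from the end of the string. The sequence here is just an example.

```python
dna = "ATGCGTAAATAG"

# Negative indices count from the end: -1 is the last base, -3 the third-to-last
last_three = dna[-3:]
print(last_three)  # TAG

# Equivalent version that uses the length explicitly
print(dna[len(dna) - 3:])  # TAG
```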

🎯 Practice Challenge

Try these basic slicing exercises:

  • Extract just the middle 4 bases of "ATGCGTAAA"
  • Get the first half of the DNA sequence
  • Extract every other base using step slicing [::2]
  • Practice using negative indices to get sections from the end
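Here is one way to approach these exercises, as a sketch. Because "ATGCGTAAA" has 9 bases, no 4-base window sits exactly in the middle. Positions 2 to 5 are one reasonable choice, not the only answer.

```python
dna = "ATGCGTAAA"  # 9 bases

middle_four = dna[2:6]            # positions 2-5 (9 is odd, so slightly off-center)
first_half = dna[:len(dna) // 2]  # integer division: 9 // 2 == 4
every_other = dna[::2]            # step of 2: positions 0, 2, 4, 6, 8
from_end = dna[-6:-3]             # negative indices: positions 3, 4, 5

print(middle_four, first_half, every_other, from_end)
```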

🚀 Practice More in Google Colab!

Open the full string manipulation notebook with exercises and solutions


Finding ATGs: Loops + Slicing

🔄 Step 1: Loop Through Every Position

To find ATGs, we need to check every possible position in the DNA string:

  • Position 0: Check bases 0-1-2
  • Position 1: Check bases 1-2-3
  • Position 2: Check bases 2-3-4
  • And so on...
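Here is a sketch of this loop on an example sequence. `range(len(dna) - 2)` stops at the last position that still has three bases starting there.

```python
dna = "CGTAACATG"  # 9 bases, positions 0-8

# Stop 2 early: position 6 is the last one with 3 bases (6, 7, 8) available
positions = list(range(len(dna) - 2))
for i in positions:
    print(f"Position {i}: check bases {i}-{i+1}-{i+2}")
```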

🔪 Step 2: Extract 3 Bases from Each Position

At each position, slice out exactly 3 bases to check if it could be a start codon:

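Adding a slice inside the same loop prints each 3-base window; the sequence is an arbitrary example. ATG does appear, at position 6, but nothing in the code singles it out yet.

```python
dna = "CGTAACATG"

windows = []
for i in range(len(dna) - 2):
    three_bases = dna[i:i+3]  # slice exactly 3 bases starting at position i
    windows.append(three_bases)
    print(i, three_bases)
```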

🚧 We've Hit a Problem!

What we can do: Extract 3-base sequences from every position ✅

What we can't do yet: Check IF a sequence equals "ATG" ❌

We need: A way to make decisions in our code, using Python conditionals!

📚 Coming Up Next: Python Conditionals

To solve our ATG-finding problem, we'll learn:

  • if statements for making decisions
  • Comparing strings with ==
  • elif and else for multiple conditions
  • Putting it all together to find ATGs automatically!

🚀 Practice String Manipulation in Google Colab!

Try the full string manipulation notebook with more loop and slicing exercises


Part 2

Python Conditionals: Making Decisions

Teaching Python to make choices based on biological data

🔍 Focus: The Conditional Logic

Notice the if statement that makes the decision:

def find_atg(dna_sequence):
    """Find the first ATG start codon in the sequence."""
    for i in range(len(dna_sequence) - 2):
        if dna_sequence[i:i+3] == 'ATG':  # ← The key decision!
            return i
    return None  # Return None if no ATG found

We'll learn how if statements help us find biological patterns!

Conditionals: Making Decisions in Code

🎯 Basic if Statement

Use if to make decisions based on conditions:

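Here is a minimal example that uses a single codon stored in a variable:

```python
codon = "ATG"
is_start = False

if codon == "ATG":
    # This indented block only runs when the condition is True
    is_start = True
    print("Found a start codon!")
```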

🔄 if-else: Choose Between Two Options

Use else to handle the opposite case:

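This sketch wraps the choice in a small function so it can be reused. `describe_codon` is an illustrative name, not something defined elsewhere in this lecture.

```python
def describe_codon(codon):
    """Label a codon as a start codon or not."""
    if codon == "ATG":
        return "start codon"
    else:
        # Runs whenever the if condition is False
        return "not a start codon"


print(describe_codon("ATG"))  # start codon
print(describe_codon("GGC"))  # not a start codon
```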

🎪 elif: Multiple Choices

Use elif to test multiple conditions:

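Here is a sketch with three branches; `classify_codon` and `STOP_CODONS` are illustrative names. Python checks the conditions from top to bottom and runs only the first branch whose condition is True.

```python
STOP_CODONS = ["TAA", "TAG", "TGA"]


def classify_codon(codon):
    """Return 'start', 'stop', or 'other' for a single DNA codon."""
    if codon == "ATG":
        return "start"
    elif codon in STOP_CODONS:  # only checked if the codon wasn't ATG
        return "stop"
    else:
        return "other"


for codon in ["ATG", "TGA", "GCC"]:
    print(codon, "->", classify_codon(codon))
```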

🔑 Key Points to Remember

Syntax: Always use : after conditions
Indentation: Python uses indentation (4 spaces by convention) to group code; don't mix tabs and spaces
Comparison: Use == for equality, != for not equal
Flow: At most one block executes per if-elif-else chain (exactly one if there is an else)

🚀 Practice Conditionals in Google Colab!

Open the comprehensive conditionals notebook with exercises and biological examples


Building Our First Function: find_atg()

🎯 Goal: Find the First ATG in a DNA Sequence

Our function needs to:

  • Look at every position in the DNA string
  • Check if the 3 bases starting at that position are "ATG"
  • Return the position when ATG is found
  • Return None if no ATG exists

Step 1: Loop Through Each Position

We use range(len(dna_sequence) - 2) to avoid going past the end: the last complete codon starts at position len(dna_sequence) - 3.

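Why subtract 2? Here is a quick sketch on a 6-base example. The last codon starts at position 3. Slicing past the end does not raise an error; it silently returns a shorter string, and that is the bug the "- 2" prevents.

```python
dna = "GCATGC"  # 6 bases, positions 0-5

# The last full codon starts at position 3 (= len(dna) - 3),
# so the loop needs positions 0-3, which is range(len(dna) - 2)
print(list(range(len(dna) - 2)))  # [0, 1, 2, 3]

# Looping over range(len(dna)) instead reaches positions 4 and 5,
# where the slice quietly comes back shorter than 3 bases
for i in range(len(dna)):
    piece = dna[i:i+3]
    print(i, piece, len(piece))
```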

🔍 Step 2: Check if Three Bases Equal "ATG"

At each position, we extract 3 bases and compare with "ATG":

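Putting an if inside the loop lets the code single out ATG. This sketch records every match and keeps scanning; returning early comes in the next step. The sequence is invented and contains ATGs at positions 2 and 6.

```python
dna = "CCATGAATGTT"

matches = []
for i in range(len(dna) - 2):
    if dna[i:i+3] == "ATG":  # compare the 3-base slice with the start codon
        matches.append(i)
        print(f"ATG at position {i}")
```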

↩️ Step 3: Return the Position When Found

As soon as we find ATG, we return its position and stop searching:

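Here the steps are assembled into the find_atg function shown earlier, with two example calls on invented sequences. Because return exits the function immediately, only the first ATG is reported.

```python
def find_atg(dna_sequence):
    """Return the index of the first ATG, or None if there is none."""
    for i in range(len(dna_sequence) - 2):
        if dna_sequence[i:i+3] == "ATG":
            return i  # exits immediately; later ATGs are never checked
    return None  # only reached if the loop finished without finding ATG


print(find_atg("CCATGAATGTT"))  # 2 (the ATG at position 6 is ignored)
print(find_atg("CCCGGG"))       # None
```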

🧩 Function Components Breakdown

def find_atg(dna_sequence):

Define function with one parameter

for i in range(len(dna_sequence) - 2):

Loop through valid positions

if dna_sequence[i:i+3] == 'ATG':

Check if 3 bases equal ATG

return i

Return position and exit

💡 Try It Yourself!

Modify the function to find ALL ATG positions (not just the first):

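One possible solution, as a sketch, collects positions in a list instead of returning at the first hit. `find_all_atgs` is a hypothetical name.

```python
def find_all_atgs(dna_sequence):
    """Return a list of every position where ATG starts (empty list if none)."""
    positions = []
    for i in range(len(dna_sequence) - 2):
        if dna_sequence[i:i+3] == "ATG":
            positions.append(i)  # record the hit and keep scanning instead of returning
    return positions


print(find_all_atgs("CCATGAATGTT"))  # [2, 6]
print(find_all_atgs("CCCGGG"))       # []
```

Returning an empty list rather than None when nothing is found means the caller can loop over the result without a special case.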

🔑 Key Concepts We Combined

For Loops
Visit each position
String Slicing
Extract 3 bases
Conditionals
Check if equals ATG

Part 3

Python Dictionaries: Matching Codons with Amino Acids

Using key-value pairs to store and look up biological information

🗂️ Next Function: The Codon Reader

Notice the dictionary lookup that finds stop codons:

# Step 2: Extract ORF from ATG to STOP codon
# (assumes CODON_TABLE is already defined: codon -> amino acid, '*' = stop)
def find_orf(dna_sequence, atg_index):
    """Find the ORF starting from ATG position until stop codon."""
    orf = ''
    # Step by 3 so we read whole codons in the same frame as the ATG
    for i in range(atg_index, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i+3]
        if len(codon) == 3:  # Make sure we have a complete codon
            orf += codon
            if codon in CODON_TABLE and CODON_TABLE[codon] == '*':  # ← Dictionary magic!
                break
    return orf

We'll learn how CODON_TABLE[codon] looks up amino acids instantly!

Dictionaries: Key-Value Pairs for Biology

🗂️ Creating a Simple Codon Dictionary

Dictionaries map keys to values. Perfect for codon → amino acid!

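Here is a sketch of a small codon dictionary that covers just the codons in our example ORF (ATG-CGT-AAA-TAG). A full table would have all 64 codons.

```python
# A small subset of the standard genetic code; '*' marks a stop codon
codon_table = {
    "ATG": "M",  # methionine (start)
    "CGT": "R",  # arginine
    "AAA": "K",  # lysine
    "TAG": "*",  # stop
}

print(codon_table)
print(len(codon_table))  # 4
```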

🔍 Accessing Keys and Values

Get values by key and loop through the dictionary:

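The example below looks up one value by its key, then loops over all the pairs. Since Python 3.7, dictionaries keep insertion order, so the loop visits codons in the order they were added.

```python
codon_table = {"ATG": "M", "CGT": "R", "AAA": "K", "TAG": "*"}

print(codon_table["CGT"])  # R  (look up a value by its key)

# .items() gives (key, value) pairs
for codon, amino_acid in codon_table.items():
    print(codon, "->", amino_acid)

print(list(codon_table.keys()))    # ['ATG', 'CGT', 'AAA', 'TAG']
print(list(codon_table.values()))  # ['M', 'R', 'K', '*']
```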

✏️ Adding and Changing Values

Modify existing entries or add new ones:

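Assigning to a key either updates the entry (if the key exists) or adds a new one (if it doesn't). The "X" below is a deliberate mistake to correct.

```python
codon_table = {"ATG": "M", "AAA": "X"}  # "X" is a deliberate mistake

codon_table["AAA"] = "K"  # existing key: the old value is replaced
codon_table["TGG"] = "W"  # new key: a new entry is added

print(codon_table)  # {'ATG': 'M', 'AAA': 'K', 'TGG': 'W'}
```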

🛡️ Safe Operations: get() and pop()

Handle missing keys safely and remove entries:

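A square-bracket lookup raises KeyError for a missing key, while `.get()` and `.pop()` accept a default value instead. Here is a sketch:

```python
codon_table = {"ATG": "M", "AAA": "K", "TAG": "*"}

# codon_table["NNN"] would raise KeyError; .get() returns a default instead
print(codon_table.get("AAA"))       # K
print(codon_table.get("NNN"))       # None
print(codon_table.get("NNN", "X"))  # X  (custom default for unknown codons)

# .pop() removes a key and returns its value
removed = codon_table.pop("TAG")
print(removed)      # *
print(codon_table)  # {'ATG': 'M', 'AAA': 'K'}

# .pop() with a default doesn't fail when the key is already gone
print(codon_table.pop("TAG", None))  # None
```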

The Two Core Functions - Interactive Demo

📚 Dictionary & Functions Setup

🧬 Try It Out!

Debugging & Error Handling

Let's examine our code with line numbers to understand debugging

Line numbers help us identify exactly where errors occur
Error Messages
Read Python errors
Debugging
Find and fix issues
Edge Cases
Handle special inputs
Handling Expected Errors
Graceful error handling

Understanding Python Error Messages

🔍 Anatomy of a Python Error

Traceback (most recent call last):
  File "script.py", line 42, in <module>
    result = find_atg(dna_sequence)
NameError: name 'find_atg' is not defined

📍 Location
File name and line number
🔤 Code Line
The actual problematic code
❌ Error Type
What went wrong + explanation

๐Ÿ› Example 1: Syntax Error

Missing parentheses - Python can't understand the code structure

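Code with a syntax error can't run at all. This sketch therefore passes the broken line to Python's built-in compile() as a string, which lets us catch and print the SyntaxError. The try/except is only a teaching device that keeps the demo running; in a real script you would read the traceback and fix the line.

```python
# A missing closing parenthesis. Running this line directly would stop the
# whole script, so we pass it to compile() as text to inspect the error.
broken_code = 'print("ATGCGT"'
error_type = None

try:
    compile(broken_code, "example.py", "exec")
except SyntaxError as err:
    error_type = type(err).__name__
    print(f"{error_type} on line {err.lineno}: {err.msg}")

# The fix: close the parenthesis, and the code parses cleanly
compile('print("ATGCGT")', "example.py", "exec")
```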

🧮 Example 2: Type Error

Trying to use incompatible data types together

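A common TypeError in sequence work comes from gluing a number onto a string with +. The try/except only lets this demo print the error and continue.

```python
dna = "ATGCGTAAA"
error_type = None

try:
    message = "Length: " + len(dna)  # str + int can't be added together
except TypeError as err:
    error_type = type(err).__name__
    print(f"{error_type}: {err}")

# The fix: turn the number into a string first (an f-string also works)
message = "Length: " + str(len(dna))
print(message)                # Length: 9
print(f"Length: {len(dna)}")  # Length: 9
```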

๐Ÿท๏ธ Example 3: Name Error

Using a variable or function that doesn't exist

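A NameError is usually a typo. Here the variable is defined as dna_sequence but used as dna_seqence, misspelled on purpose. The try/except is only there so the demo keeps running.

```python
dna_sequence = "ATGCGTAAA"
error_type = None

try:
    print(len(dna_seqence))  # typo: "seqence" is missing a "u"
except NameError as err:
    error_type = type(err).__name__
    print(f"{error_type}: {err}")

# The fix: use exactly the name that was defined
print(len(dna_sequence))  # 9
```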

💡 Debugging Tips

📖 Read the Error Carefully
  • Start from the bottom: that's the actual error
  • Note the line number and file name
  • Look at the exact code line mentioned
🔍 Common Error Patterns
  • SyntaxError: Check parentheses, quotes, colons
  • TypeError: Check if data types match
  • NameError: Check spelling and definitions

Defensive Programming for Biological Data

🛡️ Expect the Unexpected

Real biological data is messy:

  • DNA sequences might not contain ATG start codons
  • FASTA files may have ambiguous bases (N, R, Y)
  • User input could be empty or invalid
  • Sequences might be too short for analysis

Solution: Use if statements to validate data and handle expected scenarios gracefully!

🔍 Simple ATG Finder with Data Validation

Two key checks: valid DNA bases and ATG presence

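Here is a sketch of such a finder. `find_atg_checked` is a hypothetical name. It also treats any non-ACGT character, including N, as invalid; that is a simplifying assumption, since real pipelines may prefer to tolerate ambiguous bases.

```python
VALID_BASES = set("ACGT")


def find_atg_checked(dna_sequence):
    """Return the first ATG position, or None (with a warning) if the input is unusable."""
    if not dna_sequence:  # catches both "" and None
        print("Warning: empty sequence")
        return None

    dna_sequence = dna_sequence.upper()

    # Check 1: only A, C, G, T are allowed (ambiguous bases like N are rejected)
    invalid_bases = set(dna_sequence) - VALID_BASES
    if invalid_bases:
        print(f"Warning: invalid bases {sorted(invalid_bases)}")
        return None

    # Check 2: is there any ATG at all?
    if "ATG" not in dna_sequence:
        print("Warning: no ATG start codon found")
        return None

    return dna_sequence.find("ATG")  # str.find gives the first match position


print(find_atg_checked("ccatgaa"))  # 2 (lowercase is accepted after .upper())
print(find_atg_checked("CCNATG"))   # None: contains N
print(find_atg_checked("CCCGGG"))   # None: no ATG
```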

💡 Defensive Programming Principles

✅ Always Validate Input
  • Check for empty or None values
  • Validate data types (string vs number)
  • Verify biological constraints
🔧 Handle Expected Failures
  • No ATG found → return None
  • Invalid bases → clean or warn
  • Short sequences → inform user
Provide Clear Feedback
  • Print warnings for data issues
  • Return meaningful values
  • Document what went wrong
🛡️ Fail Gracefully
  • Return None instead of crashing
  • Continue processing when possible
  • Don't let one bad sequence stop analysis

🔄 Alternative: try/except blocks

try/except is useful for building robust software applications:

🏗️ Software Development
  • File I/O operations
  • Network connections
  • Database queries
  • User interface errors
🧪 Data Science
  • Use simple if statements
  • Validate data explicitly
  • Handle expected scenarios
  • Focus on data quality

💡 For data science: Missing ATGs, invalid bases, or empty sequences aren't "exceptions"; they're normal biological scenarios that need explicit handling with if statements.
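Here is a sketch of the two styles side by side; `load_sequence` and `report_start` are hypothetical helper names. Opening a file can fail for reasons outside the program (for example, the file is missing), so that helper uses try/except. A sequence without ATG is an expected biological outcome, so that helper uses a plain if.

```python
def load_sequence(path):
    """Read a sequence file, returning None if it can't be found."""
    try:
        # Opening a file can fail for reasons outside our control
        with open(path) as handle:
            return handle.read().strip().upper()
    except FileNotFoundError:
        print(f"Could not find {path}")
        return None


def report_start(sequence):
    """A missing ATG is a normal outcome, so a plain if statement handles it."""
    if "ATG" not in sequence:
        return "no start codon"
    return f"first ATG at position {sequence.find('ATG')}"


print(load_sequence("missing_file.fasta"))  # prints a message, then None (if the file is absent)
print(report_start("CCCGGG"))  # no start codon
print(report_start("CCATGG"))  # first ATG at position 2
```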

DNA Sequence File Formats: FASTA

📄 What is a FASTA File?

FASTA is the most common format for storing DNA, RNA, and protein sequences. It's a simple text format that biologists use worldwide.

💡 Fun Fact

FASTA was named after the FASTA software program for sequence alignment, developed in the 1980s at the University of Virginia.

๐Ÿ—๏ธ FASTA Format Structure

Basic Format Rules

  • โ€ข Header line starts with >
  • โ€ข Sequence ID comes right after >
  • โ€ข Description (optional) after the ID
  • โ€ข Sequence data on following lines
  • โ€ข No line length limit for sequence

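Example FASTA File (illustrative: sequences are shortened, so treat these records as examples rather than exact database entries)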

>NM_000546.6 Homo sapiens tumor protein p53
ATGGAGGAGCCGCAGTCAGATCCTAGCGTCGAGCCCCCTCTGAGTCAGGAAACA
TTTTCAGACCTATGGAAACTACTTCCTGAAAACAACGTTCTGTCCCCCTTGCCG
TCCCAAGCAATGGATGATTTGATGCTGTCCCCGGACGATATTGAACAATGGTTC
ACTGAAGACCCAGGTCCAGATGAAGCTCCCAGAATGCCAGAGGCTGCTCCCCG
>NM_001126115.2 Homo sapiens BRCA1 gene
ATGGATTTCCGTCTGAACAAACAACACCGCCGGCCCCGTGGGTCCGTGTCCCCG
GCAAGCCCCACCCGGGCCCTCCCTCCCGGCTGGGGGCCGCCCCCCGACACCAAT
CAGGCCCCCCACCCCGGCTCTCTACCCCCGCGCCCCCGGACACTACCCCCCGCC

Where to Find FASTA Files

NCBI GenBank

National Center for Biotechnology Information

  • Comprehensive nucleotide sequence database
  • Download individual genes
  • Genome assemblies available

Ensembl

European genome annotation database

  • High-quality annotations
  • Multiple species genomes
  • Easy bulk downloads

UniProt

Protein sequence database

  • Protein sequences only
  • Functional annotations
  • Expert-curated entries (UniProtKB/Swiss-Prot)

💡 Key Takeaways: Working with FASTA Files

FASTA Essentials
  • Simple, universal sequence format
  • Header starts with >
  • Can contain multiple sequences
  • Used by all major databases
Python Skills
  • File reading with open()
  • String manipulation for parsing
  • Dictionary storage for multiple sequences
  • Always handle the last sequence!

🎯 FASTA files are your gateway to analyzing real biological sequences!

💻 FASTA File Parsing: Professional Approach

Here's how bioinformaticians parse FASTA files in the real world: handling multiple sequences and complex structures.

🧬 Complete FASTA Parser

filename = 'sequences.fasta'  # path to your FASTA file
sequences = {}
current_gene_id = None
current_sequence = ""
# Use context manager to safely open and read the file
with open(filename, 'r') as file:
    for line in file:
        line = line.strip()  # Remove whitespace
        if not line:
            continue  # Skip blank lines
        if line.startswith('>'):
            # Save previous sequence if we have one
            if current_gene_id is not None:
                sequences[current_gene_id]['sequence'] = current_sequence
            # Parse new header
            header_parts = line[1:].split(' ', 1)  # Split on first space only
            current_gene_id = header_parts[0]
            # Store header info
            sequences[current_gene_id] = {
                'header': line[1:],  # Full header without >
                'sequence': ""
            }
            current_sequence = ""  # Reset sequence
            print(f"Found sequence: {current_gene_id}")
        else:
            # Add to current sequence (sequences can span multiple lines)
            current_sequence += line.upper()
# Don't forget the last sequence!
if current_gene_id is not None:
    sequences[current_gene_id]['sequence'] = current_sequence
# Display results
for gene_id, data in sequences.items():
    print(f"\nGene: {gene_id}")
    print(f"Header: {data['header']}")
    print(f"Length: {len(data['sequence'])} bases")
    print(f"First 50 bases: {data['sequence'][:50]}...")

🔍 Why This Approach Works

🧩 Handles Multiple Sequences

Real FASTA files often contain multiple sequences. This parser stores each one with its own ID and metadata.

Line-by-Line Processing

Sequences can span multiple lines. This approach reads line by line and concatenates sequence data properly.

💾 Smart Data Structure

Uses nested dictionaries to store both header information and sequence data for easy access.

⚠️ Edge Case Handling

Don't forget the last sequence! The final sequence needs special handling since there's no next header.

💡 Key Programming Concepts
  • Context Managers - with open() safely handles files
  • String Methods - .strip(), .startswith(), .split()
  • State Management - Tracking current sequence and ID
  • Nested Dictionaries - Complex data organization
  • Edge Cases - Handling the last sequence properly
  • Data Validation - Checking for None values

🎓 Ready for More Advanced Practice?

Build complete FASTA parsers and work with real research datasets

Advanced FASTA Analysis

File I/O & the with Statement

Why File Handling Matters in Biology

Biological data lives in files: sequences, experiment results, annotations. Learning proper file handling is essential for any bioinformatics work.

🧬 Real Examples

FASTA sequences, CSV experiment data, JSON annotations, XML databases, TSV gene expression data, and many more!

⚠️ The Problem: Files Can Get "Stuck Open"

❌ The Old Way (Risky)

# Opening a file the old way
file = open('sequences.fasta', 'r')
content = file.read()
# Process the content...
# What if an error happens here?
# The file might never get closed!
# This can cause problems...
file.close() # Might never execute!

🚨 What Can Go Wrong

  • Resource leaks - Open files keep holding memory and operating-system file handles
  • File locks - Other programs may not be able to access the file
  • Resource exhaustion - The system can run out of file handles
  • Data loss - Buffered writes might never be saved to disk
  • Crashes - Program errors can leave files open

✅ The Solution: Context Managers & with Statement

💚 The Safe Way

# Using the with statement
with open('sequences.fasta', 'r') as file:
    content = file.read()
    # Process the content...
    # Even if an error happens here,
    # the file will ALWAYS be closed!
# File is automatically closed here
# No matter what happened above!

🎯 Why It's Better

  • Automatic cleanup - Files always close
  • Exception safe - Works even if errors occur
  • Cleaner code - No need to remember .close()
  • Best practice - Recommended in the official Python documentation
  • Resource efficient - Prevents leaked file handles

💡 Key Takeaways: File I/O Best Practices

Essential Rules
  • Always use with open()
  • Choose the right file mode for your task
  • Handle large files line by line
  • Check if files exist before reading
Bioinformatics Tips
  • Use .strip() to remove whitespace
  • Process files line by line for memory efficiency
  • Validate file formats before processing
  • Always back up important data files

🎯 Proper file handling prevents data loss and makes your code more reliable!
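Here is a sketch that combines these rules. It writes a tiny FASTA file in "w" mode, checks that the file exists, then reads it line by line in "r" mode and counts the sequence bases. The file name demo_sequences.fasta is made up for this example, and the file is deleted at the end.

```python
import os

filename = "demo_sequences.fasta"  # hypothetical file created just for this example

# Mode "w" creates the file, or OVERWRITES it if it already exists
with open(filename, "w") as handle:
    handle.write(">seq1\nATGCGT\nAAATAG\n>seq2\nATGTAA\n")

total_bases = 0
if os.path.exists(filename):
    # Mode "r" reads; iterating over the handle reads one line at a time
    with open(filename, "r") as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):  # skip blanks and headers
                total_bases += len(line)
    print(f"{filename}: {total_bases} bases")  # demo_sequences.fasta: 18 bases
else:
    print(f"{filename} not found")

os.remove(filename)  # tidy up the demo file
```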

Lecture 2 Summary: What You've Learned Today

1️⃣ String Operations & DNA Analysis

String Slicing

  ▸ Extract sequence parts: dna[0:3]
  ▸ Find reading frames and ORFs

String Methods

  ▸ .find(), .upper(), .replace()
  ▸ Search for start/stop codons

Biological Context

  ▸ Reading frames and translation
  ▸ Open Reading Frame analysis

2️⃣ Conditionals & Decision Making

If Statements

  ▸ if condition:
  ▸ Make decisions in code

Logical Operators

  ▸ and, or, not
  ▸ Complex condition testing

Error Handling

  ▸ Validate input sequences
  ▸ Defensive programming

3️⃣ Dictionaries & Data Organization

Key-Value Pairs

  ▸ {'codon': 'amino_acid'}
  ▸ Store genetic code tables

Translation

  ▸ DNA → RNA → Protein
  ▸ Codon table lookups

File Handling

  ▸ FASTA file parsing
  ▸ with open() best practices

🧬 Real Bioinformatics Applications

🔍 ORF Finding

Identify potential protein-coding regions in DNA sequences

🔄 Sequence Translation

Convert DNA to protein sequences using codon tables

📄 File Processing

Parse and analyze biological data formats like FASTA

💡 Advanced Programming Skills Gained

  ✓ String manipulation for sequence analysis
  ✓ Data validation with conditionals
  ✓ Efficient data lookup using dictionaries
  ✓ File I/O operations for real data
  ✓ Error handling and defensive coding
  ✓ Biological data processing workflows

🚀 What's Coming Next

Lecture 3

Data Analysis with Pandas: Tables, Gene Dependencies & Correlation

📊 Pandas DataFrames

Work with tabular biological data, CSV files, and gene expression datasets

🔗 Gene Dependencies

Analyze relationships between genes, correlations, and biological networks

📈 Data Visualization

Create plots and charts to visualize biological data patterns

🎉 Excellent Progress!

You can now analyze DNA sequences, parse biological files, and make data-driven decisions in code

Next week we'll explore how to work with large datasets and discover gene relationships using Python!

Resources for DNA Pythonistas 🐍🧬

⭐ Biopython

The essential Python library for biological computation

What Biopython Does

  ▸ Parse biological file formats (FASTA, GenBank, PDB, etc.)
  ▸ Sequence manipulation and translation
  ▸ BLAST searches and alignment tools
  ▸ Access NCBI databases (Entrez, PubMed)
  ▸ Phylogenetic tree analysis

Getting Started

Install with pip:

pip install biopython

Quick example:

from Bio.Seq import Seq
dna = Seq("ATGGCCATTGTATAA")  # 5 complete codons, ending in the TAA stop
protein = dna.translate()
print(protein)  # MAIV*

🧰 Other Essential Python Libraries for Biology

NumPy & Pandas

Numerical computing and data analysis. Essential for working with gene expression data, experimental results, and large datasets.

numpy.org • pandas.pydata.org

scikit-bio

Bioinformatics library for sequence alignment, diversity analysis, and working with biological data structures.

scikit-bio.org

matplotlib & seaborn

Data visualization libraries for creating publication-quality plots, charts, and figures for your biological data.

matplotlib.org

📖 Learning Resources & Documentation

Online Courses & Tutorials

  ▸ Python for Biologists - pythonforbiologists.com, comprehensive tutorials
  ▸ Rosalind - rosalind.info, learn bioinformatics through problem solving
  ▸ Biopython Tutorial - official tutorial with real-world examples

Databases & APIs

  ▸ NCBI Entrez - access GenBank, PubMed, and other NCBI databases via Python (e.g., Biopython's Bio.Entrez)
  ▸ UniProt - protein sequence and functional information database
  ▸ Ensembl REST API - genomic data access over HTTP, e.g., with the Python requests library

🔬 Specialized Bioinformatics Tools

PyMOL

3D molecular visualization and analysis of protein structures

DendroPy

Phylogenetic computing library for tree analysis

pysam

Python wrapper for SAM/BAM sequencing data formats

💬 Community & Getting Help

Bioinformatics Stack Exchange

Ask questions, get answers from the community

💻 GitHub

Explore open-source bioinformatics projects

📚 Python Documentation

Official Python docs - your best friend!

🚀 You're Now Part of the DNA Pythonista Community!

These tools will empower you to tackle real biological problems with code

Start exploring Biopython today and see how much time you can save in your research!