BWA-MEM

BWA-MEM (Burrows-Wheeler Aligner - Maximal Exact Match) is a fast and accurate alignment algorithm for mapping sequencing reads (70 bp to 1 Mbp) to a reference genome. It is the recommended algorithm from the BWA suite for most applications, particularly for Illumina reads ≥70 bp.

Key Features:

* Fast alignment using BWT (Burrows-Wheeler Transform) indexing
* Handles reads from 70 bp up to about 1 Mbp, including long reads (PacBio, Nanopore)
* Supports split alignments (chimeric reads, structural variants)
* Efficiently handles sequencing errors and polymorphisms
* Compatible with paired-end and single-end data
* Generates SAM output with mapping quality scores, which can be piped to samtools for conversion to BAM

Typical Workflow:

Step 1: Index the reference genome (one-time setup):

bwa index -p ref_index reference.fasta

With -p ref_index, this creates five index files (ref_index.amb, ref_index.ann, ref_index.bwt, ref_index.pac, ref_index.sa) that enable fast searching. The index only needs to be built once per reference genome.
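Before starting a long alignment run, it can help to confirm that the index is complete. The snippet below is a minimal sketch that only checks for the five suffixes listed above; the helper name `missing_index_files` is hypothetical and not part of BWA.

```python
from pathlib import Path

# Suffixes written by `bwa index`, as listed above.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(prefix):
    """Return the expected index files for `prefix` that are not on disk."""
    return [prefix + suffix
            for suffix in BWA_INDEX_SUFFIXES
            if not Path(prefix + suffix).exists()]


# An empty list means all five files were found for this prefix.
print(missing_index_files("ref_index"))
```

If the list is not empty, re-running the bwa index command from Step 1 rebuilds the missing files.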

Step 2: Align paired-end reads:

# -t 8   use 8 threads
# -M     mark shorter split hits as secondary (for Picard compatibility)
# -R     read group header line
# Positional arguments: reference index prefix, forward reads, reverse reads.
# Comments are kept above the command because text after a trailing
# backslash breaks shell line continuation.
bwa mem -t 8 -M \
    -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA\tLB:lib1' \
    ref_index sample_R1.fastq.gz sample_R2.fastq.gz \
| samtools view -b - \
| samtools sort -@ 4 -o sample.sorted.bam -

Step 3: Index the BAM file:

samtools index sample.sorted.bam

Why use BWA-MEM over BWA-ALN? BWA-MEM is generally faster and more accurate than the older BWA-backtrack algorithm (bwa aln), especially for reads ≥70 bp. It seeds alignments with maximal exact matches (MEMs), which lets it handle longer reads and tolerate more sequencing errors. BWA-MEM also natively reports split alignments (chimeric reads), which makes it suitable for detecting structural variants. It is not splice-aware, however: RNA-seq reads that span exon junctions are split or soft-clipped at best, so dedicated spliced aligners such as STAR are preferred for RNA-seq.
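To make the idea of a maximal exact match concrete, the sketch below grows a single matching base in both directions until a mismatch or a sequence end is reached. It compares the strings directly, which is an assumption made for readability: BWA-MEM finds its seeds through the FM-index rather than by scanning the reference, and the function name `extend_exact_match` is hypothetical.

```python
def extend_exact_match(read, ref, read_pos, ref_pos):
    """Grow the match anchored at read[read_pos] == ref[ref_pos].

    Returns (read_start, ref_start, length) of the longest exact match
    containing the anchor, or None if the anchor bases differ.
    """
    if read[read_pos] != ref[ref_pos]:
        return None
    left = 0
    while (read_pos - left - 1 >= 0 and ref_pos - left - 1 >= 0
           and read[read_pos - left - 1] == ref[ref_pos - left - 1]):
        left += 1
    right = 0
    while (read_pos + right + 1 < len(read) and ref_pos + right + 1 < len(ref)
           and read[read_pos + right + 1] == ref[ref_pos + right + 1]):
        right += 1
    return (read_pos - left, ref_pos - left, left + right + 1)


ref = "ACGTTGCATGCA"
read = "TTGCTT"
# Anchor on the G at read position 2 and reference position 5.
print(extend_exact_match(read, ref, 2, 5))  # (0, 3, 4): the match is "TTGC"
```

The match cannot be extended further in either direction, which is what makes it maximal.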

Read Group (@RG) Tags: The -R parameter adds read group information to the alignment output, which is essential for downstream analysis with tools like GATK. The read group tags include:

- ID: Unique identifier for the read group (often flowcell.lane)
- SM: Sample name (biological sample identifier)
- PL: Platform (e.g., ILLUMINA, PACBIO)
- LB: Library identifier (useful when a sample has multiple libraries)

This metadata enables multi-sample variant calling and helps track data provenance.
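When many samples are processed in a loop, the -R string is usually generated rather than typed by hand. The helper below is a small sketch of one way to do that; the function name `build_read_group` and its defaults are assumptions for illustration. It emits a literal backslash followed by `t` between fields, which is the form used in the command above and which bwa mem converts to tabs itself.

```python
def build_read_group(rg_id, sample, platform="ILLUMINA", library=None):
    """Return an @RG line suitable for `bwa mem -R`."""
    fields = [("ID", rg_id), ("SM", sample), ("PL", platform)]
    if library is not None:
        fields.append(("LB", library))
    for tag, value in fields:
        # Real tabs or newlines inside a value would corrupt the header line.
        if not value or "\t" in value or "\n" in value:
            raise ValueError(f"invalid value for {tag}: {value!r}")
    # Each separator is the two characters backslash + 't', not a tab.
    return "@RG" + "".join(f"\\t{tag}:{value}" for tag, value in fields)


print(build_read_group("sample1", "sample1", library="lib1"))
```

The printed string matches the -R argument in Step 2 and can be passed as a single argument when launching bwa mem from a script.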

Alignment Quality and MAPQ Scores: BWA-MEM assigns a mapping quality (MAPQ) score to each alignment. MAPQ is a Phred-scaled estimate of the probability P that the reported position is wrong: MAPQ = -10 log10(P), so P = 10^(-MAPQ/10). MAPQ = 60, the highest value BWA-MEM reports, corresponds to a 1 in 1,000,000 chance of being wrong (P = 10^(-60/10) = 10^-6). A MAPQ ≥ 30 (P ≤ 0.001) is generally considered high-quality. Reads with multiple equally good alignments receive MAPQ = 0, indicating ambiguous mapping.
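The conversion between MAPQ and error probability is simple arithmetic, sketched below. The function names are hypothetical, and the cap of 60 reflects the maximum value BWA-MEM reports rather than a limit of the Phred scale itself.

```python
import math


def mapq_to_error_prob(mapq):
    """Convert a Phred-scaled MAPQ into the probability the mapping is wrong."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(prob, cap=60):
    """Convert an error probability into a MAPQ, capped at `cap`."""
    if prob <= 0:
        return cap
    return min(cap, round(-10 * math.log10(prob)))


for q in (0, 10, 30, 60):
    print(q, mapq_to_error_prob(q))
```

Note that MAPQ = 0 converts to P = 1 under the formula, while in practice it is used as a flag for ambiguous placement.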

Animation

These animations were created with Manim Community. Source scripts are in tools/animations/.

A visual walkthrough of how BWA-MEM aligns a read to a reference:

  1. A read is shown with three Maximal Exact Matches (MEMs) highlighted on the reference
  2. MEMs are chained into a colinear alignment path
  3. Gaps between MEMs are filled using Smith-Waterman local extension
  4. The final alignment is emitted as a SAM record with CIGAR string
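Step 2 of the walkthrough, chaining, can be illustrated with a small dynamic program. This is a simplified sketch and not BWA-MEM's actual chaining heuristic: it assumes seeds are given as (read_start, ref_start, length) tuples on the same strand, scores a chain by its total seed length, and uses a hypothetical `max_gap` limit to keep distant seeds apart.

```python
def best_colinear_chain(seeds, max_gap=100):
    """Return the highest-scoring colinear chain of non-overlapping seeds.

    seeds: list of (read_start, ref_start, length) tuples.
    """
    seeds = sorted(seeds)
    n = len(seeds)
    if n == 0:
        return []
    score = [length for _, _, length in seeds]  # best chain ending at each seed
    prev = [-1] * n                             # back-pointers for traceback
    for j in range(n):
        read_j, ref_j, len_j = seeds[j]
        for i in range(j):
            read_i, ref_i, len_i = seeds[i]
            read_gap = read_j - (read_i + len_i)
            ref_gap = ref_j - (ref_i + len_i)
            # Seed j may follow seed i only if it starts after i ends on
            # both the read and the reference, within max_gap.
            if 0 <= read_gap <= max_gap and 0 <= ref_gap <= max_gap:
                if score[i] + len_j > score[j]:
                    score[j] = score[i] + len_j
                    prev[j] = i
    end = max(range(n), key=score.__getitem__)
    chain = []
    while end != -1:
        chain.append(seeds[end])
        end = prev[end]
    return chain[::-1]


# Three seeds near reference position 1000 plus one stray hit at 5000.
seeds = [(0, 1000, 20), (25, 1025, 30), (60, 1062, 15), (10, 5000, 12)]
print(best_colinear_chain(seeds))
```

The stray hit at position 5000 is left out because it cannot be ordered consistently with the other three seeds. The gaps between chained seeds are what the extension step then has to fill.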

Coming soon: the rendered video will be embedded here.

To render locally:

cd tools/animations
manim -pqh --media_dir ~/Desktop/manim_animations bwamem_conceptual.py BwamemConceptual

A deeper dive into the Burrows-Wheeler Transform and FM-index:

  1. BWT construction — step-by-step rotation and sorting of “BANANA$” to build the BWT string
  2. FM-index backward search — querying “ANA” right-to-left through the BWT to find matching rows
  3. Suffix Array lookup — converting FM-index row ranges to genome positions
  4. MEM extension — growing a seed match left and right until a mismatch is hit
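The first three steps can be reproduced in a few lines for the same “BANANA$” example. This is a teaching sketch under simplifying assumptions: the suffix array is built by naive sorting, and occurrence counts are recomputed by scanning the BWT, whereas a real FM-index stores sampled tables for both. All function names are hypothetical.

```python
from collections import Counter


def suffix_array(text):
    """Start positions of all suffixes of `text`, in sorted suffix order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    """BWT: the character preceding each sorted suffix (wrapping at 0)."""
    return "".join(text[i - 1] for i in sa)


def backward_search(pattern, bwt_str):
    """Return the half-open row range [lo, hi) of suffixes starting with
    `pattern`, or (0, 0) if the pattern does not occur."""
    counts = Counter(bwt_str)
    first_row = {}  # first row whose suffix starts with each character
    total = 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]
    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):  # process the pattern right to left
        if c not in first_row:
            return 0, 0
        lo = first_row[c] + bwt_str[:lo].count(c)
        hi = first_row[c] + bwt_str[:hi].count(c)
        if lo >= hi:
            return 0, 0
    return lo, hi


text = "BANANA$"  # "$" is the sentinel and sorts before every letter
sa = suffix_array(text)
bwt_str = bwt_from_suffix_array(text, sa)
lo, hi = backward_search("ANA", bwt_str)
positions = sorted(sa[lo:hi])  # suffix array lookup: rows to text positions

print(bwt_str)    # ANNB$AA
print((lo, hi))   # (2, 4)
print(positions)  # [1, 3]
```

The two hits are the overlapping occurrences of “ANA” starting at positions 1 and 3 (0-based) of “BANANA$”.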

Coming soon: the rendered video will be embedded here.

To render locally:

cd tools/animations
manim -pqh --media_dir ~/Desktop/manim_animations bwamem_stepbystep.py BwamemStepByStep