iVar

iVar is a computational tool designed for the analysis of viral sequencing data, particularly in the context of identifying and quantifying variants within viral populations. It is commonly used in the field of virology and genomics to process high-throughput sequencing data, allowing researchers to detect mutations, track viral evolution, and understand the dynamics of viral infections.

iVar primer trimming * it is essential to remove artefactual sequences that can bias variant calling and consensus generation. * reads that truly overlap the primer regions are trimmed, while reads that do not align to primer sites are left intact. * iVar uses the BED file to locate the primer sequences at the start and end of the aligned reads and replaces those bases with ‘S’ (soft-clip) in the BAM file. * The resulting BAM file contains reads with the primers soft-clipped, preventing the primer sequence itself from introducing false variants.

iVar Quality trimming * iVar uses a sliding window approach (Default: 4) to do quality trimming. The windows slides from the 5’ end to the 3’ end and if at any point the average base quality in the window falls below the threshold (Default: 20), the remaining read is soft clipped. * If after trimming, the length of the read is greater than the minimum length specified (Default: 30), the read is written to the new trimmed BAM file.

# Index the sorted BAM before trimming
samtools index Align/sample.sorted.bam

# Trim primers
ivar trim \
    -i Align/sample.sorted.bam \  # input sorted BAM
    -b primers.bed \              # primer coordinates BED file (generated in step 0.1)
    -e \                          # include reads that extend beyond the primer region
    -p Align/sample_trimmed       # output prefix (produces sample_trimmed.bam)

why include reads that extend beyond the primer region? Including reads that extend beyond the primer region allows iVar to trim only the portion of the read that overlaps the primer, while retaining the rest of the read sequence for downstream analysis. This is important because many reads may partially overlap the primer sites, and excluding them entirely would result in loss of valuable data. By allowing reads that extend beyond the primer region, we can maximize the amount of usable data while still effectively removing primer sequences that could bias variant calling.

iVar uses a sliding window approach (Default: 4) to do quality trimming. The windows slides from the 5’ end to the 3’ end and if at any point the average base quality in the window falls below the threshold (Default: 20), the remaining read is soft clipped. If after trimming, the length of the read is greater than the minimum length specified (Default: 30), the read is written to the new trimmed BAM file. By using a sliding window rather than a single-base check, iVar avoids being “fooled” by a single bad base in the middle of a high-quality stretch. It ensures that trimming only happens when there is a consistent trend of poor data.

Call nucleotide variants from the primer-trimmed BAM using samtools mpileup piped into iVar.

samtools mpileup \
    -aa \                            # output ALL positions, including zero-coverage sites
    -B \                             # disable BAQ (per-base alignment quality) computation
    -A \                             # include anomalous read pairs, which can be informative for variant calling in amplicon data
    -d 20000 \                       # max per-position read depth cap (20,000×), raised from default 8,000× to avoid truncating high-coverage amplicon data
    --reference KT992094.1.fasta \   # reference FASTA (required when using -aa)
    -Q 0 \                           # include bases of any quality (iVar does its own filtering), to maximize data retained for variant calling
    Align/sample.bam \
| ivar variants \
    -p iVAR_Variant/sample \         # output prefix (.tsv)
    -r KT992094.1.fasta \            # reference FASTA (for amino-acid annotation)
    -g KT992094.1.gff3               # GFF3 annotation (for codon/amino-acid annotation)

Generate a consensus FASTA sequence from the primer-trimmed BAM. * This step is optional but can be useful for downstream analyses that require a consensus sequence (e.g., lineage assignment, phylogenetics). * The consensus is generated using samtools mpileup piped into iVar consensus, which applies a minimum allele frequency threshold to call bases and masks low-coverage positions. * The parameters are set to retain as much data as possible while still producing a reliable consensus sequence. For example, the minimum allele frequency threshold is set to 60 % to call a base, which allows for some within-host diversity while still calling a clear consensus. The minimum depth threshold is set to 10×, and positions below this are masked with ‘N’ to indicate uncertainty. The quality 20 means that bases with a quality score below 20 will be ignored in the consensus calling, which helps to reduce the impact of sequencing errors on the consensus sequence. The -aa flag ensures that all positions are included in the output, even those with zero coverage, which is important for generating a full-length consensus sequence that can be used for downstream analyses. * The resulting consensus FASTA can be used for lineage assignment (e.g., with Pangolin) and phylogenetic analysis, while the accompanying quality file provides information on coverage and allele frequencies at each position.

samtools mpileup \
    -aa \           # output all positions (ensures full-length consensus)
    -A \            # include anomalous read pairs
    -d 1000000 \    # raise depth cap to 1,000,000× (default 8,000× would truncate high-coverage data)
    -Q 0 \          # include all bases regardless of quality
    Align/sample.bam \
| ivar consensus \
    -p Consensus/sample \   # output prefix (.fa and .qual.txt)
    -t 0.6 \                # minimum allele frequency to call a base (60 %)
    -m 10 \                 # minimum depth; positions below this are masked with -n character
    -n N                    # character used to mask low-coverage positions
Back to top