bamutils / basecall

Base/variant caller

Given a BAM file and a genomic reference, for each position covered in the
BAM file, show the reference base, and the number of A/T/C/G's and InDels.
This can also be used to call variants.

You can also optionally filter out all bases whose quality score is below a
threshold, bases that aren't covered by enough reads, bases that have no
variation compared to reference, or bases whose variation is too low.

The output is a tab-delimited file that contains the following for each base:

chromosome
position (1-based)
reference base
# reads that contain this base
Consensus call
Minor call
Average mappings (number of mappings each read covering this base has)
Alternative allele frequency (optional)
Entropy
# A calls
# C calls
# G calls
# T calls
# deletions
# gaps
# inserts

If -altfreq is applied, an alternative allele percentage is calculated.
            minor - background
     --------------------------------
  (major - background) + (minor - background)

This is in lieu of using a more robust model, such as a Baysian model like
what was used in Li, et al 2009, Genome Res.

If -showstrand is applied, a minor strand percentage is also calculated.p This
is calculated as:
    pct = (# reads with base on plus/ # reads with base total)
    if pct > 0.5,
        pct = 1-pct

Entropy is sum(a..t) { p log2 p } where p = freq(+pseudocount) / genomic freq.
pseudo count = genomic freq * sqrt(N)

We use the following genomic frequencies: A 0.3, C 0.2, G 0.2, T 0.3


Usage: bamutils basecall {opts} in.bam {chrom:start-end}

Options:
-ref   fname   Include reference basecalls from this file
-qual  val     Minimum base-quality level to use in calculations
               (numeric, Sanger scale) (default 0)

-count val     Report only bases with this minimum number of read-coverage
               (matches, inserts, deletions counted) (default 0)

-mask  val     The bitmask to use for filtering reads by flag
               (default 1540 - see SAM format for details)

-minorpct pct  Require a minor call to be at least [pct] of the total count
               (0.0 -> 1.0, default 0.04)

-altfreq       Calculate alternative allele frequency

-showgaps      Report gaps/splice-junctions in RNA-seq data

-showstrand    Show the minor-strand percentages for each call
               (0-0.5 only shows the minor strand %)

-bed fname     Only output positions that are present in this BED file
               (*must* be sorted and reduced with the -nostrand option)

-variants      Only output positions that differ from reference