For M. tuberculosis, typically 1,000 to 4,000 variants are called. They are provided in VCF format and in a table that additionally reports amino acid changes and any known association with genotype or resistance.


Most variants will be SNPs. When a variant is located within a CDS, the corresponding gene appears in column "Region", and an "AA exchange" and a "PAM1" value are given. The Point Accepted Mutation 1 (PAM1) value is the probability (multiplied by 10,000 for readability) of the particular amino acid (aa) exchange occurring, given that 1% of the aa have changed (99% similarity, i.e. very similar proteins). In practice, exchanges between aa of similar charge and size are more likely, whereas an exchange to a very dissimilar aa yields a small score or even zero (e.g. Arg→Asp).
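Since the PAM1 column holds a scaled value rather than a raw probability, converting it back is a division by 10,000. The following is a minimal Python sketch; the function name and the example value are illustrative assumptions and are not taken from the pipeline or from an actual PAM1 matrix.

```python
PAM1_SCALE = 10_000  # the table reports probability x 10,000


def pam1_to_probability(scaled_value):
    """Convert a scaled PAM1 table entry back to a probability in [0, 1]."""
    if scaled_value < 0 or scaled_value > PAM1_SCALE:
        raise ValueError(f"PAM1 value out of range: {scaled_value}")
    return scaled_value / PAM1_SCALE


# Hypothetical table entry: a value of 21 corresponds to a probability of 0.21 %.
print(pam1_to_probability(21))
```

A value of zero therefore means the exchange is essentially never observed between closely related proteins.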

The table lists all detected variants, with one line per genome position. If multiple alleles have been called (type=MUL), the sample column lists all observed nucleotides, delimited by commas. For the time being, amino acid exchanges and PAM1 probabilities reflect only the main allele, which is listed first.

If a variant position lies within overlapping CDSs, more than one region, aa exchange and PAM1 probability are provided, separated by semicolons.
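The two delimiters can be handled as in the sketch below: commas separate alleles in the sample column, semicolons separate the per-CDS annotations. This is a minimal Python sketch under stated assumptions: the row is assumed to be available as a dictionary keyed by column name, the function name and the sample column name "S1" are hypothetical, and all cell values in the example are made up.

```python
def split_variant_row(row, sample_column):
    """Split the multi-valued cells of one table row.

    `row` maps column names to raw cell strings (an assumed representation).
    """
    # Alleles are comma-delimited; the main allele is listed first.
    alleles = [a.strip() for a in row[sample_column].split(",")]

    def split_semicolons(cell):
        # Per-CDS annotations are semicolon-delimited; an empty cell gives [].
        return [part.strip() for part in cell.split(";")] if cell else []

    regions = split_semicolons(row.get("Region", ""))
    exchanges = split_semicolons(row.get("AA exchange", ""))
    pam1 = split_semicolons(row.get("PAM1", ""))

    return {
        "main_allele": alleles[0],
        "other_alleles": alleles[1:],
        # zip() pairs entries by position and stops at the shortest list.
        "annotations": list(zip(regions, exchanges, pam1)),
    }


# Made-up example row: two alleles at a position covered by two CDSs.
example = {
    "S1": "A,G",
    "Region": "geneA;geneB",
    "AA exchange": "Arg→Lys;Ser→Thr",
    "PAM1": "37;38",
}
print(split_variant_row(example, "S1"))
```

Because the aa exchange and PAM1 values describe only the main allele, the sketch keeps the remaining alleles separate instead of pairing them with annotations.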

in the "Region" column header adds information about start:stop(s), product(s), and type(s) (e.g. tRNA or CDS).

In column "AA exchange", potential start codons are marked with an appended "(s)". Mutations that create or abolish a potential start are not considered silent, even if the amino acid does not change.
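The rule for silent mutations can be written as a small predicate. The following Python sketch is an illustration only: the function name and its arguments are hypothetical, and treating GTG as a potential start codon is an assumption about the pipeline (GTG is a known alternative start codon in bacteria, but the set of codons the pipeline uses is not stated here).

```python
def is_silent(aa_before, aa_after, start_before, start_after):
    """Return True only if neither the amino acid nor the start status changes.

    start_before / start_after say whether the codon is a potential start
    before and after the mutation (the "(s)" marker in the table).
    """
    same_amino_acid = aa_before == aa_after
    same_start_status = start_before == start_after
    return same_amino_acid and same_start_status


# GTG -> GTC keeps Val but, under the assumption above, abolishes a potential start.
print(is_silent("Val", "Val", start_before=True, start_after=False))
# GTC -> GTT keeps Val and neither codon is a potential start.
print(is_silent("Val", "Val", start_before=False, start_after=False))
```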


Computational Steps

SAMtools v0.1.19
picard-tools v1.94
Genome Analysis TK 2.5-2-gf57256b