For M. tuberculosis, typically one to four thousand variants are called. They
are provided in VCF format as well as in a table that also provides information
on amino acid changes and on known associations with genotype or resistance.
Most variants will be SNPs. When a variant is located within a CDS, the
corresponding gene will appear in column "Region" and there will be an
"AA exchange" and a "PAM1" entry. The Point Accepted Mutation 1 (PAM1) value
lists the probability (multiplied by 10,000 for clarity) for the particular aa
exchange to occur, given that 1% of the aa are changed (99% similarity, i.e.
for very similar proteins). In practice, transitions between aa equivalent in
charge and size are more likely, whereas a transition to a highly dissimilar aa
will yield a small score or even a zero (e.g. Arg→Asp).
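Because the PAM1 column shows probabilities multiplied by 10,000, a value has to be divided by 10,000 before it is used as a probability. A minimal sketch of that conversion follows; the function name and the example score are made up for illustration and are not taken from the pipeline or from an actual PAM1 matrix.

```python
def pam1_score_to_probability(score):
    """Convert a table PAM1 value (probability x 10,000) back to a probability."""
    return score / 10_000


# Hypothetical score, for illustration only (not a value from a real PAM1 matrix)
example_score = 21
print(pam1_score_to_probability(example_score))
```

A score of zero therefore corresponds to an exchange that is practically never observed between very similar proteins.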
The table lists all detected variants, with each genome position represented by
one line. If multiple alleles have been called (type=MUL), the sample column
lists all nucleotides observed, delimited by commas. For the time being, amino
acid exchanges as well as PAM1 probabilities reflect only the main allele,
which is listed first.
However, if a variant position is located within intersecting CDS, more than
one region, aa transition, and PAM1 probability are provided, separated by
semicolons.
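When the table is processed programmatically, the two delimiters can be handled as sketched here: commas separate the alleles in the sample column, semicolons separate the entries for intersecting CDS. This is only a sketch under assumptions. The function names, gene names, and cell values are hypothetical placeholders, and it is assumed that the three annotation cells hold the same number of entries in the same order.

```python
def split_alleles(sample_cell):
    """Split a sample cell such as "A,G" into alleles; the first one is the main allele."""
    return [allele.strip() for allele in sample_cell.split(",") if allele.strip()]


def split_annotations(region_cell, aa_cell, pam1_cell):
    """Pair up the semicolon-separated entries of the three annotation cells."""
    regions = [part.strip() for part in region_cell.split(";")]
    exchanges = [part.strip() for part in aa_cell.split(";")]
    scores = [part.strip() for part in pam1_cell.split(";")]
    # Assumption: all three cells list their entries in the same order
    if not (len(regions) == len(exchanges) == len(scores)):
        raise ValueError("annotation cells hold different numbers of entries")
    return list(zip(regions, exchanges, scores))


# Placeholder values, for illustration only
alleles = split_alleles("A,G")
main_allele = alleles[0]
annotations = split_annotations("geneA;geneB", "exchange1;exchange2", "10;20")
print(main_allele, annotations)
```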
in the "Region" column header adds
information about start:stop(s), product(s), and type(s) (e.g. tRNA or CDS).
In column "AA exchange", potential start codons are marked with an appended
"(s)". Mutations that create or abolish a potential start are not considered
silent, even if the amino acid does not change.
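The rule for silent mutations can be written down as a small check. This is a sketch of the rule as stated above, not the pipeline's actual implementation; the function and its parameters are hypothetical.

```python
def is_silent(ref_aa, alt_aa, ref_is_start=False, alt_is_start=False):
    """Return True only if neither the amino acid nor the potential-start status changes."""
    if ref_is_start != alt_is_start:
        # A potential start codon was created or abolished: never silent
        return False
    return ref_aa == alt_aa


# Same amino acid, no start codon involved
print(is_silent("Val", "Val"))
# Same amino acid, but a potential start is abolished (e.g. GTG to GTC, both Val)
print(is_silent("Val", "Val", ref_is_start=True, alt_is_start=False))
```

The first call reports a silent mutation; the second does not, because the potential start status differs although the amino acid is unchanged.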
Export
- WYSIWYG
Under Windows (regardless of which browser you use), you can always
copy/paste or drag/drop (like this:
http://www.mrkent.com/tools/converter/) into an Excel spreadsheet. Whole
pages can also be exported to Excel via File/Save as (filter: all files;
manually change the file extension from e.g. .bam to .xls). When opening this
file, Windows will complain that the format differs from the one specified by
its extension (say "yes") and that it cannot find the .css style sheet (say
"Ok"), but will afterwards transform it into a slightly contorted
what-you-see-is-what-you-get sort of version of the web page. It retains
formatting and hyperlinks but looks horrible. Save it as CSV and open it again
in order to extract the "pure data" (unformatted table content without further
adornment).
Unfortunately, directly saving the pages as CSV or trying the above procedures
under Linux will yield the text of a wait page instead of the table data, which
is why we provide an alternative way to extract the table contents:
- Pure Data (table contents only)
The export "buttons" (actually hyperlinks) on the Variant, Genotype, and
Resistance pages provide a shortcut to export the unformatted table contents
into any spreadsheet program. At the moment, this works neither in IE nor in
Konqueror, but works nicely (regardless of the OS) in e.g.
- Safari (save appearing page as .csv),
- Chrome (click on the downloaded file in the lower left corner),
- Opera (directly open in Openoffice or Excel), and
- Firefox (directly open in Openoffice or Excel).
To avoid strange-looking special characters, the spreadsheet should be
imported as UTF-8. Columns are separated by commas, and text is enclosed in
inverted commas.
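For scripted processing, the same import settings can be applied with Python's standard csv module. The sketch below assumes that "inverted commas" means double quotes and that the first line holds the column headers; the function name, the "Position" column, and the sample content are made up for illustration.

```python
import csv
import os
import tempfile


def read_export(path):
    """Read an exported table: UTF-8, comma-separated, text in double quotes."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=",", quotechar='"'))


# Demo with a made-up two-line file standing in for a real export
sample = '"Position","Region","AA exchange"\n"1234","geneA","Arg→Asp"\n'
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", encoding="utf-8", newline="", delete=False
) as tmp:
    tmp.write(sample)

rows = read_export(tmp.name)
os.remove(tmp.name)
print(rows[0]["AA exchange"])
```

Passing encoding="utf-8" explicitly is what keeps characters such as the arrow intact on systems whose default encoding differs.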
(Warning: Snobby bioinformaticians only. In case you are positive that the
process of ordaining clergy is the only meaning of ordination, this chapter may
not be for you.)
Versions
SAMtools v0.1.19
picard-tools v1.94
Genome Analysis TK 2.5-2-gf57256b