Intro

The vast majority of Mycobacterium tuberculosis NGS data published so far stems from the Illumina platform. As of 25 March 2015, only 40 of the 11,027 runs associated with this taxon in the European public repository (ENA) had been measured with other techniques (see here for new numbers). We therefore started with Illumina alone, which already covered 99.6% of the demand. Meanwhile, PhyResSE can also process Ion Torrent data (select the platform in the top right corner of the uploaded data list).

Handling

Please select your platform (Illumina or Ion Torrent, on the Upload page to the right of "Process files") before uploading the data. Ion Torrent data submitted as Illumina data fail the FASTQ validation and will not upload.

Computational Steps

(Warning: Wee little nerdlings only. Log-likelihood increases with the age of a tree? Best stop reading here.)



 Older Illumina 
 ==============
 
 Only one difference to Illumina (default): 

 * If the FastQC-generated file fastqc_data.txt reports any encoding other than
   "Sanger / Illumina 1.9", the original file is set aside and Trimmomatic
   converts the reads to Phred+33:

   mv ${dir}${file} `echo ${dir}.${file}|sed "s/fastq/NON_PHRED33_ORIGINAL_FASTQ/"`
   java -jar trimmomatic-0.33.jar SE `echo ${dir}.${file}|sed "s/fastq/NON_PHRED33_ORIGINAL_FASTQ/"` ${dir}${file} TOPHRED33
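
 The encoding check that triggers this conversion could be sketched as follows.
 This is a minimal illustration, not the PhyResSE code itself: the function
 names are hypothetical, and it assumes that fastqc_data.txt contains a
 tab-separated line starting with "Encoding", as FastQC writes in its
 "Basic Statistics" module.

```shell
# Hypothetical helper: print the encoding recorded in a fastqc_data.txt file.
# Assumes a tab-separated line whose first field is exactly "Encoding".
fastqc_encoding() {
    awk -F'\t' '$1 == "Encoding" { print $2; exit }' "$1"
}

# Hypothetical helper: succeeds (exit status 0) when the reported encoding
# differs from "Sanger / Illumina 1.9", i.e. when the reads would need the
# TOPHRED33 conversion. A missing Encoding line also counts as "differs".
needs_phred33_conversion() {
    [ "$(fastqc_encoding "$1")" != "Sanger / Illumina 1.9" ]
}
```

 A caller could then guard the two commands above with
 "if needs_phred33_conversion fastqc_data.txt; then ... fi".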
                 


 ION TORRENT
 ===========
 
 Only three differences to Illumina (default): 

 * Upload: fastQValidator needs to be run with --minReadLen 1 because
           Ion Torrent data fail the default test. Since fastQValidator is called
           during the upload procedure, the platform needs to be selected before upload.
 
 (* Mapping: IndelRealigner needs to be run with --defaultBaseQualities 12
             because some qualities are missing. They are conservatively assigned
             a bad quality (12). This is actually no real difference, because
             this option was added to the metafile governing all platforms, as
             displayed here. For other (Illumina)
             data, however, this has no effect because there qualities are never
             missing (always defined).)
              
 * Variants: Unlike the default described here, the GATK UnifiedGenotyper
             is called with -glm SNP instead of -glm BOTH to exclude all indels. This is
             not intended to be written in stone; comments / better recipes welcome.
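
 The platform-dependent options listed above can be summarised in a small
 sketch. It is only an illustration under stated assumptions: the function
 names and the platform keywords "illumina" and "iontorrent" are hypothetical
 and not taken from the PhyResSE code. Only the option values (--minReadLen 1,
 -glm SNP, -glm BOTH) come from the description above.

```shell
# Hypothetical helper: extra fastQValidator options for a given platform.
# Ion Torrent reads fail the default minimum read length, hence --minReadLen 1.
validator_opts() {
    case "$1" in
        iontorrent) echo "--minReadLen 1" ;;
        illumina)   : ;;                      # default test, no extra options
        *)          echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: genotype likelihoods model for UnifiedGenotyper.
# SNP excludes all indels for Ion Torrent; BOTH is the Illumina default.
glm_for_platform() {
    case "$1" in
        iontorrent) echo "SNP" ;;
        illumina)   echo "BOTH" ;;
        *)          echo "unknown platform: $1" >&2; return 1 ;;
    esac
}

# Example calls (file names are placeholders, shown as comments only):
#   fastQValidator --file reads.fastq $(validator_opts iontorrent)
#   java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref.fasta \
#        -I reads.bam -glm "$(glm_for_platform iontorrent)" -o calls.vcf
```

 Keeping the differences in two small functions makes it easy to revise the
 Ion Torrent recipe later, for instance if indel calling is enabled again.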