Intro

If your session comprises more than one sample, the Progress page (linked from "Process files" on the initial PhyResSE "Upload" page) also provides exports. They appear in the upper right corner only after all files have been processed. Each export summarizes all samples.

To add exports to sessions computed before 2-6-15 (which display "done." but no exports), re-initiate the computation by uploading and then deleting a mock file before clicking "Process files". This will NOT recompute all results, only the exports (which takes only about ten minutes).

Handling

VCF: All samples are merged into one VCF file (by VCFtools: vcf-merge).

VCF-stats: Summary statistics of the merged VCF file (also by VCFtools: vcf-stats).

VCF-tab: The merged VCF is then "flattened" into tab-delimited text format (also by VCFtools: vcf-to-tab, displaying at most two alleles).

FASTA: The tab-delimited file is further converted into a FASTA file in which each sample is represented by one sequence (vcf_tab_to_fasta_alignment.pl by Christina Bergey, with only one allele at each position). Both VCF-tab and FASTA comprise only SNP positions of the genome. Moreover, at the moment the FASTA file is produced by ignoring all multi-allele positions. We are working on decomposing mixed samples into separate (component) sequences and on estimating their shares (percentage of the whole sample, excluding contamination reads).
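The idea behind the FASTA export (one sequence per sample, multi-allele positions ignored) can be sketched in Perl as below. This is a minimal illustration, not Bergey's vcf_tab_to_fasta_alignment.pl: the helper name `tab_to_fasta` is hypothetical, and it assumes vcf-to-tab style input (a `#CHROM`, `POS`, `REF` header followed by one column per sample, with genotypes such as `A/A`, `A/G` or `./.`). A position is dropped from all sequences as soon as one sample carries two different alleles or a non-SNP allele; missing calls become `N`. The real script's rules (e.g. for `--exclude_het`) may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch (NOT vcf_tab_to_fasta_alignment.pl): builds one sequence per
# sample from vcf-to-tab style lines, skipping every position with a mixed call.
sub tab_to_fasta {
    my @lines = @_;
    chomp @lines;
    # assumed header: "#CHROM<TAB>POS<TAB>REF<TAB>sample1<TAB>sample2..."
    my @header  = split /\t/, shift @lines;
    my @samples = @header[3 .. $#header];
    my %seq     = map { $_ => '' } @samples;
    LINE: for my $line (@lines) {
        my @fields = split /\t/, $line;
        my @calls;
        for my $gt (@fields[3 .. $#fields]) {
            # genotype such as "A/A", "A/G", "A|G" or "./."; empty pieces are dropped
            my @alleles = grep { length } split m{[/|]}, $gt;
            next LINE if grep { length($_) != 1 } @alleles;   # not a single-base (SNP) call
            my %distinct = map { $_ => 1 } @alleles;
            next LINE if scalar(keys %distinct) > 1;          # multi-allele call: drop position
            my ($base) = keys %distinct;
            push @calls, (!defined $base || $base eq '.') ? 'N' : $base;  # missing -> N
        }
        # the position passed for every sample: append one base to each sequence
        $seq{ $samples[$_] } .= $calls[$_] for 0 .. $#samples;
    }
    my $fasta = '';
    $fasta .= ">$_\n$seq{$_}\n" for @samples;
    return $fasta;
}
```

Dropping a position for all samples, rather than only for the mixed one, keeps every sequence the same length, which the alignment handed to FastTree requires.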

Tree: Finally, a maximum likelihood tree is generated from the FASTA file by FastTree and rendered by jstree. The latter enables you to [...]

Variants: The variants of all samples in one comma-separated spreadsheet (for import into, e.g., Excel). It also contains base counts, i.e. the occurrences of all nucleotides reliably observed (with base quality > 13) at the particular position. For unfiltered base counts, see the VCF.
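A quality-filtered base count of this kind can be sketched in Perl as follows. The helper `count_bases` is hypothetical and is not PhyResSE's code; it assumes that the bases and base qualities of all reads covering one position are already available as two equal-length strings, with qualities Phred+33 encoded as in the SAM QUAL field. Only A, C, G and T with a quality strictly above 13 are counted.

```perl
use strict;
use warnings;

# Hypothetical sketch: count A/C/G/T among the reads covering one position,
# keeping only bases whose Phred quality is > 13 (qualities assumed Phred+33).
sub count_bases {
    my ($bases, $quals) = @_;    # equal-length strings, one character per read
    my %count = map { $_ => 0 } qw(A C G T);
    for my $i (0 .. length($bases) - 1) {
        my $base = uc substr($bases, $i, 1);
        my $q    = ord(substr($quals, $i, 1)) - 33;    # decode Phred+33
        # other symbols (e.g. N) are ignored; low-quality bases are not counted
        $count{$base}++ if exists $count{$base} && $q > 13;
    }
    return \%count;
}
```

For example, `count_bases("AAGTC", "5.I+?")` counts one A, one C and one G: the second A (quality 13) and the T (quality 10) do not pass the threshold.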

Computational Steps

(Warning: Wee little nerdlings only. Log-likelihood increases with the age of a tree? Best stop reading here.)

 # code snippet producing all exports (embedding the Newick tree in HTML for visualization by JavaScript)
 # $ARGV[0] is the path prefix shared by all files of one session
 use strict;
 use warnings;

 my $prefix = $ARGV[0] // die "usage: $0 <session prefix>\n";
 my ($res, $filecount, $files) = ('', 0, '');
 foreach my $file (glob("${prefix}*.bam.flt.vcf")){ # feed all single-sample vcf files
  $res=`/usr/bin/bgzip -c $file > ${file}.gz`; print "$res\n"; # compress ...
  $res=`/usr/bin/tabix -p vcf ${file}.gz`; print "$res\n";     # ... and index (vcf-merge needs both)
  $filecount++; $files .= "${file}.gz ";
 }
 if ($filecount>1){ # merge and tree from min 2 files
  print "summarizing (exports and tree)\n";
  $res=`/usr/bin/vcf-merge ${files} > ${prefix}export.vcf`; print "$res\n";
  $res=`/usr/bin/vcf-stats ${prefix}export.vcf > ${prefix}export.stats`; print "$res\n";
  $res=`/usr/bin/vcf-to-tab < ${prefix}export.vcf > ${prefix}export.tab`; print "$res\n";
  $res=`/usr/bin/vcf_tab_to_fasta_alignment.pl --exclude_het --output_ref -i ${prefix}export.tab > ${prefix}export.fa`; print "$res\n";
  $res=`/usr/bin/FastTreeMP -nt -quiet ${prefix}export.fa`; # Newick tree is written to stdout
  $res=~s/,/,\n/g; $res=~s/\)/\n\)/g; # break into many lines for easy copy/paste
  open(my $fout, '>', "${prefix}export.html") or die "cannot write ${prefix}export.html: $!";
  print $fout <<"END";
<html>
[...]
<textarea id="nhx-ex" style="display: none">
END
  print $fout $res;
  print $fout '</textarea></body></html>'."\n\n";
  close $fout;
 }

Versions
VCFtools v0.1.12b
tabix v0.2.5 (r1005)
vcf_tab_to_fasta_alignment.pl by Christina Bergey (Bergey CM (2012). vcf-tab-to-fasta; http://code.google.com/p/vcf-tab-to-fasta)
FastTree v2.1.8 as multi-threaded executable (+SSE +OpenMP)
jstree (no version number found; from http://lh3lh3.users.sourceforge.net/jstree.shtml)