Genome Sciences Centre, BC Cancer Agency
Vancouver, Canada
@sjackman
github.com/sjackman
sjackman.ca
Inanc Birol
Joerg Bohlmann
Jenny Bryan
Steven Hallam
rOpenSci Unconference
RECOMB 2015
STAT 545 Data wrangling, exploration, and analysis with R
Automating data analysis pipelines
BIOF 520 Problem-Based Learning In Bioinformatics
Genomic epidemiology
Organellar Genomes of White Spruce (Picea glauca): Assembly and Annotation
for the assembly of long reads
Estimating the distance between two contigs
Research in Computational Molecular Biology
Warsaw, Poland · 2015 April 10–15
Order and orient contigs to build scaffolds using…
RECOMB 2015 | rOpenSci Unconference
UniqTag
White spruce organelles | ABySS 2.0 | DistanceEst
BIOF 520 | STAT 545
Open, reproducible science
Homebrew | Linuxbrew | Homebrew-science
BLASR | BWA-MEM | Celera Assembler
DALIGNER | Dazzler | Falcon | HGAP
LAST | MHAP | Nanocorr | Nanocorrect
Nanopolish | PBcR | PBDAGCon | POA
Quiver
Technology | Read length | Error rate |
---|---|---|
Sanger | 800 bp | 0.1–1% |
454 | 700 bp | ~1% |
Illumina | 2 × 300 bp | ~0.1% |
PacBio | 8–40 kbp | ~13% |
Oxford Nanopore | 8–200 kbp | ~15% |
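The error rates in the table translate directly into how often a read goes wrong. The Python sketch below assumes errors are independent and occur at a fixed per-base rate, and where the table gives a range (Sanger) it uses the upper bound. The function names and the 10 kbp read are hypothetical, for illustration only.

```python
# Per-base error rates from the table above. Where the table gives a range
# (Sanger, 0.1-1%), the upper bound is used (an assumption for illustration).
ERROR_RATES = {
    "Sanger": 0.01,
    "454": 0.01,
    "Illumina": 0.001,
    "PacBio": 0.13,
    "Oxford Nanopore": 0.15,
}


def mean_bases_per_error(error_rate: float) -> float:
    """Average number of bases per error, assuming independent errors at a fixed rate."""
    return 1.0 / error_rate


def expected_errors(read_length: int, error_rate: float) -> float:
    """Expected number of erroneous bases in a read of the given length."""
    return read_length * error_rate


if __name__ == "__main__":
    for technology, rate in ERROR_RATES.items():
        print(f"{technology:16} ~1 error per {mean_bases_per_error(rate):.1f} bp")
    # A hypothetical 10 kbp PacBio read
    print(f"{expected_errors(10_000, ERROR_RATES['PacBio']):.0f} expected errors")
```

At 13–15% error, PacBio and Oxford Nanopore reads average one error every 7 to 8 bases, while Illumina averages one error per 1000 bases.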
Overlap: find all significantly overlapping reads
Correct: call the consensus base at each position of each read
Layout: determine the order and orientation of the reads
Consensus: call the consensus base at each position of each contig
Correct: PBDAGCon · Falcon · Dazzler · Nanocorrect
Layout: Celera Assembler · Falcon · Dazzler
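A toy sketch of the overlap, layout and consensus steps in Python. It assumes error-free reads, all from the forward strand, so both the correction step and orientation are skipped. Exact suffix-prefix matching and a greedy layout stand in for the alignment-based tools listed above, and all function names and reads are hypothetical.

```python
from collections import Counter


def overlap_length(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a that exactly equals a prefix of b, or 0 if shorter than min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def find_overlaps(reads: list[str], min_len: int) -> dict[tuple[int, int], int]:
    """Overlap: all-vs-all comparison, keeping only overlaps of at least min_len bases."""
    overlaps = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                length = overlap_length(a, b, min_len)
                if length:
                    overlaps[(i, j)] = length
    return overlaps


def best_successors(overlaps: dict[tuple[int, int], int]) -> dict[int, int]:
    """For each read, the read it overlaps most strongly (ties keep the first seen)."""
    best: dict[int, tuple[int, int]] = {}
    for (i, j), length in overlaps.items():
        if length > best.get(i, (0, -1))[0]:
            best[i] = (length, j)
    return {i: j for i, (_, j) in best.items()}


def layout(n_reads: int, successors: dict[int, int]) -> list[int]:
    """Layout: start at a read that no other read points to, then follow successors."""
    targets = set(successors.values())
    # Falls back to read 0 if every read is a target (e.g. a circular genome).
    start = next((i for i in range(n_reads) if i not in targets), 0)
    order = [start]
    while order[-1] in successors and successors[order[-1]] not in order:
        order.append(successors[order[-1]])
    return order


def consensus(reads: list[str], order: list[int], overlaps: dict[tuple[int, int], int]) -> str:
    """Consensus: place reads at their offsets and take a majority vote in each column."""
    offsets = [0]
    for prev, cur in zip(order, order[1:]):
        offsets.append(offsets[-1] + len(reads[prev]) - overlaps[(prev, cur)])
    total = max(off + len(reads[r]) for r, off in zip(order, offsets))
    columns = [Counter() for _ in range(total)]
    for read_index, offset in zip(order, offsets):
        for k, base in enumerate(reads[read_index]):
            columns[offset + k][base] += 1
    # most_common(1) breaks ties in favour of the base counted first.
    return "".join(col.most_common(1)[0][0] for col in columns)


if __name__ == "__main__":
    reads = ["TGCCGGAA", "ATTAGACC", "CGGAATAC", "GACCTGCC"]  # hypothetical, shuffled
    overlaps = find_overlaps(reads, min_len=3)
    order = layout(len(reads), best_successors(overlaps))
    print(order)  # [1, 3, 0, 2]
    print(consensus(reads, order, overlaps))  # ATTAGACCTGCCGGAATAC
```

With real long reads, exact matching fails at the first error, which is why the real tools align reads approximately and correct them before layout.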
Assembler | Overlap | Correct | Layout | Consensus |
---|---|---|---|---|
HGAP | BLASR | PBDAGCon | Celera | Quiver |
Falcon | DALIGNER | Falcon | Falcon | Quiver |
PBcR | MHAP | Falcon | Celera | Quiver |
Dazzler | DALIGNER | Dazzler | Dazzler | Quiver |
Nanocorr | BLAST | PBDAGCon | Celera | Celera |
Nanopolish | DALIGNER | Nanocorrect | Celera | Nanopolish |
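The table reads as a set of interchangeable stages. One way to make that explicit is to encode each assembler as a record of its four stage tools and query it. This is a hypothetical Python sketch; the tool names are copied from the table above.

```python
from typing import NamedTuple


class Pipeline(NamedTuple):
    """The tool used at each stage of an OLC assembly pipeline."""
    overlap: str
    correct: str
    layout: str
    consensus: str


# Transcribed from the table above.
PIPELINES = {
    "HGAP": Pipeline("BLASR", "PBDAGCon", "Celera", "Quiver"),
    "Falcon": Pipeline("DALIGNER", "Falcon", "Falcon", "Quiver"),
    "PBcR": Pipeline("MHAP", "Falcon", "Celera", "Quiver"),
    "Dazzler": Pipeline("DALIGNER", "Dazzler", "Dazzler", "Quiver"),
    "Nanocorr": Pipeline("BLAST", "PBDAGCon", "Celera", "Celera"),
    "Nanopolish": Pipeline("DALIGNER", "Nanocorrect", "Celera", "Nanopolish"),
}


def assemblers_using(stage: str, tool: str) -> list[str]:
    """Names of the assemblers that use the given tool at the given stage, sorted."""
    if stage not in Pipeline._fields:
        raise ValueError(f"unknown stage: {stage!r}")
    return sorted(name for name, p in PIPELINES.items() if getattr(p, stage) == tool)


if __name__ == "__main__":
    print(assemblers_using("consensus", "Quiver"))
    print(assemblers_using("layout", "Celera"))
```

Four of the six assemblers share Quiver for consensus and four share Celera for layout.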
Nanocorr: Saccharomyces cerevisiae (12 Mbp)
Nanopolish: Escherichia coli (5 Mbp)
Seed and extend is difficult when one in seven bases is incorrect
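Why seeds fail: a seed needs k consecutive correct bases in both reads. The Python sketch below assumes independent errors at a fixed rate, in which case that probability is (1 − e)^(2k). The seed lengths 11, 15 and 21 are illustrative choices, not any particular aligner's defaults.

```python
def p_error_free(error_rate: float, k: int) -> float:
    """Probability that k consecutive bases are all correct, assuming independent errors."""
    return (1.0 - error_rate) ** k


def p_shared_seed(error_rate: float, k: int) -> float:
    """Probability that the same k-base window is error-free in both of two overlapping reads."""
    return p_error_free(error_rate, k) ** 2


if __name__ == "__main__":
    for label, rate in [("Illumina, ~0.1%", 0.001), ("one in seven", 1 / 7)]:
        for k in (11, 15, 21):
            print(f"{label:16} k={k:2}  P(shared exact seed) = {p_shared_seed(rate, k):.4f}")
```

At Illumina error rates a 15-base seed survives in both reads about 97% of the time. At one error in seven bases it survives only about 1% of the time.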
de Bruijn graphs also require accurate seeds: error-free k-mers
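A de Bruijn graph is built from k-mers, and one substituted base away from the read ends produces up to k k-mers that do not occur in the true sequence. This is a minimal Python sketch on a hypothetical read, with hypothetical function names.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq (the building blocks of a de Bruijn graph)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def spurious_kmers(read: str, position: int, new_base: str, k: int) -> set[str]:
    """k-mers created by a single substitution that are absent from the original read."""
    mutated = read[:position] + new_base + read[position + 1:]
    return kmers(mutated, k) - kmers(read, k)


if __name__ == "__main__":
    read = "ATTAGACCTGCCGGAATAC"  # hypothetical error-free read
    bad = spurious_kmers(read, position=9, new_base="T", k=5)
    print(len(bad), sorted(bad))  # 5 spurious 5-mers from one error
```

Under the same independence assumption, at one error in seven bases only about 4% of 21-mers, that is (6/7)^21, are error-free.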
Return to overlap, layout, consensus (OLC)