Genome Sciences Centre, BC Cancer Agency
Vancouver, Canada
@sjackman
github.com/sjackman
sjackman.ca
San Francisco, USA · 2015 March 26–27
Research in Computational Molecular Biology
Warsaw, Poland · 2015 April 10–15
Designed and taught two one-week modules
STAT 545 Data wrangling, exploration, and analysis with R
BIOF 520 Problem-Based Learning In Bioinformatics
Organellar Genomes of White Spruce (Picea glauca): Assembly and Annotation
for the assembly of long reads
Estimating the distance between two contigs
Technology | Read length | Error rate |
---|---|---|
Sanger | 800 bp | 0.1–1% |
454 | 700 bp | ~1% |
Illumina | 2 × 300 bp | ~0.1% |
PacBio | 8–40 kbp | ~13% |
Oxford Nanopore | 8–200 kbp | ~15% |
Find all significantly overlapping reads
Call the consensus base of each read
Determine the order and orientation of the reads
Call the consensus base of each contig
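The overlap and layout steps above can be sketched on a toy example. This is a minimal illustration, not how any of the assemblers named in this talk work: it assumes error-free, forward-strand reads, uses exact suffix-prefix matching and a greedy layout, and skips the correction and consensus steps entirely. All function names are hypothetical.

```python
from itertools import permutations


def overlap_length(a, b, min_overlap):
    """Longest suffix of a that exactly matches a prefix of b (0 if too short)."""
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def find_overlaps(reads, min_overlap):
    """Overlap step: every ordered pair of reads with a long enough exact overlap."""
    overlaps = {}
    for i, j in permutations(range(len(reads)), 2):
        n = overlap_length(reads[i], reads[j], min_overlap)
        if n > 0:
            overlaps[(i, j)] = n
    return overlaps


def greedy_layout(overlaps):
    """Layout step: accept overlaps from longest to shortest, one successor per read."""
    successor, has_predecessor = {}, set()
    for (i, j), n in sorted(overlaps.items(), key=lambda item: -item[1]):
        if i in successor or j in has_predecessor:
            continue
        # Walk forward from j; if the chain ends at i, this edge would close a cycle.
        node = j
        while node in successor:
            node = successor[node][0]
        if node == i:
            continue
        successor[i] = (j, n)
        has_predecessor.add(j)
    return successor, has_predecessor


def assemble(reads, min_overlap):
    """Merge each chain of reads into one contig (no consensus calling)."""
    successor, has_predecessor = greedy_layout(find_overlaps(reads, min_overlap))
    contigs = []
    for start in range(len(reads)):
        if start in has_predecessor:
            continue  # this read is in the middle or at the end of a chain
        contig, node = reads[start], start
        while node in successor:
            node, n = successor[node]
            contig += reads[node][n:]  # append only the part past the overlap
        contigs.append(contig)
    return contigs


print(assemble(["ACGTAC", "TACGGA", "GGATTC"], min_overlap=3))
# → ['ACGTACGGATTC']
```

With real long reads the exact match in `overlap_length` would find almost nothing, which is why the pipelines in this talk use error-tolerant aligners for the overlap step.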
PBDAGCon · Falcon · Dazzler · Nanocorrect
Celera Assembler · Falcon · Dazzler
Assembler | Overlap | Correct | Layout | Consensus |
---|---|---|---|---|
HGAP | BLASR | PBDAGCon | Celera | Quiver |
Falcon | DALIGNER | Falcon | Falcon | Quiver |
PBcR | MHAP | Falcon | Celera | Quiver |
Dazzler | DALIGNER | Dazzler | Dazzler | Quiver |
Nanocorr | BLAST | PBDAGCon | Celera | Celera |
Nanopolish | DALIGNER | Nanocorrect | Celera | Nanopolish |
Nanocorr Saccharomyces cerevisiae (12 Mbp)
Nanopolish Escherichia coli (5 Mbp)
Research in Computational Molecular Biology
Warsaw, Poland · 2015 April 10–15
Order and orient contigs to build scaffolds using…
Inanc Birol | Joerg Bohlmann
Steven Hallam | Jenny Bryan
RECOMB 2015 poster | rOpenSci Unconf
UniqTag
White spruce organelles | ABySS 2.0 | DistanceEst
BIOF 520 | STAT 545
Homebrew | Linuxbrew | Homebrew-science
BLASR | BWA-MEM | Celera Assembler
DALIGNER | Dazzler | Falcon | HGAP
LAST | MHAP | Nanocorr | Nanocorrect
Nanopolish | PBcR | PBDAGCon | POA
Quiver
PAG 2014 poster | PAG 2014 workshop
Conifer Genome Summit 2014 | ISMB 2014 poster
International HPC Summer School 2014
BIOF 520 | STAT 540
Open, reproducible science
Plant and Animal Genome XXII
San Diego, California, USA · 2014 January 10–15
International HPC Summer School 2014
Budapest, Hungary · 2014 June 1–6
Conifer Genome Summit 2014
Forêt Montmorency, Québec, Canada · 2014 June 16–18
HiTSeq and ISMB 2014
Boston, Massachusetts, USA · 2014 July 11–15
STAT 540 Statistical Methods for High Dimensional Biology
BIOF 520 Problem-Based Learning In Bioinformatics
Seed and extend is difficult when
one in seven bases is incorrect
de Bruijn graphs also require accurate seeds
Return to overlap, layout, consensus (OLC)
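A rough calculation shows why exact seeds become scarce at this error rate. The sketch below assumes errors are independent and uniformly distributed along the read, which is a simplification; the function name and the choice of k values are illustrative only.

```python
def error_free_seed_probability(error_rate, k):
    """Probability that k consecutive bases are all correct,
    assuming each base is wrong independently with the given rate."""
    return (1.0 - error_rate) ** k


# One in seven bases incorrect, for a few illustrative seed (k-mer) lengths.
for k in (11, 15, 21, 31):
    p = error_free_seed_probability(1 / 7, k)
    print(f"k = {k:2d}  P(seed is error-free) = {p:.4f}")
```

Under this model the probability falls geometrically with k, so the long exact k-mers that seed-and-extend aligners and de Bruijn graph assemblers rely on are rarely intact in uncorrected reads.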