White Spruce Organelles Genome Biology and Evolution
2016 doi.org/f8bxck
UniqTag PLOS ONE
2015 doi.org/c3m3
Short Read Genome Assembly
ABySS 1.0 (2009) was the first to assemble
a human genome from short reads (42 bp!)
de Bruijn graph assembler
Stored k-mers in a hash table
Distributed the hash table over many machines
Used MPI to aggregate sufficient memory
Assembles large genomes
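The bullets above can be illustrated with a minimal sketch, assuming a simple partitioning rule in which each k-mer is assigned to a machine by hashing it. The function names (`kmers`, `owner`, `build_distributed_table`, `successors`) are hypothetical, the "machines" are simulated as a list of dictionaries in one process, and reverse complements are ignored for brevity. This is not ABySS's actual implementation, only the general idea of a hash-partitioned k-mer table.

```python
from collections import defaultdict
import zlib


def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def owner(kmer, n_machines):
    """Pick the machine that stores this k-mer (hypothetical partitioning rule)."""
    return zlib.crc32(kmer.encode()) % n_machines


def build_distributed_table(reads, k, n_machines):
    """Return one k-mer -> count dictionary per simulated machine."""
    tables = [defaultdict(int) for _ in range(n_machines)]
    for read in reads:
        for kmer in kmers(read, k):
            # In a real MPI program this update would be a message to another node.
            tables[owner(kmer, n_machines)][kmer] += 1
    return tables


def successors(kmer, tables):
    """Outgoing de Bruijn edges: stored k-mers that overlap this one by k-1 bases."""
    found = []
    for base in "ACGT":
        candidate = kmer[1:] + base
        # Each lookup may land on a different machine, hence the network cost.
        if candidate in tables[owner(candidate, len(tables))]:
            found.append(candidate)
    return found


reads = ["ACGTAC", "CGTACG"]
tables = build_distributed_table(reads, k=4, n_machines=3)
print(successors("ACGT", tables))
```

Every edge query touches up to four candidate k-mers, each of which may live on a different machine, which is why message passing dominates the running time in this design.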
Challenges
Uses lots of memory
Message passing is slow
Network communication is really slow
Solution
A memory-efficient data structure
reduces memory usage
Fitting entire graph in a single machine
eliminates network communication
Using shared memory (OpenMP)
eliminates message passing (MPI)
ABySS 2.0 reduces the memory
usage of ABySS roughly tenfold.
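As an illustration of a memory-efficient data structure, here is a minimal Bloom filter sketch (the ABySS 2.0 paper title names a Bloom filter). The class name, the bit-array size, the number of hash functions, and the choice of BLAKE2b as the hash are all assumptions made for this example and do not reflect how ABySS 2.0 is implemented.

```python
import hashlib


class BloomFilter:
    """Approximate set membership using a fixed-size bit array."""

    def __init__(self, n_bits, n_hashes):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)  # memory is fixed up front

    def _positions(self, item):
        """Yield one bit position per hash function (salted BLAKE2b, an assumption)."""
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


bloom = BloomFilter(n_bits=1024, n_hashes=3)
for kmer in ["ACGT", "CGTA", "GTAC"]:
    bloom.add(kmer)
print("ACGT" in bloom)
```

Unlike a hash table, the filter never stores the k-mers themselves, so its size depends only on the number of bits chosen. The price is a small false-positive rate, which an assembler using this approach has to tolerate or correct for.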
Spruce genome assemblies
ABySS            1.3.5       2.0.0
Spruce species   Interior    Sitka
Machines         115         1
RAM (GB)         4,300       500
CPU cores        1,380       64
CPU time*        6.0 years   3.2 years
* Time of unitig assembly without scaffolding
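The ratios implied by the spruce comparison above can be worked out with a short calculation. The dictionary keys and the function name `fold_reduction` are hypothetical. Note that the two assemblies are of different spruce species, so the ratios are indicative rather than a controlled benchmark.

```python
# Figures copied from the spruce comparison above.
abyss_1_3_5 = {"machines": 115, "ram_gb": 4300, "cpu_cores": 1380, "cpu_years": 6.0}
abyss_2_0_0 = {"machines": 1, "ram_gb": 500, "cpu_cores": 64, "cpu_years": 3.2}


def fold_reduction(before, after):
    """Return before/after for every resource, i.e. how many times smaller it became."""
    return {key: before[key] / after[key] for key in before}


ratios = fold_reduction(abyss_1_3_5, abyss_2_0_0)
for resource, ratio in ratios.items():
    print(f"{resource}: {ratio:.1f}x")
```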
ABySS 2.0 Conclusions
ABySS 2.0 reduces memory usage roughly tenfold
from 418 GB to 34 GB for human
from 4,300 GB to 500 GB for spruce
High-throughput short-read sequencing
combined with large-molecule scaffolding
such as 10X Genomics, BioNano, Hi-C
permits cost-effective assembly of large genomes
One first-author manuscript in preparation
(Sitka spruce mitochondrion)
ABySS has been cited over 2,700 times!
First-author Publications
Tigmint: correcting assembly errors using linked reads from large molecules SD Jackman, L Coombe, J Chu, RL Warren, BP Vandervalk, … BMC Bioinformatics 2018
ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter SD Jackman*, BP Vandervalk*, H Mohamadi, J Chu, S Yeo, SA Hammond, … Genome Research 2017
Organellar genomes of white spruce (Picea glauca): assembly and annotation SD Jackman, RL Warren, EA Gibb, BP Vandervalk, H Mohamadi, J Chu, … Genome Biology and Evolution 2015
UniqTag: content-derived unique and stable identifiers for gene annotation SD Jackman, J Bohlmann, I Birol PLOS ONE 2015
Selected Publications
Assembly of the complete Sitka spruce chloroplast… L Coombe, RL Warren, SD Jackman, C Yang, BP Vandervalk, …, I Birol PLOS ONE 2016
Spaced seed data structures for de novo assembly I Birol, J Chu, H Mohamadi, SD Jackman, K Raghavan, …, RL Warren International Journal of Genomics 2015
Konnector v2.0: pseudo-long reads from PE sequencing BP Vandervalk, C Yang, Z Xue, K Raghavan, J Chu, H Mohamadi, SD Jackman, …, I Birol BMC Medical Genomics 2015
Sealer: a scalable gap-closing application… D Paulino, RL Warren, BP Vandervalk, A Raymond, SD Jackman, I Birol BMC Bioinformatics 2015
On the representation of de Bruijn graphs R Chikhi, A Limasset, SD Jackman, JT Simpson, P Medvedev Journal of Computational Biology 2015
Improved white spruce (Picea glauca) genome… RL Warren, CI Keeling, MMS Yuen, A Raymond, GA Taylor, …, J Bohlmann The Plant Journal 2015
Assembling the 20Gb white spruce genome… I Birol, A Raymond, SD Jackman, S Pleasance, R Coope, …, SJM Jones Bioinformatics 2013