Fast and Efficient Rmap Assembly Using the Bi-Labelled de Bruijn Graph

Fast and Efficient Rmap Assembly Using the Bi-Labelled de Bruijn Graph

Authors Kingshuk Mukherjee , Massimiliano Rossi , Leena Salmela , Christina Boucher



PDF
Thumbnail PDF

File

LIPIcs.WABI.2020.9.pdf
  • Filesize: 0.88 MB
  • 16 pages

Document Identifiers

Author Details

Kingshuk Mukherjee
  • Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, USA
Massimiliano Rossi
  • Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, USA
Leena Salmela
  • Department of Computer Science, Helsinki Institute for Information Technology, University of Helsinki, Finland
Christina Boucher
  • Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, USA

Cite As Get BibTex

Kingshuk Mukherjee, Massimiliano Rossi, Leena Salmela, and Christina Boucher. Fast and Efficient Rmap Assembly Using the Bi-Labelled de Bruijn Graph. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Leibniz International Proceedings in Informatics (LIPIcs), Volume 172, pp. 9:1-9:16, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020) https://doi.org/10.4230/LIPIcs.WABI.2020.9

Abstract

Genome wide optical maps are high resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single molecule optical maps, which are called Rmaps. Unfortunately, there exists very few choices for assembling Rmap data. There exists only one publicly-available non-proprietary method for assembly and one proprietary method that is available via an executable. Furthermore, the publicly-available method, by Valouev et al. (2006), follows the overlap-layout-consensus (OLC) paradigm, and therefore, is unable to scale for relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics' Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (2006) and Solve by Bionano Genomics on data from three genomes - E. coli, human, and climbing perch fish (Anabas Testudineus). Our method was the only one able to successfully run on all three genomes. The method of Valouev et al. (2006) only successfully ran on E. coli and Bionano Solve successfully ran on E. coli and human but not on the fish genome. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory and produced the highest genome fraction with zero mis-assemblies.

Subject Classification

ACM Subject Classification
  • Theory of computation
  • Theory of computation → Design and analysis of algorithms
Keywords
  • optical maps
  • de Bruijn graph
  • assembly

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Thomas S. Anantharaman, Bud Mishra, and David C. Schwartz. Genomics via optical mapping iii: Contiging genomic DNA and variations (extended abstract). In ISMB-99, pages 18-27. AAAI Press, 1997. Google Scholar
  2. Anton Bankevich et al. SPAdes: A New Genome Assembly Algorithm and its Applications to Single-Cell Sequencing. J. Comput. Biol., 19(5):455-477, 2012. Google Scholar
  3. Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509-517, 1975. Google Scholar
  4. Srikar Chamala et al. Assembly and validation of the genome of the nonmodel basal angiosperm amborella. Science, 342(6165):1516-1517, 2013. Google Scholar
  5. Ping Chen, Xinyun Jing, Jian Ren, Han Cao, Pei Hao, and Xuan Li. Modelling BioNano optical data and simulation study of genome map assembly. Bioinformatics, 34(23):3966-3974, 2018. Google Scholar
  6. Deanna M. Church et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol., 7(5):e1000112+, 2009. Google Scholar
  7. Yang Dong et al. Sequencing and automated whole-genome optical mapping of the genome of a domestic goat (Capra hircus). Nat. Biotechnol., 31:135-141, 2013. Google Scholar
  8. Xian Fan, Jie Xu, and Luay Nakhleh. Detecting large indels using optical map data. In Proc. of RECOMB-CG, pages 108-127, 2018. Google Scholar
  9. Ganeshkumar Ganapathy et al. De novo high-coverage sequencing and annotated assemblies of the budgerigar genome. GigaScience, 3:11, 2014. Google Scholar
  10. Alexey Gurevich, Vladislav Saveliev, Nikolay Vyahhi, and Glenn Tesler. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8):1072-1075, 2013. Google Scholar
  11. Ramana M. Idury and Michael S. Waterman. A new algorithm forDNA sequence assembly. J. Comput. Biol, 2(2):291-306, 1995. Google Scholar
  12. Le Li et al. OMSV enables accurate and comprehensive identification of large structural variations from nanochannel-based single-molecule optical maps. Genome Biol., 18(1):230, 2017. Google Scholar
  13. Menglu Li et al. Towards a more accurate error model for BioNano optical maps. In Proc of ISBRA, pages 67-79, 2016. Google Scholar
  14. Paul Medvedev, Son Pham, Mark Chaisson, Glenn Tesler, and Pavel Pevzner. Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers. J. Comput. Biol, 18, 2011. Google Scholar
  15. Giles Miclotte, Stéphane Plaisance, Stephane Rombauts, Yves Van de Peer, Pieter Audenaert, et al. OMSim: a simulator for optical map data. Bioinformatics, pages 2740-2742, 2017. Google Scholar
  16. Martin D. Muggli, Simon J. Puglisi, Roy Ronen, and Christina Boucher. Misassembly detection using paired-end sequence reads and optical mapping data. Bioinformatics, 31(12):i80-i88, 2015. Google Scholar
  17. Kingshuk Mukherjee, Bahar Alipanahi, Tamer Kahveci, Leena Salmela, and Christina Boucher. Aligning optical maps to de Bruijn graphs. Bioinformatics, 35(18):3250-3256, 2019. Google Scholar
  18. Kingshuk Mukherjee, Darshan Washimkar, Martin D. Muggli, Leena Salmela, and Christina Boucher. Error correcting optical mapping data. GigaScience, 7, 2018. Google Scholar
  19. Weihua Pan, Tao Jiang, and Stefano Lonardi. OMGS: optical map-based genome scaffolding. J. Comput. Biol, 27(4):519-533, 2020. Google Scholar
  20. Weihua Pan and Stefano Lonardi. Accurate detection of chimeric contigs via BioNano optical maps. Bioinformatics, 35(10):1760-1762, 2018. Google Scholar
  21. Yu Peng, Henry CM Leung, Siu-Ming Yiu, and Francis YL Chin. IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28(11):1420-1428, 2012. Google Scholar
  22. Pavel A. Pevzner, Haixu Tang, and Michael S. Waterman. An Eulerian path approach to DNA fragment assembly. Proc. Natl. Acad. Sci., 98(17):9748-9753, 2001. Google Scholar
  23. Susan Reslewic et al. Whole-genome shotgun optical mapping of Rhodospirillum Rubrum. Appl. Environ. Microbiol., 71(9):5511-5522, 2005. Google Scholar
  24. David C. Schwartz, Xiaojun Li, Luis I Hernandez, Satyadarshan P. Ramnarain, Edward J. Huff, and Yu-Ker Wang. Ordered restriction maps of saccharomyces cerevisiae chromosomes constructed by optical mapping. Science, 262:110-114, 1993. Google Scholar
  25. Jennifer M. Shelton, Michelle C. Coleman, Nic Herndon, Nanyan Lu, Ernest T. Lam, Thomas Anantharaman, Palak Sheth, and Susan J. Brown. Tools and pipelines for BioNano data: molecule assembly pipeline and fasta super scaffolding tool. BMC Genomics, 16(1):734, 2015. Google Scholar
  26. Jared T. Simpson, Kim Wong, Shaun D. Jackman, Jacqueline E. Schein, Steven J. M. Jones, and Inanç Birol. ABySS: a parallel assembler for short read sequence data. Genome Res., 19(6):1117-1123, 2009. Google Scholar
  27. Brian Teague et al. High-resolution human genome structure by single-molecule analysis. Proc. Natl. Acad. Sci., 107(24):10848-10853, 2010. Google Scholar
  28. Anton Valouev et al. Alignment of optical maps. J. Comp. Biol., 13(2):442-462, 2006. Google Scholar
  29. Anton Valouev, David C. Schwartz, Shiguo Zhou, and Michael S. Waterman. An algorithm for assembly of ordered restriction maps from single dna molecules. Proc. Natl. Acad. Sci., 103(43):15770-15775, 2006. Google Scholar
  30. Daniel R. Zerbino and Ewan Birney. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 18(5):821-829, 2008. Google Scholar
  31. Shiguo Zhou et al. A whole-genome shotgun optical map of Yersinia pestis strain KIM. Appl. Environ. Microbiol., 68(12):6321-6331, 2002. Google Scholar
  32. Shiguo Zhou et al. Shotgun optical mapping of the entire leishmania majorriedlin genome. Mol. Biochem. Parasitol., 138(1):97-106, 2004. Google Scholar
  33. Shiguo Zhou et al. Validation of rice genome sequence by optical mapping. BMC Genom., 8(1):278, 2007. Google Scholar
  34. Shiguo Zhou et al. A Single Molecule Scaffold for the Maize Genome. PLoS Genetics, 5:e1000711, 2009. Google Scholar
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail