Error Correction in Nanopore Reads for de novo Genomic Assembly

  • Conference paper
Computational Science and Its Applications – ICCSA 2020 (ICCSA 2020)

Abstract

The purpose of genome sequencing is to determine the DNA sequence of a given organism. Current sequencing technologies can be classified by the type of output data they produce. Nanopore technology generates long reads with high error rates, whereas short-read technologies such as Illumina sequencing generate shorter reads with low error rates. Because de novo genome assembly from sequencing reads is an NP-hard problem, it remains one of the major challenges in defining reference genomes for different species. This paper aims to improve the quality of reads obtained with Oxford Nanopore Technologies (ONT) sequencing. We developed an algorithm that associates the reads obtained from Illumina with those obtained from Nanopore. Low-accuracy ONT reads were corrected with the high-quality Illumina reads to obtain improved sequencing data. Including this algorithm as a preprocessing step improved coverage, contig length, and mismatch rate in the de novo assembly of a bacterial genome with well-known tools.

This work has been partially supported by the MINEDUC under the project MAG1895, and by Conicyt under the project Fondecyt 1180882.
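
The abstract does not spell out the correction algorithm, so the following is only a minimal sketch of the general idea behind hybrid correction, not the authors' method. It assumes the short reads have already been placed on the long read at known offsets (in practice an aligner would supply these), and it handles substitutions only, ignoring the insertions and deletions that are common in Nanopore data. The function name `correct_long_read` and the `min_support` parameter are hypothetical.

```python
from collections import Counter


def correct_long_read(long_read, placed_short_reads, min_support=2):
    """Overwrite bases of a noisy long read with the majority base of the
    short reads covering each position (substitutions only).

    placed_short_reads: list of (offset, sequence) pairs, where offset is
    the 0-based position in long_read at which the short read starts.
    This placement is an assumption of the sketch.
    """
    # One vote counter per long-read position.
    votes = [Counter() for _ in long_read]
    for offset, seq in placed_short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                votes[pos][base] += 1

    corrected = []
    for original, counter in zip(long_read, votes):
        if counter:
            base, support = counter.most_common(1)[0]
            # Only overwrite when enough short reads agree on the base.
            corrected.append(base if support >= min_support else original)
        else:
            # Position not covered by any short read: keep the original base.
            corrected.append(original)
    return "".join(corrected)


if __name__ == "__main__":
    noisy = "ACGTTCGATTACA"  # toy long read with one substitution error
    short_reads = [(0, "ACGTAC"), (2, "GTACGA"), (4, "ACGATT"), (7, "ATTACA")]
    print(correct_long_read(noisy, short_reads))
```

Requiring a minimum number of agreeing short reads keeps a single erroneous short read from overwriting a correct long-read base; positions with too little coverage are left unchanged.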

Author information

Corresponding author

Correspondence to Roberto Uribe-Paredes.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Aldridge-Águila, J., Álvarez-Saravia, D., Navarrete, M., Uribe-Paredes, R. (2020). Error Correction in Nanopore Reads for de novo Genomic Assembly. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12253. Springer, Cham. https://doi.org/10.1007/978-3-030-58814-4_63

  • DOI: https://doi.org/10.1007/978-3-030-58814-4_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58813-7

  • Online ISBN: 978-3-030-58814-4

  • eBook Packages: Computer Science, Computer Science (R0)
