Abstract
The purpose of genome sequencing is to determine the DNA sequence of a given organism. Current sequencing technologies can be classified by the type of output data they produce. Whereas Nanopore technology generates long reads with high error rates, short-read technologies such as Illumina sequencing generate shorter reads with low error rates. Because de novo genome assembly of sequencing reads is an NP-hard problem, it remains one of the major challenges in defining reference genomes for different species. This paper aims to improve the quality of reads obtained through Oxford Nanopore Technologies (ONT). We developed an algorithm that associates the reads obtained from Illumina with those obtained from Nanopore. Low-accuracy ONT reads were corrected with the high-quality Illumina reads to produce improved sequencing data. Including this algorithm as a preprocessing step improved coverage, contig length, and mismatch rate when performing de novo assembly of a bacterial genome with well-known tools.
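The abstract does not detail how Illumina reads are associated with Nanopore reads or how the correction is applied. As a rough illustration only, and not the authors' algorithm, the Python sketch below shows one generic form of hybrid correction. Each short read is placed on a long read by semi-global edit-distance alignment (approximate string matching), and every long-read position, as well as every gap between positions, then takes the plurality of the short-read votes. The function names `align_short_to_long` and `correct_long_read`, the `max_error_rate` cutoff of 0.2 and the voting rule are hypothetical choices made for this example.

```python
# Hypothetical sketch of hybrid long-read correction; not the paper's algorithm.
from collections import Counter, defaultdict


def align_short_to_long(short, long_read):
    """Semi-global edit-distance alignment of the whole short read against any
    substring of the long read (end gaps in the long read are free).

    Returns (cost, start, end, columns); the short read covers long_read[start:end]
    and each column is ("sub", j, base), ("del", j, None) or ("ins", j, base),
    where "ins" means the base belongs in the gap just before long position j.
    """
    n, m = len(short), len(long_read)
    # dp[i][j]: cheapest alignment of short[:i] to some suffix of long_read[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (short[i - 1] != long_read[j - 1]),  # match/mismatch
                dp[i - 1][j] + 1,  # base missing from the long read
                dp[i][j - 1] + 1,  # extra base in the long read
            )
    end = min(range(m + 1), key=lambda j: dp[n][j])  # best end position (first minimum)
    i, j, columns = n, end, []
    while i > 0:  # trace back until the whole short read is consumed
        if j > 0 and dp[i][j] == dp[i - 1][j - 1] + (short[i - 1] != long_read[j - 1]):
            columns.append(("sub", j - 1, short[i - 1]))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            columns.append(("ins", j, short[i - 1]))
            i -= 1
        else:
            columns.append(("del", j - 1, None))
            j -= 1
    columns.reverse()
    return dp[n][end], j, end, columns


def correct_long_read(long_read, short_reads, max_error_rate=0.2):
    """Plurality-vote correction of one long read from its short-read alignments."""
    base_votes = defaultdict(Counter)  # long position -> votes for a base or "-" (delete)
    ins_votes = defaultdict(Counter)   # gap before position -> votes for an inserted string
    for read in short_reads:
        cost, start, end, columns = align_short_to_long(read, long_read)
        if cost > max_error_rate * len(read):
            continue  # hypothetical filter: read does not fit this long read well
        inserted = defaultdict(str)
        for op, pos, base in columns:
            if op == "ins":
                inserted[pos] += base
            else:
                base_votes[pos][base if op == "sub" else "-"] += 1
        for gap in range(start, end + 1):
            ins_votes[gap][inserted.get(gap, "")] += 1  # "" is a vote for no insertion
    corrected = []
    for p in range(len(long_read) + 1):
        if p in ins_votes:
            corrected.append(ins_votes[p].most_common(1)[0][0])
        if p < len(long_read):
            # positions without short-read coverage keep the original long-read base
            base = base_votes[p].most_common(1)[0][0] if p in base_votes else long_read[p]
            if base != "-":
                corrected.append(base)
    return "".join(corrected)
```

Tiling every 12-base substring of a short toy sequence over a long read that carries one substitution, one extra base and one missing base is a convenient way to exercise all three kinds of vote. The dynamic program costs O(nm) per read pair, so this sketch only suits toy inputs; published hybrid correctors such as LoRDEC index the short reads (LoRDEC builds a de Bruijn graph from them) rather than aligning each read exhaustively.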
This work has been partially supported by the MINEDUC under the project MAG1895, and by Conicyt under the project Fondecyt 1180882.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Aldridge-Águila, J., Álvarez-Saravia, D., Navarrete, M., Uribe-Paredes, R. (2020). Error Correction in Nanopore Reads for de novo Genomic Assembly. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2020. ICCSA 2020. Lecture Notes in Computer Science, vol 12253. Springer, Cham. https://doi.org/10.1007/978-3-030-58814-4_63
DOI: https://doi.org/10.1007/978-3-030-58814-4_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58813-7
Online ISBN: 978-3-030-58814-4
eBook Packages: Computer Science, Computer Science (R0)