Dinucleotide distance histograms for fast detection of rRNA in metatranscriptomic sequences

Authors: Heiner Klingenberg, Robin Martinjak, Frank Oliver Glöckner, Rolf Daniel, Thomas Lingner, Peter Meinicke




Cite As

Heiner Klingenberg, Robin Martinjak, Frank Oliver Glöckner, Rolf Daniel, Thomas Lingner, and Peter Meinicke. Dinucleotide distance histograms for fast detection of rRNA in metatranscriptomic sequences. In German Conference on Bioinformatics 2013. Open Access Series in Informatics (OASIcs), Volume 34, pp. 80-89, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013) https://doi.org/10.4230/OASIcs.GCB.2013.80

Abstract

With the advent of metatranscriptomics, it has become possible to study the dynamics of microbial communities. The analysis of environmental RNA-Seq data poses several challenges for the development of efficient bioinformatics tools. One of the first steps in the computational analysis of metatranscriptomic sequencing reads is the separation of rRNA and mRNA fragments, which ensures that only protein-coding sequences are used in the subsequent functional analysis. For this rRNA filtering task, it is desirable to have a broad spectrum of methods available so that a suitable trade-off between speed and accuracy can be found for a particular dataset. We introduce a machine learning approach for the detection of rRNA in metatranscriptomic sequencing reads that combines support vector machines (SVMs) with dinucleotide distance histograms for feature representation. The results show that our SVM-based approach is at least one order of magnitude faster than any of the existing tools, with only a slight degradation of detection performance compared to state-of-the-art alignment-based methods.
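The abstract does not spell out how a dinucleotide distance histogram is computed, so the following is only a minimal sketch of one plausible reading: for each of the 16 dinucleotides, count how often two occurrences of that same dinucleotide lie a given number of positions apart within a read, up to a maximum distance. The function name `dinucleotide_distance_histogram`, the parameter `max_dist`, its default value, and the choice to count all pairs of identical dinucleotides are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product

# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_distance_histogram(read, max_dist=8):
    """Return a flat list of 16 * max_dist raw counts.

    Entry [i * max_dist + (d - 1)] holds the number of pairs of
    occurrences of dinucleotide i whose start positions differ by d,
    for 1 <= d <= max_dist. This definition is an assumption, not
    the paper's confirmed feature definition.
    """
    index = {di: i for i, di in enumerate(DINUCLEOTIDES)}
    read = read.upper()

    # Collect the start positions of every dinucleotide in the read.
    positions = {}
    for pos in range(len(read) - 1):
        di = read[pos:pos + 2]
        if di in index:  # skip windows containing N or other symbols
            positions.setdefault(di, []).append(pos)

    features = [0] * (len(DINUCLEOTIDES) * max_dist)
    for di, plist in positions.items():
        for a in range(len(plist)):
            for b in range(a + 1, len(plist)):
                dist = plist[b] - plist[a]
                if dist > max_dist:
                    break  # positions are sorted, so later ones are farther
                features[index[di] * max_dist + dist - 1] += 1
    return features


features = dinucleotide_distance_histogram("ACACAC")
```

A fixed-length vector of this kind could then be passed, one per read, to any SVM implementation for rRNA versus non-rRNA classification. The kernel, normalization, and training data used by the authors are not stated in the abstract and are not assumed here.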

Subject Classification

Keywords
  • Metatranscriptomics
  • metagenomics
  • rRNA detection
  • distance histograms
