Abstract
The lack of well-structured annotations in a growing amount of RNA expression data complicates data interoperability and reusability. Commonly used text mining methods extract annotations from existing unstructured data descriptions and often provide inaccurate output that requires manual curation. Automatic data-based augmentation (generation of annotations on the basis of expression data) can considerably improve annotation quality but has not been well studied. We formulate automatic augmentation of small RNA-seq expression data as a classification problem and investigate deep learning (DL) and random forest (RF) approaches to solve it. We generate tissue and sex annotations from small RNA-seq expression data for tissues and cell lines of Homo sapiens. We validate our approach on 4243 annotated small RNA-seq samples from the Small RNA Expression Atlas (SEA) database. The average prediction accuracy is 98% for tissue groups (DL), 96.5% for tissues (DL), and 77% for sex (DL). The “one dataset out” average accuracy for tissue group prediction is 83% (DL) and 59% (RF). On average, DL provides better results than RF and considerably improves classification performance for ‘unseen’ datasets.
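The “one dataset out” evaluation mentioned in the abstract can be illustrated with a minimal sketch: each dataset is held out in turn, a classifier is trained on the remaining datasets, and accuracy is measured on the held-out samples. The abstract gives no implementation details, so everything below is an assumption for illustration: the function names, the toy expression values, and the dataset identifiers are hypothetical, and a simple nearest-centroid classifier stands in for the DL and RF models of the paper.

```python
from collections import defaultdict
from statistics import mean


def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT the paper's DL or RF model): one mean vector per label."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def one_dataset_out_accuracy(X, y, dataset_ids):
    """Hold out each dataset in turn, train on the rest, return per-dataset accuracy."""
    scores = {}
    for held_out in sorted(set(dataset_ids)):
        train = [i for i, d in enumerate(dataset_ids) if d != held_out]
        test = [i for i, d in enumerate(dataset_ids) if d == held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        scores[held_out] = hits / len(test)
    return scores


# Toy expression profiles (made-up numbers): 2 features, 3 datasets, 2 tissue groups.
X = [[9.0, 1.0], [8.5, 1.5], [1.0, 9.0], [1.5, 8.0], [9.5, 0.5], [0.5, 9.5]]
y = ["brain", "brain", "blood", "blood", "brain", "blood"]
dataset_ids = ["D1", "D1", "D2", "D2", "D3", "D3"]

per_dataset = one_dataset_out_accuracy(X, y, dataset_ids)
average_accuracy = mean(per_dataset.values())
```

Splitting by dataset rather than by individual sample keeps all samples from one study out of training, which is why this score reflects performance on ‘unseen’ datasets and is typically lower than accuracy from a random sample-level split.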
Acknowledgements
The research was supported by the German Federal Ministry of Education and Research (BMBF), project Integrative Data Semantics for Neurodegenerative research (031L0029); by the German Research Foundation (DFG), project Quantitative Synaptology (SFB 1286 Z2); and by the Volkswagen Foundation.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Fiosina, J., Fiosins, M., Bonn, S. (2019). Deep Learning and Random Forest-Based Augmentation of sRNA Expression Profiles. In: Cai, Z., Skums, P., Li, M. (eds) Bioinformatics Research and Applications. ISBRA 2019. Lecture Notes in Computer Science, vol 11490. Springer, Cham. https://doi.org/10.1007/978-3-030-20242-2_14
DOI: https://doi.org/10.1007/978-3-030-20242-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20241-5
Online ISBN: 978-3-030-20242-2
eBook Packages: Computer Science, Computer Science (R0)