Graph Traversal Edit Distance and Extensions
- PMID: 32058803
- PMCID: PMC7133423
- DOI: 10.1089/cmb.2019.0511
Graph Traversal Edit Distance and Extensions
Abstract
Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this article, we give a new graph kernel, which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the GTED is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of a support vector machine (SVM) classifier on a few data sets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested data sets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next-generation sequencing reads.
Keywords: assembly graph; clustering genera; co-assembly; de novo variation detection; graph comparison; graph kernel.
Conflict of interest statement
The authors declare they have no competing financial interests.
Figures
Similar articles
-
Revisiting the complexity of and algorithms for the graph traversal edit distance and its variants.Algorithms Mol Biol. 2024 Apr 29;19(1):17. doi: 10.1186/s13015-024-00262-6. Algorithms Mol Biol. 2024. PMID: 38679703 Free PMC article.
-
PyGTED: Python Application for Computing Graph Traversal Edit Distance.J Comput Biol. 2020 Mar;27(3):436-439. doi: 10.1089/cmb.2019.0510. J Comput Biol. 2020. PMID: 32160033 Free PMC article.
-
Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants.ArXiv [Preprint]. 2023 Nov 8:arXiv:2305.10577v2. ArXiv. 2023. Update in: Algorithms Mol Biol. 2024 Apr 29;19(1):17. doi: 10.1186/s13015-024-00262-6 PMID: 37292475 Free PMC article. Updated. Preprint.
-
The present and future of de novo whole-genome assembly.Brief Bioinform. 2018 Jan 1;19(1):23-40. doi: 10.1093/bib/bbw096. Brief Bioinform. 2018. PMID: 27742661 Review.
-
Graph embedding and geometric deep learning relevance to network biology and structural chemistry.Front Artif Intell. 2023 Nov 16;6:1256352. doi: 10.3389/frai.2023.1256352. eCollection 2023. Front Artif Intell. 2023. PMID: 38035201 Free PMC article. Review.
Cited by
-
Revisiting the complexity of and algorithms for the graph traversal edit distance and its variants.Algorithms Mol Biol. 2024 Apr 29;19(1):17. doi: 10.1186/s13015-024-00262-6. Algorithms Mol Biol. 2024. PMID: 38679703 Free PMC article.
-
PyGTED: Python Application for Computing Graph Traversal Edit Distance.J Comput Biol. 2020 Mar;27(3):436-439. doi: 10.1089/cmb.2019.0510. J Comput Biol. 2020. PMID: 32160033 Free PMC article.
-
The effect of genome graph expressiveness on the discrepancy between genome graph distance and string set distance.Bioinformatics. 2022 Jun 24;38(Suppl 1):i404-i412. doi: 10.1093/bioinformatics/btac264. Bioinformatics. 2022. PMID: 35758819 Free PMC article.
References
-
- Borgwardt K.M., and Kriegel H.P.. 2005. Shortest-path kernels on graphs. In Fifth IEEE International Conference on Data Mining (ICDM’05), Houston, TX. p. 8
-
- Borgwardt K.M., Ong C.S., Schönauer S., et al. . 2005. Protein function prediction via graph kernels. Bioinformatics 21, 47–56 - PubMed
-
- Boroojeny A.E., Shrestha A., Sharifi-Zarchi A., et al. . 2018. GTED: Graph traversal edit distance. In International Conference on Research in Computational Molecular Biology. Springer, Paris, France. pp. 37–53
-
- Debnath A.K., Lopez de Compadre R.L., Debnath G., et al. . 1991. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity. J. Med. Chem. 34, 786–797 - PubMed
-
- Dey T., Hirani A., and Krishnamoorthy B.. 2011. Optimal homologous cycles, total unimodularity, and linear programming. SIAM J. Comput. 40, 1026–1044
MeSH terms
LinkOut - more resources
Full Text Sources
Research Materials
Miscellaneous