Graph Traversal Edit Distance and Extensions - PubMed Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2020 Mar;27(3):317-329.
doi: 10.1089/cmb.2019.0511. Epub 2020 Feb 13.

Graph Traversal Edit Distance and Extensions

Affiliations

Graph Traversal Edit Distance and Extensions

Ali Ebrahimpour Boroojeny et al. J Comput Biol. 2020 Mar.

Abstract

Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this article, we give a new graph kernel, which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the GTED is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of a support vector machine (SVM) classifier on a few data sets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested data sets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next-generation sequencing reads.

Keywords: assembly graph; clustering genera; co-assembly; de novo variation detection; graph comparison; graph kernel.

PubMed Disclaimer

Conflict of interest statement

The authors declare they have no competing financial interests.

Figures

FIG. 1.
FIG. 1.
Edge-labeled Eulerian graph. An edge-labeled Eulerian graph A=(V,E,M,L,{A,C,G,T}) obtained from the k=4 de Bruijn graph G=(V,E) for the circular sequence ACAGACAT (Jones and Pevzner, 2004). Vertices, V, correspond to (k1)-mers and edges correspond to k-mers. In this case, all the edges have multiplicity one, that is, M1. Edge labels, L, show the kth nucleotide in the associated k-mers.
FIG. 2.
FIG. 2.
Conventional alignment graph. The alignment graph for AC versus AGC. Those edges that correspond to matches are in dashed lines (cost of a match is often 0). Solid lines show substitutions and indels, which usually have a positive cost. The edit distance is the shortest distance from s to t in this graph (shown in blue).

Similar articles

Cited by

References

    1. Borgwardt K.M., and Kriegel H.P.. 2005. Shortest-path kernels on graphs. In Fifth IEEE International Conference on Data Mining (ICDM’05), Houston, TX. p. 8
    1. Borgwardt K.M., Ong C.S., Schönauer S., et al. . 2005. Protein function prediction via graph kernels. Bioinformatics 21, 47–56 - PubMed
    1. Boroojeny A.E., Shrestha A., Sharifi-Zarchi A., et al. . 2018. GTED: Graph traversal edit distance. In International Conference on Research in Computational Molecular Biology. Springer, Paris, France. pp. 37–53
    1. Debnath A.K., Lopez de Compadre R.L., Debnath G., et al. . 1991. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity. J. Med. Chem. 34, 786–797 - PubMed
    1. Dey T., Hirani A., and Krishnamoorthy B.. 2011. Optimal homologous cycles, total unimodularity, and linear programming. SIAM J. Comput. 40, 1026–1044