Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs
Sensors (Basel). 2016 Aug 29;16(9):1374.
doi: 10.3390/s16091374.

Helena Gómez-Adorno et al. Sensors (Basel).

Abstract

We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation integrates different levels of language description into a single structure. We extract textual patterns from features obtained through shortest path walks over integrated syntactic graphs and use them to determine the authors of documents. On average, our method outperforms state-of-the-art approaches and, unlike existing methods, gives consistently high results across different corpora. Our results show that these textual patterns are useful for authorship attribution.

Keywords: authorship attribution; authorship verification; integrated syntactic graphs; shortest path walks; syntactic n-grams; textual patterns.
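
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy sentences, the `ROOT-0` node, the function names, and the choice to count dependency labels and POS tags along each path are all assumptions made for illustration. It merges per-sentence dependency trees (nodes in word_POS form, edges labeled with dependency relations) into one graph by collapsing identical nodes, then collects features along a shortest path from the root to every node.

```python
from collections import Counter, defaultdict, deque

ROOT = "ROOT-0"  # hypothetical shared root node that joins all sentence trees


def build_integrated_graph(sentences):
    """Merge per-sentence dependency trees into one graph.

    Each sentence is a list of (head, dependency_label, dependent) triples.
    Identical word_POS nodes collapse into a single node, so trees that
    share a node become connected through it.
    """
    graph = defaultdict(dict)  # head -> {dependent: dependency_label}
    for triples in sentences:
        for head, label, dependent in triples:
            graph[head].setdefault(dependent, label)
    return graph


def shortest_paths_from_root(graph, root=ROOT):
    """Breadth-first search: map each reachable node to the edge labels
    on one shortest path from the root."""
    paths = {root: []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child, label in graph.get(node, {}).items():
            if child not in paths:
                paths[child] = paths[node] + [label]
                queue.append(child)
    return paths


def path_features(sentences):
    """Count dependency labels and end-node POS tags over all
    root-to-node shortest paths (an assumed, simplified feature set)."""
    graph = build_integrated_graph(sentences)
    features = Counter()
    for node, labels in shortest_paths_from_root(graph).items():
        if node == ROOT:
            continue
        features.update(labels)
        features[node.rsplit("_", 1)[-1]] += 1  # POS tag of the end node
    return features


# Toy input, invented for this sketch; "they_PRP" is shared by both trees.
sentences = [
    [(ROOT, "root", "come_VBP"),
     ("come_VBP", "nsubj", "they_PRP"),
     ("come_VBP", "advmod", "here_RB")],
    [(ROOT, "root", "saw_VBD"),
     ("saw_VBD", "nsubj", "they_PRP"),
     ("saw_VBD", "dobj", "it_PRP")],
]
features = path_features(sentences)
```

The resulting `Counter` can serve as a document's feature vector. How such vectors are weighted and compared between documents is left open here, since the abstract does not specify it.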

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Dependency trees of three sentences of the target text using word_POS combination for the nodes and dependency labels for the edges.
Figure 2. The integrated syntactic graph for the three sentences shown in Figure 1.
Figure 3. Syntactic tree with the synonym expansion of a sentence: “Yes, here they come”.
Figure 4. Accuracy for each of the ten authors.
