Automatic Authorship Detection Using Textual Patterns Extracted from Integrated Syntactic Graphs

Sensors (Basel). 2016 Aug 29;16(9):1374. doi: 10.3390/s16091374.

Helena Gómez-Adorno¹, Grigori Sidorov², David Pinto³, Darnes Vilariño⁴, Alexander Gelbukh⁵

Affiliations

¹ Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz S/N, Mexico City 07738, Mexico. helena.adorno@gmail.com.
² Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz S/N, Mexico City 07738, Mexico. sidorov@cic.ipn.mx.
³ Benemérita Universidad Autónoma de Puebla, Facultad de Ciencias de la Computación, Av. San Claudio y 14 Sur, Puebla 72570, Mexico. dpinto@cs.buap.mx.
⁴ Benemérita Universidad Autónoma de Puebla, Facultad de Ciencias de la Computación, Av. San Claudio y 14 Sur, Puebla 72570, Mexico. darnes@cs.buap.mx.
⁵ Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz S/N, Mexico City 07738, Mexico.

PMID: 27589740
PMCID: PMC5038652
DOI: 10.3390/s16091374

Helena Gómez-Adorno et al. Sensors (Basel). 2016.

Abstract

We apply the integrated syntactic graph feature extraction methodology to the task of automatic authorship detection. This graph-based representation integrates different levels of language description into a single structure. We extract textual patterns from features obtained through shortest path walks over integrated syntactic graphs and use them to identify the authors of documents. On average, our method outperforms state-of-the-art approaches and, unlike existing methods, gives consistently high results across different corpora. Our results show that these textual patterns are useful for the task of authorship attribution.

Keywords: authorship attribution; authorship verification; integrated syntactic graphs; shortest path walks; syntactic n-grams; textual patterns.
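The pipeline summarized in the abstract can be sketched roughly as follows: walk the integrated syntactic graph breadth-first from its root, turn each root-to-node shortest path into a pattern string, and compare the resulting pattern-frequency vectors. This is a minimal illustration under our own assumptions, not the authors' implementation: the graph format, the pattern definition (the whole path joined into one string), and the nearest-profile cosine classifier in the hypothetical `attribute` function are all illustrative choices, and the paper may derive features from the paths and classify documents differently.

```python
from collections import Counter, deque
from math import sqrt
from typing import Dict, Tuple

# Assumed (hypothetical) graph format: node -> {(dependency_label, child): count},
# with "word_POS" node names and a single shared "ROOT" node.
Graph = Dict[str, Dict[Tuple[str, str], int]]


def shortest_path_patterns(graph: Graph, root: str = "ROOT") -> Counter:
    """Walk the graph breadth-first from the root.

    In an unweighted graph, the first time BFS reaches a node it has found a
    shortest path to it. Each such path (node names interleaved with edge
    labels) becomes one pattern string.
    """
    paths = {root: [root]}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Iterating the edge dict yields its (label, child) keys.
        for label, child in graph.get(node, {}):
            if child not in paths:  # already reached by a path at least as short
                paths[child] = paths[node] + [label, child]
                queue.append(child)
    # Skip the trivial root-only path.
    return Counter(" ".join(path) for node, path in paths.items() if node != root)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse pattern-frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def attribute(doc: Counter, profiles: Dict[str, Counter]) -> str:
    """Assign the document to the author whose profile is most similar."""
    return max(profiles, key=lambda author: cosine(doc, profiles[author]))


if __name__ == "__main__":
    # Toy graph for one short sentence (hypothetical parse).
    toy = {"ROOT": {("root", "come_VB"): 1},
           "come_VB": {("nsubj", "they_PRP"): 1, ("advmod", "here_RB"): 1}}
    for pattern in shortest_path_patterns(toy):
        print(pattern)
```

Breadth-first search is enough here because the sketch treats the graph as unweighted; a variant that weights edges (for example, by their frequency counts) would need Dijkstra's algorithm instead.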

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
Dependency trees of three sentences from the target text, with word_POS combinations as nodes and dependency labels on the edges.

**Figure 2**
The integrated syntactic graph for the three sentences shown in Figure 1.

**Figure 3**
Syntactic tree of the sentence “Yes, here they come” with synonym expansion.

**Figure 4**
Accuracy for each of the ten authors.
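Figures 1 and 2 describe the core data structure: sentence-level dependency trees with word_POS nodes and dependency-labelled edges, merged into one integrated syntactic graph. The following is a minimal sketch of that merge, assuming the parses are already available as (head, label, dependent) triples. The input format, the shared `ROOT` node name, the edge-frequency counter, and the optional `synonyms` table (a hypothetical stand-in for the synonym expansion of Figure 3, which the original work may obtain from a lexical resource) are our assumptions, not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: each sentence is a list of
# (head, dependency_label, dependent) triples whose nodes are "word_POS"
# strings, as in Figure 1; every sentence hangs off a shared "ROOT" node.
Triple = Tuple[str, str, str]
ISG = Dict[str, Dict[Tuple[str, str], int]]


def build_isg(sentences: List[List[Triple]],
              synonyms: Optional[Dict[str, List[str]]] = None) -> ISG:
    """Merge per-sentence dependency trees into one integrated graph.

    Identical word_POS nodes collapse into a single node, so trees that share
    words (or the common ROOT) become connected. Each (label, child) edge
    counts how many times it was seen. If a synonym table is given, every
    dependent also gets "syn" edges to its synonym nodes.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for triples in sentences:
        for head, label, dep in triples:
            graph[head][(label, dep)] += 1
            for syn in (synonyms or {}).get(dep, []):
                graph[dep][("syn", syn)] += 1
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {head: dict(edges) for head, edges in graph.items()}


def node_count(graph: ISG) -> int:
    """Number of distinct nodes (heads plus dependents)."""
    nodes = set(graph)
    for edges in graph.values():
        nodes.update(child for _, child in edges)
    return len(nodes)


if __name__ == "__main__":
    # Toy input loosely modelled on Figure 3's sentence (hypothetical parse).
    s1 = [("ROOT", "root", "come_VB"), ("come_VB", "nsubj", "they_PRP"),
          ("come_VB", "advmod", "here_RB")]
    s2 = [("ROOT", "root", "come_VB"), ("come_VB", "nsubj", "we_PRP")]
    isg = build_isg([s1, s2], synonyms={"come_VB": ["arrive_VB"]})
    print(isg["ROOT"])      # {('root', 'come_VB'): 2}
    print(node_count(isg))  # 6
```

Merging identical nodes means that a word used in several sentences appears only once in the graph, with the edges from all of those sentences attached to it.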

Cited by

Deep neural network and model-based clustering technique for forensic electronic mail author attribution.
Apoorva KA, Sangeetha S. SN Appl Sci. 2021;3(3):348. doi: 10.1007/s42452-020-04127-6. Epub 2021 Feb 18. PMID: 33619463.
