An automated framework for the extraction of semantic legal metadata from legal texts

Sleimi, Amin; Sannier, Nicolas; Sabetzadeh, Mehrdad; Briand, Lionel; Ceci, Marcello; Dann, John

doi:10.1007/s10664-020-09933-5

Published: 24 March 2021

Volume 26, article number 43, (2021)

Empirical Software Engineering

Amin Sleimi¹ (ORCID: orcid.org/0000-0001-6698-2228),
Nicolas Sannier¹,
Mehrdad Sabetzadeh¹,²,
Lionel Briand¹,²,
Marcello Ceci¹ &
John Dann³

Abstract

Semantic legal metadata provides information that helps with understanding and interpreting legal provisions. Such metadata is therefore important for the systematic analysis of legal requirements. However, manually enhancing a large legal corpus with semantic metadata is prohibitively expensive. Our work is motivated by two observations: (1) the existing requirements engineering (RE) literature does not provide a harmonized view on the semantic metadata types that are useful for legal requirements analysis; (2) automated support for the extraction of semantic legal metadata is scarce, and it does not exploit the full potential of artificial intelligence technologies, notably natural language processing (NLP) and machine learning (ML). Our objective is to take steps toward overcoming these limitations. To do so, we review and reconcile the semantic legal metadata types proposed in the RE literature. Subsequently, we devise an automated extraction approach for the identified metadata types using NLP and ML. We evaluate our approach through two case studies over the Luxembourgish legislation. Our results indicate a high accuracy in the generation of metadata annotations. In particular, in the two case studies, we were able to obtain precision scores of 97.2% and 82.4%, and recall scores of 94.9% and 92.4%.

References

Arora C, Sabetzadeh M, Briand LC, Zimmer F (2015) Automated checking of conformance to requirements templates using natural language processing. IEEE Trans Softw Eng 41(10):944–968
Athan T, Boley H, Governatori G, Palmirani M, Paschke A, Wyner AZ (2013) OASIS LegalRuleML. In: Proceedings of the International Conference on Artificial Intelligence and Law (ICAIL’13), pp 3–12
Bhatia J, Breaux TD, Schaub F (2016) Mining privacy goals from privacy policies using hybridized task recomposition. ACM Transactions on Software Engineering and Methodology 25(3):22:1–22:24
Bhatia J, Evans MC, Wadkar S, Breaux TD (2016) Automated extraction of regulated information types using hyponymy relations. In: Proceedings of the 3rd International Workshop on Artificial Intelligence for Requirements Engineering (AIRE’16), pp 19–25
Boella G, Caro LD, Humphreys L, Robaldo L, Rossi P, van der Torre L (2016) Eunomos, a legal document and knowledge management system for the web to provide relevant, reliable and up-to-date information on the law. Artificial Intelligence and Law 24(3):245–283
Boer A, Winkels R, Vitali F (2007) Proposed XML standards for law: Metalex and LKIF. In: Proceedings of the 20th Annual Conference on Legal Knowledge and Information Systems (JURIX’07), pp 19–28
Breaux T (2009) Legal requirements acquisition for the specification of legally compliant information systems. PhD thesis, North Carolina State University Raleigh, North Carolina, USA
Breaux TD, Antón AI (2008) Analyzing regulatory rules for privacy and security requirements. IEEE Trans Softw Eng 34(1):5–20
Breaux TD, Vail MW, Antón AI (2006) Towards regulatory compliance: Extracting rights and obligations to align requirements with regulations. In: Proceedings of the 14th IEEE International Requirements Engineering Conference (RE’06), pp 46–55
Breuker J, Boer A, Hoekstra R, van den Berg K (2006) Developing content for LKIF: ontologies and frameworks for legal reasoning. In: Proceedings of the 19th Annual Conference on Legal Knowledge and Information Systems (JURIX’06), pp 169–174
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46
Dell’Orletta F, Marchi S, Montemagni S, Plank B, Venturi G (2012) The SPLeT-2012 shared task on dependency parsing of legal texts. In: The 4th Workshop on Semantic Processing of Legal Texts (SPLeT’12), pp 42–51
Elrakaiby Y, Ferrari A, Spoletini P, Gnesi S, Nuseibeh B (2017) Using argumentation to explain ambiguity in requirements elicitation interviews. In: Proceedings of the 25th IEEE International Requirements Engineering Conference (RE’17), pp 51–60
Evans MC, Bhatia J, Wadkar S, Breaux TD (2017) An evaluation of constituency-based hyponymy extraction from privacy policies. In: Proceedings of the 25th IEEE International Requirements Engineering Conference (RE’17), pp 312–321
Frank E, Hall MA, Witten IH (2016) The WEKA workbench. Online appendix for “Data Mining: Practical Machine Learning Tools and Techniques”
Ghanavati S (2013) Legal-urn framework for legal compliance of business processes. PhD thesis, University of Ottawa Ottawa, Ontario, Canada
Ghanavati S, Amyot D, Rifaut A (2014) Legal goal-oriented requirement language (legal GRL) for modeling regulations. In: Proceedings of the 6th International Workshop on Modeling in Software Engineering (MISE’14), pp 1–6
Gildea D, Jurafsky D (2000) Automatic labeling of semantic roles. In: the 38th Annual Conference of the Association for Computational Linguistics (ACL-00), pp 512–520
Giorgini P, Massacci F, Mylopoulos J, Zannone N (2005) Modeling security requirements through ownership, permission and delegation. In: Proceedings of the 13th IEEE International Conference on Requirements Engineering (RE’05), pp 167–176
Gordon DG, Breaux TD (2012) Reconciling multi-jurisdictional legal requirements: A case study in requirements water marking. In: Proceedings of the 20th IEEE International Requirements Engineering Conference (RE’12), pp 91–100
Grossi D, Meyer JJC, Dignum F (2008) The many faces of counts-as: A formal analysis of constitutive rules. J App Logic 6(2):192–217. https://doi.org/10.1016/j.jal.2007.06.008, http://www.sciencedirect.com/science/article/pii/S1570868307000559, selected papers from the 8th International Workshop on Deontic Logic in Computer Science
Hirschberg J, Manning CD (2015) Advances in natural language processing. Science 349(6245):261–266
Hoekstra R, Breuker J, Bello MD, Boer A (2007) The LKIF core ontology of basic legal concepts. In: Proceedings of the 2nd Workshop on Legal Ontologies and Artificial Intelligence Techniques (LOAIT’07), pp 43–63
Hohfeld WN (1917) Fundamental legal conceptions as applied in judicial reasoning. The Yale Law Journal 26(8):710–770
Horty JF (2001) Agency and deontic logic oxford scholarship online. Oxford University Press, Oxford
Ingolfo S, Jureta I, Siena A, Perini A, Susi A (2014) Nòmos 3: Legal compliance of roles and requirements. In: Proceedings of the 33rd international conference on conceptual modeling (ER’14), pp 275–288
James G, Witten D, Hastie T, Tibshirani R (2014) An Introduction to Statistical Learning: With Applications in R
Jurafsky D, Martin JH (2000) Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition, 1st edn. Prentice Hall PTR, Upper Saddle River
Kiyavitskaya N, Zeni N, Mich L, Cordy JR, Mylopoulos J (2006) Text mining through semi automatic semantic annotation. In: Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (PAKM’06), pp 143–154
Kiyavitskaya N, Zeni N, Breaux TD, Antón AI, Cordy JR, Mich L, Mylopoulos J (2008) Automating the extraction of rights and obligations for regulatory compliance. In: Proceedings of the 27th International Conference on Conceptual Modeling (ER’08), pp 154–168
Kummerfeld JK, Hall DLW, Curran JR, Klein D (2012) Parser showdown at the wall street corral: An empirical investigation of error types in parser output. In: Proceedings of the joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL’12), pp 1048–1059
Lam H, Hashmi M, Scofield B (2016) Enabling reasoning with LegalRuleML. In: Proceedings of the 10th international symposium on rule technologies. Research, Tools, and Applications (RuleML’16), pp 241–257
Landis J, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174
Levy R, Andrew G (2006) Tregex and tsurgeon: tools for querying and manipulating tree data structures. In: Proceedings of the 5th international conference on language resources and evaluation (LREC’06), pp 2231–2234
Lucassen G, Robeer M, Dalpiaz F, van der Werf JMEM, Brinkkemper S (2017) Extracting conceptual models from user stories with visual narrator. Requir Eng 22(3):339–358
Massey A (2012) Legal requirements metrics for compliance analysis. PhD thesis, North Carolina State University Raleigh, North Carolina, USA
Massey AK, Otto PN, Hayward LJ, Antón AI (2010) Evaluating existing security and privacy requirements for legal compliance. Requir Eng 15(1):119–137
Maxwell JC, Antón AI (2010) The production rule framework: developing a canonical set of software requirements for compliance with law. In: Proceedings of the ACM international health informatics symposium (IHI’10), pp 629–636
Maxwell JC, Antón AI, Swire PP, Riaz M, McCraw CM (2012) A legal cross-references taxonomy for reasoning about compliance requirements. Requir Eng 17(2):99–115
McDonald RT, Nivre J (2007) Characterizing the errors of data-driven dependency parsing models. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL’07), pp 122–131
Nivre J, Hall J, Nilsson J, Chanev A, Eryigit G, Kübler S, Marinov S, Marsi E (2007) MaltParser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13(2):95–135
Peters W, Sagri M, Tiscornia D (2007) The structuring of legal knowledge in LOIS. Artificial Intelligence and Law 15(2):117–135
Petrov S, Barrett L, Thibaux R, Klein D (2006) Learning accurate, compact, and interpretable tree annotation. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the association for computational linguistics (ACL’06)
Pradet Q, Danlos L, de Chalendar G (2014) Adapting verbnet to french using existing resources. In: The ninth international conference on language resources and evaluation (LREC’14), pp 1122–1126
Princeton University (2010) About WordNet. http://wordnet.princeton.edu
Quirchmayr T, Paech B, Kohl R, Karey H, Kasdepke G (2018) Semi-automatic rule-based domain terminology and software feature-relevant information extraction from natural language user manuals. Empirical Software Engineering 23(6):3630–3683
Rosadini B, Ferrari A, Gori G, Fantechi A, Gnesi S, Trotta I, Bacherini S (2017) Using NLP to detect requirements defects: An industrial experience in the railway domain. In: Proceedings of the 23rd International Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ’17), pp 344–360
RuleML (2015) Specification of RuleML 1.02. http://wiki.ruleml.org/index.php/Specification_of_RuleML_1.02/
Sagot B (2010) The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French. In: Proceedings of the international conference on language resources and evaluation (LREC’10), pp 2745–2751
Saldaña J (2015) The Coding Manual for Qualitative Researchers. Sage
Sannier N, Adedjouma M, Sabetzadeh M, Briand LC (2016) Automated classification of legal cross references based on semantic intent. In: Proceedings of the 22nd international working conference on requirements engineering: foundation for software quality (REFSQ’16), pp 119–134
Sannier N, Adedjouma M, Sabetzadeh M, Briand LC (2017) An automated framework for detection and resolution of cross references in legal texts. Requir Eng 22(2):215–237
Sartor G, Casanovas P, Biasiotti M, Fernández-Barrera M (2013) Approaches to legal ontologies: Theories, Domains, Methodologies. Springer, Berlin
Siena A, Mylopoulos J, Perini A, Susi A (2009) Designing law-compliant software requirements. In: Proceedings of the 28th international conference on conceptual modeling (ER’09), pp 472–486
Siena A, Jureta I, Ingolfo S, Susi A, Perini A, Mylopoulos J (2012) Capturing variability of law with nómos 2. In: Proceedings of the 31st international conference on conceptual modeling (ER’12), pp 383–396
Sleimi A, Sannier N, Sabetzadeh M, Briand LC, Dann J (2018) Automated extraction of semantic legal metadata using natural language processing. In: Proceedings of the 26th IEEE international requirements engineering conference (RE’18), pp 302–311
Sleimi A, Ceci M, Sannier N, Sabetzadeh M, Briand LC, Dann J (2019) A query system for extracting requirements-related information from legal texts. In: Proceedings of the 27th IEEE international requirements engineering conference (RE’19)
Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-weka: Combined selection and hyperparameter optimization of classification algorithms. In: The 19th ACM SIGKDD international conference on knowledge discovery and data mining KDD, pp 847–855
Wiki (2004) Wiktionnaire. https://fr.wiktionary.org/
Zeni N, Kiyavitskaya N, Mich L, Cordy JR, Mylopoulos J (2015) Gaiust: supporting the extraction of rights and obligations for regulatory compliance. Requir Eng 20(1):1–22
Zeni N, Seid EA, Engiel P, Ingolfo S, Mylopoulos J (2016) Building large models of law with NómosT. In: Proceedings of the 35th international conference on conceptual modeling (ER’16), pp 233–247

Acknowledgments

Supported by the Luxembourg National Research Fund (FNR) under grants PUBLIC2-17/IS/11801776 and PoC16/11554296, and by NSERC of Canada under the Discovery, Discovery Accelerator and CRC programs.

Author information

Authors and Affiliations

SnT, University of Luxembourg, Luxembourg City, Luxembourg
Amin Sleimi, Nicolas Sannier, Mehrdad Sabetzadeh, Lionel Briand & Marcello Ceci
School of EECS, University of Ottawa, Ottawa, Canada
Mehrdad Sabetzadeh & Lionel Briand
Central Legislative Service (SCL), Government of Luxembourg, Luxembourg City, Luxembourg
John Dann

Corresponding author

Correspondence to Amin Sleimi.

Additional information

Communicated by: Federica Sarro

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: List of modal verbs expressing obligation, permission and prohibition

Obligation
sont soumis
seront soumis
sera soumis
est soumis
est soumise
sont soumises
il doit
elle doit
ils doivent
elles doivent
il devra
elle devra
ils devront
elles devront
il oblige
elle oblige
ils obligent
elles obligent
il obligera
elle obligera
ils obligeront
elles obligeront
obligent
soumis
soumise
doit
devra
doivent
devront
obligation
obligé
devoir
est tenue
est tenu
il est obligé
elle est obligée
est obligée
obligé
il est nécessaire
nécessaire
toujours
exigence
astreint
astreinte
astreint
assuré
assurée
oblige
requis
requise
Permission
susceptible
droit
facultative
ont
elles peuvent
ils pourront
il peut
elle peut
pourront
permission
peut
peuvent
pouvoir
pourront
pourra
pouvant
permis
permise
est facultative
autorise
autorisé
autorisée
puisse
autorisation
possible
ont droit
a droit
aura droit
sont en droit
est en droit
sera en droit
seront en droit
sont susceptibles
est susceptible
sera susceptible
seront susceptibles
Prohibition
interdit
ne doit
n’est pas en droit
est interdite
il est interdit
est interdit
ne peut
ne pourra
ne peuvent
ne sont pas autorisé
est prohibé
est prohibée
prohibé
interdiction
est illégal
est réprouvé
est réprouvée
proscrit
proscrite
est illicite
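
To illustrate how a marker list of this kind could be used, the following is a minimal Python sketch that assigns a modality to a French sentence by looking for the longest matching marker. It is only an illustration under stated assumptions, not the extraction approach evaluated in the article, which relies on NLP and ML. The `MARKERS` dictionary holds a small subset of the lists above, and the function name `classify_modality` and the longest-match rule are hypothetical choices made for this example.

```python
import re
from typing import Optional

# Illustrative subset of the Appendix A markers; the full lists are longer.
MARKERS = {
    "obligation": ["doit", "doivent", "devra", "devront", "est tenu", "est tenue"],
    "permission": ["peut", "peuvent", "pourra", "pourront", "a droit", "est en droit"],
    "prohibition": ["ne doit", "ne peut", "ne peuvent", "ne pourra",
                    "est interdit", "il est interdit"],
}


def classify_modality(sentence: str) -> Optional[str]:
    """Return the modality of the longest marker found in the sentence, or None."""
    # Lower-case, unify apostrophes, and collapse runs of whitespace.
    text = re.sub(r"\s+", " ", sentence.lower().replace("’", "'"))
    best_category, best_length = None, 0
    for category, markers in MARKERS.items():
        for marker in markers:
            # Match whole words only, so "peut" does not fire inside a longer word.
            pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
            if re.search(pattern, text) and len(marker) > best_length:
                best_category, best_length = category, len(marker)
    return best_category


if __name__ == "__main__":
    for s in [
        "Le responsable du traitement doit notifier la violation.",
        "Le ministre peut accorder une dérogation.",
        "Le titulaire ne peut céder la licence.",
    ]:
        print(classify_modality(s), "<-", s)
```

Preferring the longest marker is what lets "ne peut" (prohibition) take precedence over the embedded "peut" (permission); a plain first-match lookup would confuse the two.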

Appendix B: Results of ML experimentation for actor classification

The Auto-WEKA experiment for the actor dataset was run for different ML techniques and utilized hyper-parameter optimization. In other words, all hyper-parameters were automatically optimized for all ML techniques following the same automated procedure and therefore all classifiers were compared in a fair manner. The four top configurations are listed below:

1. RandomForest -bagSizePercent 100 -numIterations 73 -numExecutionSlots 1 -numFeatures 0 -minimumVarianceForSplit 0.001 -seed 73
2. NaiveBayes -batch-size 10
3. LibSVM -typeOfSVM C-SVC -kernelFunction radial basis function -degreeForKernelFunction 3 -GammaForKernelFunction 0.0 -coefficientInKernelFunction 0.0 -parameterNuOf nu-SVC 0.5 -CacheMemory 40.0 -ParameterCOfnu-SVC 1.0 -toleranceOfTerminationCriterion 0.001 -epsilonInLossFunction 0.1 -seed 73
4. Decision Tree J48 -confidenceFactor 0.15 -minimumInstance 2

We present the accuracy results of each of these configurations of ML techniques for the actor classification task in Table 11.

Table 11 Results for ML-based actor classification

The Weka documentation (Frank et al. 2016) defines the hyper-parameters for the above algorithms as follows:

1. seed – The random number seed to be used.
2. bagSizePercent – Size of each bag, as a percentage of the training set size.
3. numIterations – The number of iterations to be performed.
4. numExecutionSlots – The number of execution slots (threads) to use for constructing the ensemble.
5. numFeatures – Sets the number of randomly chosen attributes. If 0, int(log_2(number of predictors) + 1) is used.
6. minimumVarianceForSplit – The minimum numeric class variance proportion of train variance for split.
7. batchSize – The preferred number of instances to process if batch prediction is being performed. More or fewer instances may be provided, but this gives implementations a chance to specify a preferred batch size.
8. confidenceFactor – The confidence factor used for pruning (smaller values incur more pruning).
9. minimumInstance – The minimum number of instances per leaf.
10. typeOfSVM – The type of SVM to use.
11. kernelFunction – The type of kernel to use.
12. degreeForKernelFunction – The degree of the kernel.
13. GammaForKernelFunction – The gamma to use for the kernel function.
14. coefficientInKernelFunction – The coefficient to use for the kernel function.
15. parameterNuOf nu-SVC – The value of nu coefficient for nu-SVC.
16. CacheMemory – The cache size in MB.
17. ParameterCOfnu-SVC – The parameter in the coefficient nu.
18. toleranceOfTerminationCriterion – The tolerance of termination criterion.
19. epsilonInLossFunction – The epsilon for the loss function in C-SVC.
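
As a small worked example of the numFeatures default described in the list above, the following Python sketch evaluates int(log_2(number of predictors) + 1) for a few predictor counts. The function name `default_num_features` is hypothetical, and the sketch assumes that "number of predictors" means the count of input attributes in the dataset; it is not taken from the Weka code base.

```python
import math


def default_num_features(num_predictors: int) -> int:
    """Value used for numFeatures when it is set to 0, per the definition above."""
    if num_predictors < 1:
        raise ValueError("num_predictors must be at least 1")
    # int() truncates toward zero, matching the int(...) in the definition.
    return int(math.log2(num_predictors) + 1)


if __name__ == "__main__":
    for n in (1, 8, 20, 100):
        print(n, "predictors ->", default_num_features(n), "attributes per split")
```

With this default, the number of attributes considered at each split grows only logarithmically with the number of predictors, so the RandomForest configuration above (numFeatures 0) stays cheap per split even on wide feature sets.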

About this article

Cite this article

Sleimi, A., Sannier, N., Sabetzadeh, M. et al. An automated framework for the extraction of semantic legal metadata from legal texts. Empir Software Eng 26, 43 (2021). https://doi.org/10.1007/s10664-020-09933-5

Accepted: 23 December 2020
Published: 24 March 2021
DOI: https://doi.org/10.1007/s10664-020-09933-5
