Abstract
Semantic legal metadata provides information that helps with understanding and interpreting legal provisions. Such metadata is therefore important for the systematic analysis of legal requirements. However, manually enhancing a large legal corpus with semantic metadata is prohibitively expensive. Our work is motivated by two observations: (1) the existing requirements engineering (RE) literature does not provide a harmonized view on the semantic metadata types that are useful for legal requirements analysis; (2) automated support for the extraction of semantic legal metadata is scarce, and it does not exploit the full potential of artificial intelligence technologies, notably natural language processing (NLP) and machine learning (ML). Our objective is to take steps toward overcoming these limitations. To do so, we review and reconcile the semantic legal metadata types proposed in the RE literature. Subsequently, we devise an automated extraction approach for the identified metadata types using NLP and ML. We evaluate our approach through two case studies over the Luxembourgish legislation. Our results indicate a high accuracy in the generation of metadata annotations. In particular, in the two case studies, we were able to obtain precision scores of 97.2% and 82.4%, and recall scores of 94.9% and 92.4%.
References
Arora C, Sabetzadeh M, Briand LC, Zimmer F (2015) Automated checking of conformance to requirements templates using natural language processing. IEEE Trans Softw Eng 41(10):944–968
Athan T, Boley H, Governatori G, Palmirani M, Paschke A, Wyner AZ (2013) OASIS LegalRuleML. In: Proceedings of the International Conference on Artificial Intelligence and Law (ICAIL’13), pp 3–12
Bhatia J, Breaux TD, Schaub F (2016) Mining privacy goals from privacy policies using hybridized task recomposition. ACM Transactions on Software Engineering and Methodology 25(3):22:1–22:24
Bhatia J, Evans MC, Wadkar S, Breaux TD (2016) Automated extraction of regulated information types using hyponymy relations. In: Proceedings of the 3rd International Workshop on Artificial Intelligence for Requirements Engineering (AIRE’16), pp 19–25
Boella G, Caro LD, Humphreys L, Robaldo L, Rossi P, van der Torre L (2016) Eunomos, a legal document and knowledge management system for the web to provide relevant, reliable and up-to-date information on the law. Artificial Intelligence and Law 24(3):245–283
Boer A, Winkels R, Vitali F (2007) Proposed XML standards for law: Metalex and LKIF. In: Proceedings of the 20th Annual Conference on Legal Knowledge and Information Systems (JURIX’07), pp 19–28
Breaux T (2009) Legal requirements acquisition for the specification of legally compliant information systems. PhD thesis, North Carolina State University Raleigh, North Carolina, USA
Breaux TD, Antón AI (2008) Analyzing regulatory rules for privacy and security requirements. IEEE Trans Softw Eng 34(1):5–20
Breaux TD, Vail MW, Antón AI (2006) Towards regulatory compliance: Extracting rights and obligations to align requirements with regulations. In: Proceedings of the 14th IEEE International Requirements Engineering Conference (RE’06), pp 46–55
Breuker J, Boer A, Hoekstra R, van den Berg K (2006) Developing content for LKIF: ontologies and frameworks for legal reasoning. In: Proceedings of the 19th Annual Conference on Legal Knowledge and Information Systems (JURIX’06), pp 169–174
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46
Dell’Orletta F, Marchi S, Montemagni S, Plank B, Venturi G (2012) The splet–2012 shared task on dependency parsing of legal texts. In: the 4th Workshop on Semantic Processing of Legal Texts (SPLeT’12), pp 42–51
Elrakaiby Y, Ferrari A, Spoletini P, Gnesi S, Nuseibeh B (2017) Using argumentation to explain ambiguity in requirements elicitation interviews. In: Proceedings of the 25th IEEE International Requirements Engineering Conference (RE’17), pp 51–60
Evans MC, Bhatia J, Wadkar S, Breaux TD (2017) An evaluation of constituency-based hyponymy extraction from privacy policies. In: Proceedings of the 25th IEEE International Requirements Engineering Conference (RE’17), pp 312–321
Frank E, Hall MA, Witten IH (2016) The WEKA workbench. online appendix for “data mining: Practical machine learning tools and techniques”
Ghanavati S (2013) Legal-urn framework for legal compliance of business processes. PhD thesis, University of Ottawa Ottawa, Ontario, Canada
Ghanavati S, Amyot D, Rifaut A (2014) Legal goal-oriented requirement language (legal GRL) for modeling regulations. In: Proceedings of the 6th International Workshop on Modeling in Software Engineering (MISE’14), pp 1–6
Gildea D, Jurafsky D (2000) Automatic labeling of semantic roles. In: the 38th Annual Conference of the Association for Computational Linguistics (ACL-00), pp 512–520
Giorgini P, Massacci F, Mylopoulos J, Zannone N (2005) Modeling security requirements through ownership, permission and delegation. In: Proceedings of the 13th IEEE International Conference on Requirements Engineering (RE’05), pp 167–176
Gordon DG, Breaux TD (2012) Reconciling multi-jurisdictional legal requirements: A case study in requirements water marking. In: Proceedings of the 20th IEEE International Requirements Engineering Conference (RE’12), pp 91–100
Grossi D, Meyer JJC, Dignum F (2008) The many faces of counts-as: A formal analysis of constitutive rules. J App Logic 6(2):192–217. https://doi.org/10.1016/j.jal.2007.06.008, http://www.sciencedirect.com/science/article/pii/S1570868307000559, selected papers from the 8th International Workshop on Deontic Logic in Computer Science
Hirschberg J, Manning CD (2015) Advances in natural language processing. Science 349(6245):261–266
Hoekstra R, Breuker J, Bello MD, Boer A (2007) The LKIF core ontology of basic legal concepts. In: Proceedings of the 2nd Workshop on Legal Ontologies and Artificial Intelligence Techniques (LOAIT’07), pp 43–63
Hohfeld WN (1917) Fundamental legal conceptions as applied in judicial reasoning. The Yale Law Journal 26(8):710–770
Horty JF (2001) Agency and deontic logic oxford scholarship online. Oxford University Press, Oxford
Ingolfo S, Jureta I, Siena A, Perini A, Susi A (2014) Nòmos 3: Legal compliance of roles and requirements. In: Proceedings of the 33rd international conference on conceptual modeling (ER’14), pp 275–288
James G, Witten D, Hastie T, Tibshirani R (2014) An Introduction to Statistical Learning: With Applications in R
Jurafsky D, Martin JH (2000) Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition, 1st edn. Prentice Hall PTR, Upper Saddle River
Kiyavitskaya N, Zeni N, Mich L, Cordy JR, Mylopoulos J (2006) Text mining through semi automatic semantic annotation. In: Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (PAKM’06), pp 143–154
Kiyavitskaya N, Zeni N, Breaux TD, Antón AI, Cordy JR, Mich L, Mylopoulos J (2008) Automating the extraction of rights and obligations for regulatory compliance. In: Proceedings of the 27th International Conference on Conceptual Modeling (ER’08), pp 154–168
Kummerfeld JK, Hall DLW, Curran JR, Klein D (2012) Parser showdown at the wall street corral: An empirical investigation of error types in parser output. In: Proceedings of the joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL’12), pp 1048–1059
Lam H, Hashmi M, Scofield B (2016) Enabling reasoning with LegalRuleML. In: Proceedings of the 10th international symposium on rule technologies. Research, Tools, and Applications (RuleML’16), pp 241–257
Landis J, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174
Levy R, Andrew G (2006) Tregex and tsurgeon: tools for querying and manipulating tree data structures. In: Proceedings of the 5th international conference on language resources and evaluation (LREC’06), pp 2231–2234
Lucassen G, Robeer M, Dalpiaz F, van der Werf JMEM, Brinkkemper S (2017) Extracting conceptual models from user stories with visual narrator. Requir Eng 22(3):339–358
Massey A (2012) Legal requirements metrics for compliance analysis. PhD thesis, North Carolina State University Raleigh, North Carolina, USA
Massey AK, Otto PN, Hayward LJ, Antón AI (2010) Evaluating existing security and privacy requirements for legal compliance. Requir Eng 15(1):119–137
Maxwell JC, Antón AI (2010) The production rule framework: developing a canonical set of software requirements for compliance with law. In: Proceedings of the ACM international health informatics symposium (IHI’10), pp 629–636
Maxwell JC, Antón AI, Swire PP, Riaz M, McCraw CM (2012) A legal cross-references taxonomy for reasoning about compliance requirements. Requir Eng 17(2):99–115
McDonald RT, Nivre J (2007) Characterizing the errors of data-driven dependency parsing models. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL’07), pp 122–131
Nivre J, Hall J, Nilsson J, Chanev A, Eryigit G, Kübler S, Marinov S, Marsi E (2007) Maltparser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13(2):95–135
Peters W, Sagri M, Tiscornia D (2007) The structuring of legal knowledge in LOIS. Artificial Intelligence and Law 15(2):117–135
Petrov S, Barrett L, Thibaux R, Klein D (2006) Learning accurate, compact, and interpretable tree annotation. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the association for computational linguistics (ACL’06)
Pradet Q, Danlos L, de Chalendar G (2014) Adapting verbnet to french using existing resources. In: The ninth international conference on language resources and evaluation (LREC’14), pp 1122–1126
Princeton University (2010) About WordNet. http://wordnet.princeton.edu
Quirchmayr T, Paech B, Kohl R, Karey H, Kasdepke G (2018) Semi-automatic rule-based domain terminology and software feature-relevant information extraction from natural language user manuals. Empirical Software Engineering 23(6):3630–3683
Rosadini B, Ferrari A, Gori G, Fantechi A, Gnesi S, Trotta I, Bacherini S (2017) Using NLP to detect requirements defects: An industrial experience in the railway domain. In: Proceedings of the 23rd International Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ’17), pp 344–360
RuleML (2015) Specification of RuleML 1.02. http://wiki.ruleml.org/index.php/Specification_of_RuleML_1.02/
Sagot B (2010) The Lefff, a freely available and large-coverage morphological and syntactic lexicon for french. In: Proceedings of the international conference on language resources and evaluation (LREC’10), pp 2745–2751
Saldaña J (2015) The Coding Manual for Qualitative Researchers. Sage
Sannier N, Adedjouma M, Sabetzadeh M, Briand LC (2016) Automated classification of legal cross references based on semantic intent. In: Proceedings of the 22nd international working conference on requirements engineering: foundation for software quality (REFSQ’16), pp 119–134
Sannier N, Adedjouma M, Sabetzadeh M, Briand LC (2017) An automated framework for detection and resolution of cross references in legal texts. Requir Eng 22(2):215–237
Sartor G, Casanovas P, Biasiotti M, Fernández-Barrera M (2013) Approaches to legal ontologies: Theories, Domains, Methodologies. Springer, Berlin
Siena A, Mylopoulos J, Perini A, Susi A (2009) Designing law-compliant software requirements. In: Proceedings of the 28th international conference on conceptual modeling (ER’09), pp 472–486
Siena A, Jureta I, Ingolfo S, Susi A, Perini A, Mylopoulos J (2012) Capturing variability of law with nómos 2. In: Proceedings of the 31st international conference on conceptual modeling (ER’12), pp 383–396
Sleimi A, Sannier N, Sabetzadeh M, Briand LC, Dann J (2018) Automated extraction of semantic legal metadata using natural language processing. In: Proceedings of the 26th IEEE international requirements engineering conference (RE’18), pp 302–311
Sleimi A, Ceci M, Sannier N, Sabetzadeh M, Briand LC, Dann J (2019) A query system for extracting requirements-related information from legal texts. In: Proceedings of the 27th IEEE international requirements engineering conference (RE’19)
Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-weka: Combined selection and hyperparameter optimization of classification algorithms. In: The 19th ACM SIGKDD international conference on knowledge discovery and data mining KDD, pp 847–855
Wiki (2004) Wiktionnaire. https://fr.wiktionary.org/
Zeni N, Kiyavitskaya N, Mich L, Cordy JR, Mylopoulos J (2015) Gaiust: supporting the extraction of rights and obligations for regulatory compliance. Requir Eng 20(1):1–22
Zeni N, Seid EA, Engiel P, Ingolfo S, Mylopoulos J (2016) Building large models of law with NómosT. In: Proceedings of the 35th international conference on conceptual modeling (ER’16), pp 233–247
Acknowledgments
Supported by the Luxembourg National Research Fund (FNR) under grants PUBLIC2-17/IS/11801776 and PoC16/11554296, and by NSERC of Canada under the Discovery, Discovery Accelerator and CRC programs.
Additional information
Communicated by: Federica Sarro
Appendices
Appendix A: List of modal verbs expressing obligation, permission and prohibition
Obligation
- sont soumis
- seront soumis
- sera soumis
- est soumis
- est soumise
- sont soumises
- il doit
- elle doit
- ils doivent
- elles doivent
- il devra
- elle devra
- ils devront
- elles devront
- il oblige
- elle oblige
- ils obligent
- elles obligent
- il obligera
- elle obligera
- ils obligeront
- elles obligeront
- obligent
- soumis
- soumise
- doit
- devra
- doivent
- devront
- obligation
- obligé
- devoir
- est tenue
- est tenu
- il est obligé
- elle est obligée
- est obligée
- obligé
- il est nécessaire
- nécessaire
- toujours
- exigence
- astreint
- astreinte
- astreint
- assuré
- assurée
- oblige
- requis
- requise
Permission
- susceptible
- droit
- facultative
- ont
- elles peuvent
- ils pourront
- il peut
- elle peut
- pourront
- permission
- peut
- peuvent
- pouvoir
- pourront
- pourra
- pouvant
- permis
- permise
- est facultative
- autorise
- autorisé
- autorisée
- puisse
- autorisation
- possible
- ont droit
- a droit
- aura droit
- sont en droit
- est en droit
- sera en droit
- seront en droit
- sont susceptibles
- est susceptible
- sera susceptible
- seront susceptibles
Prohibition
- interdit
- ne doit
- n’est pas en droit
- est interdite
- il est interdit
- est interdit
- ne peut
- ne pourra
- ne peuvent
- ne sont pas autorisé
- est prohibé
- est prohibée
- prohibé
- interdiction
- est illégal
- est réprouvé
- est réprouvée
- proscrit
- proscrite
- est illicite
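To illustrate how marker lists of this kind might be used, the following is a minimal sketch that assigns a modality to a sentence by looking up a small subset of the Appendix A markers. It is an assumption-laden simplification and not the extraction approach evaluated in the article, which relies on NLP and ML: the function name `classify_modality`, the `MARKERS` dictionary, and the longest-match tie-breaking rule are all hypothetical choices made for this example.

```python
import re

# Illustrative subset of the Appendix A markers (the full lists are longer).
MARKERS = {
    "obligation": ["est tenu", "doit", "doivent", "devra", "requis"],
    "permission": ["peut", "peuvent", "pourra", "a droit", "autorisé"],
    "prohibition": ["ne peut", "ne peuvent", "ne doit", "est interdit"],
}


def classify_modality(sentence):
    """Return the modality of the longest marker found in the sentence, or None.

    Assumption for this sketch: preferring the longest match lets a negated
    marker such as "ne peut" take precedence over the shorter "peut".
    """
    best = None  # (marker length, modality) of the longest match so far
    for modality, markers in MARKERS.items():
        for marker in markers:
            # Match the marker only as a whole word or phrase.
            pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                if best is None or len(marker) > best[0]:
                    best = (len(marker), modality)
    return best[1] if best else None


if __name__ == "__main__":
    examples = [
        "Le responsable du traitement doit notifier la violation.",
        "Le prestataire ne peut divulguer ces données.",
        "L'autorité peut demander des informations.",
    ]
    for text in examples:
        print(classify_modality(text), "<-", text)
```

The longest-match rule matters because several prohibition markers contain a permission or obligation marker as a substring ("ne peut" contains "peut", "ne doit" contains "doit"); a plain first-match lookup would mislabel such sentences.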
Appendix B: Results of ML experimentation for actor classification
The Auto-WEKA experiment for the actor dataset was run for different ML techniques and utilized hyper-parameter optimization. In other words, all hyper-parameters were automatically optimized for all ML techniques following the same automated procedure and therefore all classifiers were compared in a fair manner. The four top configurations are listed below:
1. RandomForest -bagSizePercent 100 -numIterations 73 -numExecutionSlots 1 -numFeatures 0 -minimumVarianceForSplit 0.001 -seed 73
2. NaiveBayes -batch-size 10
3. LibSVM -typeOfSVM C-SVC -kernelFunction radial basis function -degreeForKernelFunction 3 -GammaForKernelFunction 0.0 -coefficientInKernelFunction 0.0 -parameterNuOf nu-SVC 0.5 -CacheMemory 40.0 -ParameterCOfnu-SVC 1.0 -toleranceOfTerminationCriterion 0.001 -epsilonInLossFunction 0.1 -seed 73
4. Decision Tree J48 -confidenceFactor 0.15 -minimumInstance 2
We present the accuracy results of each of these configurations of ML techniques for the actor classification task in Table 11.
The Weka documentation (Frank et al. 2016) defines the hyper-parameters for the above algorithms as follows:
1. seed – The random number seed to be used.
2. bagSizePercent – Size of each bag, as a percentage of the training set size.
3. numIterations – The number of iterations to be performed.
4. numExecutionSlots – The number of execution slots (threads) to use for constructing the ensemble.
5. numFeatures – Sets the number of randomly chosen attributes. If 0, int(log_2(number of predictors) + 1) is used.
6. minimumVarianceForSplit – The minimum numeric class variance proportion of train variance for split.
7. batchSize – The preferred number of instances to process if batch prediction is being performed. More or fewer instances may be provided, but this gives implementations a chance to specify a preferred batch size.
8. confidenceFactor – The confidence factor used for pruning (smaller values incur more pruning).
9. minimumInstance – The minimum number of instances per leaf.
10. typeOfSVM – The type of SVM to use.
11. kernelFunction – The type of kernel to use.
12. degreeForKernelFunction – The degree of the kernel.
13. GammaForKernelFunction – The gamma to use for the kernel function.
14. coefficientInKernelFunction – The coefficient to use for the kernel function.
15. parameterNuOf nu-SVC – The value of nu coefficient for nu-SVC.
16. CacheMemory – The cache size in MB.
17. ParameterCOfnu-SVC – the parameter in the coefficient nu
18. toleranceOfTerminationCriterion – The tolerance of termination criterion.
19. epsilonInLossFunction – The epsilon for the loss function in C-SVC.
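As a small worked example of the numFeatures default quoted above, the sketch below evaluates int(log_2(number of predictors) + 1) for a few predictor counts. The function name `default_num_features` and the predictor counts are hypothetical; the number of predictors in the actor dataset is not stated in this appendix.

```python
import math


def default_num_features(num_predictors):
    """Attributes sampled per split when numFeatures is 0, per the quoted formula."""
    if num_predictors < 1:
        raise ValueError("num_predictors must be at least 1")
    # int() truncates toward zero, which equals floor for these positive values.
    return int(math.log2(num_predictors) + 1)


if __name__ == "__main__":
    # Hypothetical predictor counts, chosen only to show how the default grows.
    for n in (1, 8, 10, 100):
        print(n, "predictors ->", default_num_features(n), "features per split")
```

Since the RandomForest configuration listed first sets numFeatures to 0, this default would be the value in effect for that configuration.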
Cite this article
Sleimi, A., Sannier, N., Sabetzadeh, M. et al. An automated framework for the extraction of semantic legal metadata from legal texts. Empir Software Eng 26, 43 (2021). https://doi.org/10.1007/s10664-020-09933-5
DOI: https://doi.org/10.1007/s10664-020-09933-5