An automated framework for the extraction of semantic legal metadata from legal texts

Empirical Software Engineering

Abstract

Semantic legal metadata provides information that helps with understanding and interpreting legal provisions. Such metadata is therefore important for the systematic analysis of legal requirements. However, manually enhancing a large legal corpus with semantic metadata is prohibitively expensive. Our work is motivated by two observations: (1) the existing requirements engineering (RE) literature does not provide a harmonized view on the semantic metadata types that are useful for legal requirements analysis; (2) automated support for the extraction of semantic legal metadata is scarce, and it does not exploit the full potential of artificial intelligence technologies, notably natural language processing (NLP) and machine learning (ML). Our objective is to take steps toward overcoming these limitations. To do so, we review and reconcile the semantic legal metadata types proposed in the RE literature. Subsequently, we devise an automated extraction approach for the identified metadata types using NLP and ML. We evaluate our approach through two case studies over the Luxembourgish legislation. Our results indicate a high accuracy in the generation of metadata annotations. In particular, in the two case studies, we were able to obtain precision scores of 97.2% and 82.4%, and recall scores of 94.9% and 92.4%.

References

  • Arora C, Sabetzadeh M, Briand LC, Zimmer F (2015) Automated checking of conformance to requirements templates using natural language processing. IEEE Trans Softw Eng 41(10):944–968

  • Athan T, Boley H, Governatori G, Palmirani M, Paschke A, Wyner AZ (2013) OASIS LegalRuleML. In: Proceedings of the International Conference on Artificial Intelligence and Law (ICAIL’13), pp 3–12

  • Bhatia J, Breaux TD, Schaub F (2016) Mining privacy goals from privacy policies using hybridized task recomposition. ACM Transactions on Software Engineering and Methodology 25(3):22:1–22:24

  • Bhatia J, Evans MC, Wadkar S, Breaux TD (2016) Automated extraction of regulated information types using hyponymy relations. In: Proceedings of the 3rd International Workshop on Artificial Intelligence for Requirements Engineering (AIRE’16), pp 19–25

  • Boella G, Caro LD, Humphreys L, Robaldo L, Rossi P, van der Torre L (2016) Eunomos, a legal document and knowledge management system for the web to provide relevant, reliable and up-to-date information on the law. Artificial Intelligence and Law 24(3):245–283

  • Boer A, Winkels R, Vitali F (2007) Proposed XML standards for law: Metalex and LKIF. In: Proceedings of the 20th Annual Conference on Legal Knowledge and Information Systems (JURIX’07), pp 19–28

  • Breaux T (2009) Legal requirements acquisition for the specification of legally compliant information systems. PhD thesis, North Carolina State University Raleigh, North Carolina, USA

  • Breaux TD, Antón AI (2008) Analyzing regulatory rules for privacy and security requirements. IEEE Trans Softw Eng 34(1):5–20

  • Breaux TD, Vail MW, Antón AI (2006) Towards regulatory compliance: Extracting rights and obligations to align requirements with regulations. In: Proceedings of the 14th IEEE International Requirements Engineering Conference (RE’06), pp 46–55

  • Breuker J, Boer A, Hoekstra R, van den Berg K (2006) Developing content for LKIF: ontologies and frameworks for legal reasoning. In: Proceedings of the 19th Annual Conference on Legal Knowledge and Information Systems (JURIX’06), pp 169–174

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46

  • Dell’Orletta F, Marchi S, Montemagni S, Plank B, Venturi G (2012) The splet–2012 shared task on dependency parsing of legal texts. In: the 4th Workshop on Semantic Processing of Legal Texts (SPLeT’12), pp 42–51

  • Elrakaiby Y, Ferrari A, Spoletini P, Gnesi S, Nuseibeh B (2017) Using argumentation to explain ambiguity in requirements elicitation interviews. In: Proceedings of the 25th IEEE International Requirements Engineering Conference (RE’17), pp 51–60

  • Evans MC, Bhatia J, Wadkar S, Breaux TD (2017) An evaluation of constituency-based hyponymy extraction from privacy policies. In: Proceedings of the 25th IEEE International Requirements Engineering Conference (RE’17), pp 312–321

  • Frank E, Hall MA, Witten IH (2016) The WEKA workbench. online appendix for “data mining: Practical machine learning tools and techniques”

  • Ghanavati S (2013) Legal-urn framework for legal compliance of business processes. PhD thesis, University of Ottawa Ottawa, Ontario, Canada

  • Ghanavati S, Amyot D, Rifaut A (2014) Legal goal-oriented requirement language (legal GRL) for modeling regulations. In: Proceedings of the 6th International Workshop on Modeling in Software Engineering (MISE’14), pp 1–6

  • Gildea D, Jurafsky D (2000) Automatic labeling of semantic roles. In: the 38th Annual Conference of the Association for Computational Linguistics (ACL-00), pp 512–520

  • Giorgini P, Massacci F, Mylopoulos J, Zannone N (2005) Modeling security requirements through ownership, permission and delegation. In: Proceedings of the 13th IEEE International Conference on Requirements Engineering (RE’05), pp 167–176

  • Gordon DG, Breaux TD (2012) Reconciling multi-jurisdictional legal requirements: A case study in requirements water marking. In: Proceedings of the 20th IEEE International Requirements Engineering Conference (RE’12), pp 91–100

  • Grossi D, Meyer JJC, Dignum F (2008) The many faces of counts-as: A formal analysis of constitutive rules. J App Logic 6(2):192–217. https://doi.org/10.1016/j.jal.2007.06.008, http://www.sciencedirect.com/science/article/pii/S1570868307000559, selected papers from the 8th International Workshop on Deontic Logic in Computer Science

  • Hirschberg J, Manning CD (2015) Advances in natural language processing. Science 349(6245):261–266

  • Hoekstra R, Breuker J, Bello MD, Boer A (2007) The LKIF core ontology of basic legal concepts. In: Proceedings of the 2nd Workshop on Legal Ontologies and Artificial Intelligence Techniques (LOAIT’07), pp 43–63

  • Hohfeld WN (1917) Fundamental legal conceptions as applied in judicial reasoning. The Yale Law Journal 26(8):710–770

  • Horty JF (2001) Agency and deontic logic. Oxford Scholarship Online. Oxford University Press, Oxford

  • Ingolfo S, Jureta I, Siena A, Perini A, Susi A (2014) Nòmos 3: Legal compliance of roles and requirements. In: Proceedings of the 33rd international conference on conceptual modeling (ER’14), pp 275–288

  • James G, Witten D, Hastie T, Tibshirani R (2014) An Introduction to Statistical Learning: With Applications in R

  • Jurafsky D, Martin JH (2000) Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition, 1st edn. Prentice Hall PTR, Upper Saddle River

  • Kiyavitskaya N, Zeni N, Mich L, Cordy JR, Mylopoulos J (2006) Text mining through semi automatic semantic annotation. In: Proceedings of the 6th International Conference on Practical Aspects of Knowledge Management (PAKM’06), pp 143–154

  • Kiyavitskaya N, Zeni N, Breaux TD, Antón AI, Cordy JR, Mich L, Mylopoulos J (2008) Automating the extraction of rights and obligations for regulatory compliance. In: Proceedings of the 27th International Conference on Conceptual Modeling (ER’08), pp 154–168

  • Kummerfeld JK, Hall DLW, Curran JR, Klein D (2012) Parser showdown at the wall street corral: An empirical investigation of error types in parser output. In: Proceedings of the joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL’12), pp 1048–1059

  • Lam H, Hashmi M, Scofield B (2016) Enabling reasoning with LegalRuleML. In: Proceedings of the 10th international symposium on rule technologies. Research, Tools, and Applications (RuleML’16), pp 241–257

  • Landis J, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174

  • Levy R, Andrew G (2006) Tregex and tsurgeon: tools for querying and manipulating tree data structures. In: Proceedings of the 5th international conference on language resources and evaluation (LREC’06), pp 2231–2234

  • Lucassen G, Robeer M, Dalpiaz F, van der Werf JMEM, Brinkkemper S (2017) Extracting conceptual models from user stories with visual narrator. Requir Eng 22(3):339–358

  • Massey A (2012) Legal requirements metrics for compliance analysis. PhD thesis, North Carolina State University Raleigh, North Carolina, USA

  • Massey AK, Otto PN, Hayward LJ, Antón AI (2010) Evaluating existing security and privacy requirements for legal compliance. Requir Eng 15(1):119–137

  • Maxwell JC, Antón AI (2010) The production rule framework: developing a canonical set of software requirements for compliance with law. In: Proceedings of the ACM international health informatics symposium (IHI’10), pp 629–636

  • Maxwell JC, Antón AI, Swire PP, Riaz M, McCraw CM (2012) A legal cross-references taxonomy for reasoning about compliance requirements. Requir Eng 17(2):99–115

  • McDonald RT, Nivre J (2007) Characterizing the errors of data-driven dependency parsing models. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL’07), pp 122–131

  • Nivre J, Hall J, Nilsson J, Chanev A, Eryigit G, Kübler S, Marinov S, Marsi E (2007) Maltparser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13(2):95–135

  • Peters W, Sagri M, Tiscornia D (2007) The structuring of legal knowledge in LOIS. Artificial Intelligence and Law 15(2):117–135

  • Petrov S, Barrett L, Thibaux R, Klein D (2006) Learning accurate, compact, and interpretable tree annotation. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the association for computational linguistics (ACL’06)

  • Pradet Q, Danlos L, de Chalendar G (2014) Adapting verbnet to french using existing resources. In: The ninth international conference on language resources and evaluation (LREC’14), pp 1122–1126

  • Princeton University (2010) About WordNet. http://wordnet.princeton.edu

  • Quirchmayr T, Paech B, Kohl R, Karey H, Kasdepke G (2018) Semi-automatic rule-based domain terminology and software feature-relevant information extraction from natural language user manuals. Empirical Software Engineering 23(6):3630–3683

  • Rosadini B, Ferrari A, Gori G, Fantechi A, Gnesi S, Trotta I, Bacherini S (2017) Using NLP to detect requirements defects: An industrial experience in the railway domain. In: Proceedings of the 23rd International Working Conference on Requirements Engineering: Foundation for Software Quality (REFSQ’17), pp 344–360

  • RuleML (2015) Specification of RuleML 1.02. http://wiki.ruleml.org/index.php/Specification_of_RuleML_1.02/

  • Sagot B (2010) The Lefff, a freely available and large-coverage morphological and syntactic lexicon for French. In: Proceedings of the international conference on language resources and evaluation (LREC’10), pp 2745–2751

  • Saldaña J (2015) The Coding Manual for Qualitative Researchers. Sage

  • Sannier N, Adedjouma M, Sabetzadeh M, Briand LC (2016) Automated classification of legal cross references based on semantic intent. In: Proceedings of the 22nd international working conference on requirements engineering: foundation for software quality (REFSQ’16), pp 119–134

  • Sannier N, Adedjouma M, Sabetzadeh M, Briand LC (2017) An automated framework for detection and resolution of cross references in legal texts. Requir Eng 22(2):215–237

  • Sartor G, Casanovas P, Biasiotti M, Fernández-Barrera M (2013) Approaches to legal ontologies: Theories, Domains, Methodologies. Springer, Berlin

  • Siena A, Mylopoulos J, Perini A, Susi A (2009) Designing law-compliant software requirements. In: Proceedings of the 28th international conference on conceptual modeling (ER’09), pp 472–486

  • Siena A, Jureta I, Ingolfo S, Susi A, Perini A, Mylopoulos J (2012) Capturing variability of law with nómos 2. In: Proceedings of the 31st international conference on conceptual modeling (ER’12), pp 383–396

  • Sleimi A, Sannier N, Sabetzadeh M, Briand LC, Dann J (2018) Automated extraction of semantic legal metadata using natural language processing. In: Proceedings of the 26th IEEE international requirements engineering conference (RE’18), pp 302–311

  • Sleimi A, Ceci M, Sannier N, Sabetzadeh M, Briand LC, Dann J (2019) A query system for extracting requirements-related information from legal texts. In: Proceedings of the 27th IEEE international requirements engineering conference (RE’19)

  • Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-weka: Combined selection and hyperparameter optimization of classification algorithms. In: The 19th ACM SIGKDD international conference on knowledge discovery and data mining KDD, pp 847–855

  • Wiki (2004) Wiktionnaire. https://fr.wiktionary.org/

  • Zeni N, Kiyavitskaya N, Mich L, Cordy JR, Mylopoulos J (2015) Gaiust: supporting the extraction of rights and obligations for regulatory compliance. Requir Eng 20(1):1–22

  • Zeni N, Seid EA, Engiel P, Ingolfo S, Mylopoulos J (2016) Building large models of law with NómosT. In: Proceedings of the 35th international conference on conceptual modeling (ER’16), pp 233–247

Acknowledgments

Supported by the Luxembourg National Research Fund (FNR) under grants PUBLIC2-17/IS/11801776 and PoC16/11554296, and by NSERC of Canada under the Discovery, Discovery Accelerator and CRC programs.

Author information

Corresponding author

Correspondence to Amin Sleimi.

Additional information

Communicated by: Federica Sarro

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: List of modal verbs expressing obligation, permission and prohibition

Obligation

  • sont soumis

  • seront soumis

  • sera soumis

  • est soumis

  • est soumise

  • sont soumises

  • il doit

  • elle doit

  • ils doivent

  • elles doivent

  • il devra

  • elle devra

  • ils devront

  • elles devront

  • il oblige

  • elle oblige

  • ils obligent

  • elles obligent

  • il obligera

  • elle obligera

  • ils obligeront

  • elles obligeront

  • obligent

  • soumis

  • soumise

  • doit

  • devra

  • doivent

  • devront

  • obligation

  • obligé

  • devoir

  • est tenue

  • est tenu

  • il est obligé

  • elle est obligée

  • est obligée

  • obligé

  • il est nécessaire

  • nécessaire

  • toujours

  • exigence

  • astreint

  • astreinte

  • astreint

  • assuré

  • assurée

  • oblige

  • requis

  • requise

Permission

  • susceptible

  • droit

  • facultative

  • ont

  • elles peuvent

  • ils pourront

  • il peut

  • elle peut

  • pourront

  • permission

  • peut

  • peuvent

  • pouvoir

  • pourront

  • pourra

  • pouvant

  • permis

  • permise

  • est facultative

  • autorise

  • autorisé

  • autorisée

  • puisse

  • autorisation

  • possible

  • ont droit

  • a droit

  • aura droit

  • sont en droit

  • est en droit

  • sera en droit

  • seront en droit

  • sont susceptibles

  • est susceptible

  • sera susceptible

  • seront susceptibles

Prohibition

  • interdit

  • ne doit

  • n’est pas en droit

  • est interdite

  • il est interdit

  • est interdit

  • ne peut

  • ne pourra

  • ne peuvent

  • ne sont pas autorisés

  • est prohibé

  • est prohibée

  • prohibé

  • interdiction

  • est illégal

  • est réprouvé

  • est réprouvée

  • proscrit

  • proscrite

  • est illicite
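
As a rough illustration of how marker lists like those above could be applied, the following Python sketch tags a French sentence with a modality by looking for whole-word marker matches. It is not the extraction approach evaluated in the article, which relies on NLP and ML; it is only a lexical lookup under stated assumptions. The `MARKERS` dictionary holds a small subset of the appendix entries, and the function name `classify_modality` and the checking order are hypothetical choices made for this example.

```python
import re

# Small subset of the Appendix A markers (assumption: the full lists
# would be loaded the same way; only a few entries are shown here).
MARKERS = {
    "prohibition": ["ne doit", "ne peut", "ne peuvent", "est interdit", "interdit"],
    "obligation": ["doit", "doivent", "est tenu", "sont soumis"],
    "permission": ["peut", "peuvent", "a droit", "est en droit"],
}

# Prohibition is checked first because its markers contain obligation
# and permission markers ("ne doit" contains "doit", "ne peut" contains "peut").
CHECK_ORDER = ("prohibition", "obligation", "permission")


def classify_modality(sentence):
    """Return the first modality whose marker occurs as whole words, else None."""
    text = sentence.lower()
    for label in CHECK_ORDER:
        for marker in MARKERS[label]:
            # Lookarounds enforce word boundaries, so "personne peut"
            # does not accidentally match the marker "ne peut".
            pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
            if re.search(pattern, text):
                return label
    return None


if __name__ == "__main__":
    examples = [
        "Le responsable du traitement doit notifier la commission.",
        "Le sous-traitant ne peut divulguer les données.",
        "Toute personne peut demander l'accès à ses données.",
    ]
    for example in examples:
        print(classify_modality(example), "<-", example)
```

The whole-word check matters for French: without it, "personne peut" would be read as the prohibition marker "ne peut". Negations split by other words (for example "ne le peut") are not handled by this sketch.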

Appendix B: Results of ML experimentation for actor classification

The Auto-WEKA experiment for the actor dataset covered several ML techniques, with hyper-parameter optimization applied to each. All hyper-parameters were optimized automatically using the same procedure for every technique, so the classifiers were compared on an equal footing. The four top configurations are listed below:

  1. RandomForest -bagSizePercent 100 -numIterations 73 -numExecutionSlots 1 -numFeatures 0 -minimumVarianceForSplit 0.001 -seed 73

  2. NaiveBayes -batch-size 10

  3. LibSVM -typeOfSVM C-SVC -kernelFunction radial basis function -degreeForKernelFunction 3 -GammaForKernelFunction 0.0 -coefficientInKernelFunction 0.0 -parameterNuOf nu-SVC 0.5 -CacheMemory 40.0 -ParameterCOfnu-SVC 1.0 -toleranceOfTerminationCriterion 0.001 -epsilonInLossFunction 0.1 -seed 73

  4. Decision Tree J48 -confidenceFactor 0.15 -minimumInstance 2

We present the accuracy results of each of these configurations of ML techniques for the actor classification task in Table 11.

Table 11 Results for ML-based actor classification

The Weka documentation (Frank et al. 2016) defines the hyper-parameters for the above algorithms as follows:

  1. seed – The random number seed to be used.

  2. bagSizePercent – Size of each bag, as a percentage of the training set size.

  3. numIterations – The number of iterations to be performed.

  4. numExecutionSlots – The number of execution slots (threads) to use for constructing the ensemble.

  5. numFeatures – Sets the number of randomly chosen attributes. If 0, int(log_2(number of predictors) + 1) is used.

  6. minimumVarianceForSplit – the minimum numeric class variance proportion of train variance for split

  7. batchSize – The preferred number of instances to process if batch prediction is being performed. More or fewer instances may be provided, but this gives implementations a chance to specify a preferred batch size.

  8. confidenceFactor – the confidence factor used for pruning (smaller values incur more pruning)

  9. minimumInstance – the minimum number of instances per leaf

  10. typeOfSVM – the type of SVM to use

  11. kernelFunction – the type of kernel to use

  12. degreeForKernelFunction – The degree of the kernel

  13. GammaForKernelFunction – The gamma to use for the kernel function

  14. coefficientInKernelFunction – The coefficient to use for the kernel function

  15. parameterNuOf nu-SVC – The value of nu coefficient for nu-SVC

  16. CacheMemory – the cache size in MB

  17. ParameterCOfnu-SVC – the parameter in the coefficient nu

  18. toleranceOfTerminationCriterion – the tolerance of termination criterion

  19. epsilonInLossFunction – The epsilon for the loss function in C-SVC
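
The RandomForest configuration above sets numFeatures to 0, which according to the definition in the list falls back to int(log_2(number of predictors) + 1). The short Python sketch below works that formula through for a few predictor counts. The function name `default_num_features` is hypothetical and the code only mirrors the formula as quoted; it does not call Weka.

```python
import math


def default_num_features(num_predictors):
    """Default attribute count when numFeatures is 0, per the quoted formula."""
    if num_predictors < 1:
        raise ValueError("num_predictors must be at least 1")
    # int() truncates toward zero, matching the int(...) in the definition.
    return int(math.log2(num_predictors) + 1)


if __name__ == "__main__":
    for n in (1, 10, 16, 100):
        print(n, "predictors ->", default_num_features(n), "attributes per split")
```

For example, with 10 predictors the formula gives int(3.32 + 1) = 4 randomly chosen attributes, and with 16 predictors it gives int(4 + 1) = 5.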

Cite this article

Sleimi, A., Sannier, N., Sabetzadeh, M. et al. An automated framework for the extraction of semantic legal metadata from legal texts. Empir Software Eng 26, 43 (2021). https://doi.org/10.1007/s10664-020-09933-5

  • DOI: https://doi.org/10.1007/s10664-020-09933-5
