Abstract
Although popular and effective, large language models (LLMs) are characterised by a performance vs. transparency trade-off that hinders their applicability in sensitive scenarios. This limitation is the main motivation behind the many local post-hoc explanation approaches recently proposed by the XAI community. However, to the best of our knowledge, a thorough comparison of the available explainability techniques is still missing, mainly due to the lack of a general metric for measuring their benefits. We compare state-of-the-art local post-hoc explanation mechanisms for models trained on moral value classification tasks using a measure of correlation. Relying on a novel framework for comparing global impact scores, our experiments show that most local post-hoc explainers are only loosely correlated, and they highlight sizeable discrepancies in their results: their "quarrel" about explanations. Finally, we compare the impact-score distributions obtained from each local post-hoc explainer with human-made dictionaries, and point out that there is no correlation between explanation outputs and the concepts humans consider salient.
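The comparison framework underlying these results can be summarised as follows: each explainer assigns token-level impact scores to individual predictions, the local scores are aggregated into global per-token impact scores, and pairs of explainers are then compared by correlating their global scores. The sketch below is a minimal illustration of that pipeline, not the authors' implementation: the aggregation rule (mean absolute attribution per token), the correlation measure (Spearman's rank correlation via SciPy), and all function names are assumptions made for the example.

```python
# Minimal sketch (assumed implementation, not the authors' code):
# compare two local post-hoc explainers by correlating the global
# impact scores they induce over a shared vocabulary.
from collections import defaultdict

from scipy.stats import spearmanr


def global_impact(local_attributions):
    """Aggregate per-instance (token, score) pairs into one score per token."""
    totals, counts = defaultdict(float), defaultdict(int)
    for instance in local_attributions:      # one explained sentence
        for token, score in instance:        # token-level attribution
            totals[token] += abs(score)      # assumption: mean absolute score
            counts[token] += 1
    return {tok: totals[tok] / counts[tok] for tok in totals}


def explainer_agreement(attr_a, attr_b):
    """Spearman correlation between the global scores of two explainers."""
    glob_a, glob_b = global_impact(attr_a), global_impact(attr_b)
    shared = sorted(set(glob_a) & set(glob_b))   # compare on common tokens only
    rho, _ = spearmanr([glob_a[t] for t in shared],
                       [glob_b[t] for t in shared])
    return rho


# Toy usage: two hypothetical explainers that disagree on salient tokens.
shap_like = [[("care", 0.9), ("harm", 0.7), ("the", 0.1)]]
lime_like = [[("care", 0.2), ("harm", 0.3), ("the", 0.8)]]
print(explainer_agreement(shap_like, lime_like))
```

A low correlation from such a comparison is what the abstract refers to as the explainers' "quarrel"; the same aggregation step can also be correlated against human-made dictionary scores.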
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Agiollo, A., Cavalcante Siebert, L., Murukannaiah, P.K., Omicini, A. (2023). The Quarrel of Local Post-hoc Explainers for Moral Values Classification in Natural Language Processing. In: Calvaresi, D., et al. (eds.) Explainable and Transparent AI and Multi-Agent Systems. EXTRAAMAS 2023. Lecture Notes in Computer Science, vol. 14127. Springer, Cham. https://doi.org/10.1007/978-3-031-40878-6_6
DOI: https://doi.org/10.1007/978-3-031-40878-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40877-9
Online ISBN: 978-3-031-40878-6