{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,6,5]],"date-time":"2024-06-05T00:27:40Z","timestamp":1717547260473},"reference-count":99,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,5,25]],"date-time":"2024-05-25T00:00:00Z","timestamp":1716595200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,25]],"date-time":"2024-05-25T00:00:00Z","timestamp":1716595200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001663","name":"Volkswagen Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001663","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002347","name":"Bundesministerium f\u00fcr Bildung und Forschung","doi-asserted-by":"publisher","award":["DKI.00.0002 3.20"],"id":[{"id":"10.13039\/501100002347","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Rheinland-Pf\u00e4lzische Technische Universit\u00e4t Kaiserslautern-Landau"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Minds & Machines"],"abstract":"Abstract<\/jats:title>For years, the number of opaque algorithmic decision-making systems (ADM systems) with a large impact on society has been increasing: e.g., systems that compute decisions about future recidivism of criminals, credit worthiness, or the many small decision computing systems within social networks that create rankings, provide recommendations, or filter content. Concerns that such a system makes biased decisions can be difficult to investigate: be it by people affected, NGOs, stakeholders, governmental testing and auditing authorities, or other external parties. 
Scientific testing and auditing literature rarely focuses on the specific needs of such investigations and suffers from ambiguous terminology. With this paper, we aim to support this investigation process by collecting, explaining, and categorizing methods of testing for bias that are applicable to black-box systems, provided that inputs and their respective outputs can be observed. For this purpose, we provide a taxonomy that can be used to select suitable test methods adapted to the respective situation. This taxonomy takes multiple aspects into account, for example the effort required to implement a given test method, its technical requirements (such as the need for ground truth), and social constraints of the investigation, e.g., the protection of business secrets. Furthermore, we analyze which test method can be used in the context of which black-box audit concept. It turns out that various factors, such as the type of black-box audit or the lack of an oracle, may limit the selection of applicable tests. 
With the help of this paper, people or organizations who want to test an ADM system for bias can identify which test methods and auditing concepts are applicable and what implications they entail.<\/jats:p>","DOI":"10.1007\/s11023-024-09666-0","type":"journal-article","created":{"date-parts":[[2024,5,25]],"date-time":"2024-05-25T02:01:42Z","timestamp":1716602502000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Black-Box Testing and Auditing of Bias in ADM Systems"],"prefix":"10.1007","volume":"34","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-3527-1092","authenticated-orcid":false,"given":"Tobias D.","family":"Krafft","sequence":"first","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0002-1598-1812","authenticated-orcid":false,"given":"Marc P.","family":"Hauer","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0002-4294-9017","authenticated-orcid":false,"given":"Katharina","family":"Zweig","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,5,25]]},"reference":[{"key":"9666_CR1","unstructured":"Altenbockum, J. V. (2011). NRW verliert seinen letzten Frauenbuchladen. boersenblatt.net."},{"key":"9666_CR2","unstructured":"Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias\u2014there\u2019s software used across the country to predict future criminals. and it\u2019s biased against blacks. ProPublica."},{"key":"9666_CR3","volume-title":"Fairness and machine learning: Limitations and opportunities","author":"S Barocas","year":"2019","unstructured":"Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and machine learning: Limitations and opportunities. MIT."},{"issue":"5","key":"9666_CR4","doi-asserted-by":"publisher","first-page":"507","DOI":"10.1109\/TSE.2014.2372785","volume":"41","author":"ET Barr","year":"2014","unstructured":"Barr, E. 
T., Harman, M., McMinn, P., Shahbaz, M., & Yoo, S. (2014). The oracle problem in software testing: A survey. IEEE Transactions on Software Engineering, 41(5), 507\u2013525.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"9666_CR5","doi-asserted-by":"publisher","unstructured":"Binns, R. (2020). On the apparent conflict between individual and group fairness. In: Proceedings of the 2020 conference on fairness, accountability, and transparency. FAT* \u201920 (pp. 514\u2013524). Association for Computing Machinery. https:\/\/doi.org\/10.1145\/3351095.3372864","DOI":"10.1145\/3351095.3372864"},{"key":"9666_CR6","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1007\/978-3-031-11089-4_7","volume-title":"Advanced digital auditing progress in IS","author":"A Boer","year":"2023","unstructured":"Boer, A., de Beer, L., & van Praat, F. (2023). Algorithm assurance: Auditing applications of artificial intelligence. In E. Berghout, R. Fijneman, L. Hendriks, M. de Boer, & B.-J. Butijn (Eds.), Advanced digital auditing progress in IS (pp. 149\u2013183). Springer. https:\/\/doi.org\/10.1007\/978-3-031-11089-4_7"},{"key":"9666_CR7","doi-asserted-by":"crossref","unstructured":"Breck, E., Cai, S., Nielsen, E., Salib, M., & Sculley, D. (2017). The ml test score: A rubric for ML production readiness and technical debt reduction. In 2017 IEEE international conference on big data (Big Data) (pp. 1123\u20131132). IEEE.","DOI":"10.1109\/BigData.2017.8258038"},{"key":"9666_CR8","unstructured":"Brendel, W., Rauber, J., & Bethge, M. (2017). Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint. arXiv:1712.04248"},{"issue":"1","key":"9666_CR9","doi-asserted-by":"publisher","first-page":"205395172098386","DOI":"10.1177\/2053951720983865","volume":"8","author":"S Brown","year":"2021","unstructured":"Brown, S., Davidovic, J., & Hasan, A. (2021). The algorithm audit: Scoring the algorithms that score us. 
Big Data & Society, 8(1), 2053951720983865. https:\/\/doi.org\/10.1177\/2053951720983865","journal-title":"Big Data & Society"},{"key":"9666_CR10","volume-title":"Rule-based expert systems: The MYCIN experiments of the stanford heuristic programming project","author":"BG Buchanan","year":"1984","unstructured":"Buchanan, B. G., & Shortliffe, E. H. (1984). Rule-based expert systems: The MYCIN experiments of the stanford heuristic programming project. Addison Wesley Longman."},{"key":"9666_CR12","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1613\/jair.1.12228","volume":"70","author":"N Burkart","year":"2021","unstructured":"Burkart, N., & Huber, M. F. (2021). A survey on the explainability of supervised machine learning. Journal of Artificial Intelligence Research, 70, 245\u2013317.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9666_CR13","unstructured":"Chen, T., Cheung, S., & Yiu, S. (1998). Metamorphic testing: a new approach for generating next test cases. Technical Report hkust-cs98-01. Hong Kong University of Science and Technology."},{"issue":"1","key":"9666_CR14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/S0950-5849(02)00129-5","volume":"45","author":"TY Chen","year":"2003","unstructured":"Chen, T. Y., Tse, T. H., & Zhou, Z. Q. (2003). Fault-based testing without the need of oracles. Information and Software Technology, 45(1), 1\u20139.","journal-title":"Information and Software Technology"},{"issue":"5","key":"9666_CR15","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1109\/52.536462","volume":"13","author":"DM Cohen","year":"1996","unstructured":"Cohen, D. M., Dalal, S. R., Parelius, J., & Patton, G. C. (1996). The combinatorial design approach to automatic test generation. 
IEEE software, 13(5), 83\u201388.","journal-title":"IEEE software"},{"key":"9666_CR16","doi-asserted-by":"publisher","first-page":"2251","DOI":"10.1109\/ACCESS.2017.2782678","volume":"6","author":"J Cruz-Benito","year":"2017","unstructured":"Cruz-Benito, J., V\u00e1zquez-Ingelmo, A., S\u00e1nchez-Prieto, J. C., Ther\u00f3n, R., Garc\u00eda-Pe\u00f1alvo, F. J., & Mart\u00edn-Gonz\u00e1lez, M. (2017). Enabling adaptability in web forms based on user characteristics detection through a\/b testing and machine learning. IEEE Access, 6, 2251\u20132265.","journal-title":"IEEE Access"},{"key":"9666_CR17","doi-asserted-by":"crossref","unstructured":"Danks, D., & London, A. J. (2017). Algorithmic bias in autonomous systems. In IJCAI (Vol. 17, pp. 4691\u20134697).","DOI":"10.24963\/ijcai.2017\/654"},{"issue":"1","key":"9666_CR18","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1515\/popets-2015-0007","volume":"2015","author":"A Datta","year":"2015","unstructured":"Datta, A., Tschantz, M. C., & Datta, A. (2015). Automated experiments on ad privacy settings. Proceedings on Privacy Enhancing Technologies, 2015(1), 92\u2013112. https:\/\/doi.org\/10.1515\/popets-2015-0007","journal-title":"Proceedings on Privacy Enhancing Technologies"},{"key":"9666_CR19","doi-asserted-by":"crossref","unstructured":"Davis, M. D., & Weyuker, E. J. (1981). Pseudo-oracles for non-testable programs. In Proceedings of the ACM\u201981 conference (pp. 254\u2013257).","DOI":"10.1145\/800175.809889"},{"issue":"8","key":"9666_CR20","doi-asserted-by":"publisher","first-page":"635","DOI":"10.1136\/jech.2003.008466","volume":"58","author":"M Delgado-Rodriguez","year":"2004","unstructured":"Delgado-Rodriguez, M., & Llorca, J. (2004). Bias. Journal of Epidemiology, & Community Health, 58(8), 635\u2013641.","journal-title":"Journal of Epidemiology, & Community Health"},{"key":"9666_CR11","unstructured":"Deutscher Bundestag. (2020). 
Mehrheit der Fraktionen gegen den Begriff \u201cRasse\u201d im Grundgesetz. Deutscher Bundestag (2020 November 275), Abgerufen am: 27.02.2021. https:\/\/www.bundestag.de\/dokumente\/textarchiv\/2020\/kw48-de-rassismus-807790"},{"key":"9666_CR21","unstructured":"Di\u00a0Stefano, P. G., Hickey, J. M., & Vasileiou, V. (2020). Counterfactual fairness: Removing direct effects through regularization. arXiv preprint. arXiv:2002.10774"},{"issue":"3","key":"9666_CR22","doi-asserted-by":"publisher","first-page":"398","DOI":"10.1080\/21670811.2014.976411","volume":"3","author":"N Diakopoulos","year":"2015","unstructured":"Diakopoulos, N. (2015). Algorithmic accountability: Journalistic investigation of computational power structures. Digital journalism, 3(3), 398\u2013415.","journal-title":"Digital journalism"},{"key":"9666_CR23","doi-asserted-by":"crossref","unstructured":"Evans, R. B., & Savoia, A. (2007). Differential testing: A new approach to change detection. In The 6th joint meeting on European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering: Companion papers (pp. 549\u2013552).","DOI":"10.1145\/1295014.1295038"},{"key":"9666_CR24","doi-asserted-by":"crossref","unstructured":"Felderer, M., Russo, B., & Auer, F. (2019). On testing data-intensive software systems. In Security and quality in cyber-physical systems engineering (pp. 129\u2013148). Springer.","DOI":"10.1007\/978-3-030-25312-7_6"},{"key":"9666_CR25","unstructured":"Fry, H. (2018). Hello world: How to be human in the age of the machine. Random House."},{"key":"9666_CR26","doi-asserted-by":"publisher","unstructured":"Gaddis, S. M. (2018). An introduction to audit studies in the social sciences. In S. M. Gaddis (Ed.), Audit studies: Behind the scenes with theory, method, and nuance (pp. 3\u201344). Springer. 
https:\/\/doi.org\/10.1007\/978-3-319-71153-9_1","DOI":"10.1007\/978-3-319-71153-9_1"},{"key":"9666_CR27","doi-asserted-by":"crossref","unstructured":"Gilotte, A., Calauz\u00e8nes, C., Nedelec, T., Abraham, A., & Doll\u00e9, S. (2018). Offline a\/b testing for recommender systems. In Proceedings of the 11th ACM international conference on web search and data mining (pp. 198\u2013206).","DOI":"10.1145\/3159652.3159687"},{"key":"9666_CR28","unstructured":"Goodfellow, I.J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint. arXiv:1412.6572"},{"key":"9666_CR29","doi-asserted-by":"crossref","unstructured":"Gotlieb, A., & Marijan, D. (2014). Flower: optimal test suite reduction as a network maximum flow. In Proceedings of the 2014 international symposium on software testing and analysis (pp. 171\u2013180).","DOI":"10.1145\/2610384.2610416"},{"key":"9666_CR30","doi-asserted-by":"crossref","unstructured":"Groce, A., Holzmann, G., & Joshi, R. (2007). Randomized differential testing as a prelude to formal verification. In 29th International conference on software engineering (ICSE\u201907) (pp. 621\u2013631). IEEE.","DOI":"10.1109\/ICSE.2007.68"},{"key":"9666_CR31","doi-asserted-by":"publisher","unstructured":"Haeri, M. A., & Zweig, K. A. (2020). The crucial role of sensitive attributes in fair classification. In 2020 IEEE symposium series on computational intelligence (SSCI) (pp. 2993\u20133002). https:\/\/doi.org\/10.1109\/SSCI47803.2020.9308585","DOI":"10.1109\/SSCI47803.2020.9308585"},{"key":"9666_CR33","unstructured":"Hallensleben, S., Hustedt, C., Fetic, L., Fleischer, T., Gr\u00fcnke, P., Hagendorff, T., Hauer, M.P., Hauschke, A., Heesen, J., Herrmann, M., Hillerbrand, R., Hubig, C., Kaminski, A., Krafft, T.D., Loh, W., Otto, P., & Puntschuh, M. (2020). From principles to practice - an interdisciplinary framework to operationalise ai ethics. iRights. Lab, Tech. 
Rep."},{"key":"9666_CR34","doi-asserted-by":"publisher","unstructured":"Hann\u00e1k, A., Sapiezynski, P., Molavi\u00a0Kakhki, A., Krishnamurthy, B., Lazer, D., Mislove, A. & Wilson, C. (2013). Measuring personalization of web search. In Proceedings of the 22nd international conference on world wide web. WWW \u201913 (pp. 527\u2013538). Association for Computing Machinery. https:\/\/doi.org\/10.1145\/2488388.2488435","DOI":"10.1145\/2488388.2488435"},{"issue":"2","key":"9666_CR35","doi-asserted-by":"publisher","first-page":"735","DOI":"10.1007\/s11192-020-03407-7","volume":"123","author":"MP Hauer","year":"2020","unstructured":"Hauer, M. P., Hofmann, X. C. R., Krafft, T. D., Zweig, K. A., et al. (2020). Quantitative analysis of automatic performance evaluation systems based on the h-index. Scientometrics, 123(2), 735\u2013751.","journal-title":"Scientometrics"},{"key":"9666_CR36","doi-asserted-by":"publisher","DOI":"10.1016\/j.clsr.2021.105583","volume":"42","author":"MP Hauer","year":"2021","unstructured":"Hauer, M. P., Kevekordes, J., & Haeri, M. A. (2021). Legal perspective on possible fairness measures\u2014a legal discussion using the example of hiring decisions. Computer Law & Security Review, 42, 105583. https:\/\/doi.org\/10.1016\/j.clsr.2021.105583","journal-title":"Computer Law & Security Review"},{"key":"9666_CR37","volume-title":"Orthogonal arrays: Theory and applications","author":"AS Hedayat","year":"2012","unstructured":"Hedayat, A. S., Sloane, N. J. A., & Stufken, J. (2012). Orthogonal arrays: Theory and applications. Springer."},{"key":"9666_CR38","doi-asserted-by":"publisher","DOI":"10.1016\/j.clsr.2022.105658","volume":"44","author":"H Hoffmann","year":"2022","unstructured":"Hoffmann, H., Vogt, V., Hauer, M. P., & Zweig, K. (2022). Fairness by awareness? On the inclusion of protected features in algorithmic decisions. 
Computer Law & Security Review, 44, 105658.","journal-title":"Computer Law & Security Review"},{"issue":"4","key":"9666_CR39","doi-asserted-by":"publisher","first-page":"293","DOI":"10.1109\/TSE.1978.231514","volume":"SE\u20134","author":"WE Howden","year":"1978","unstructured":"Howden, W. E. (1978). Theoretical and empirical studies of program testing. IEEE Transactions on Software Engineering, SE\u20134(4), 293\u2013298.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"9666_CR40","unstructured":"Hynes, N., Sculley, D., & Terry, M. (2017). The data linter: Lightweight, automated sanity checking for ml data sets. In NIPS MLSys workshop."},{"key":"9666_CR41","doi-asserted-by":"publisher","unstructured":"IEEE. (1990). IEEE standard glossary of software engineering terminology. Std 610.12-1990 (pp. 1\u201384). https:\/\/doi.org\/10.1109\/IEEESTD.1990.101064","DOI":"10.1109\/IEEESTD.1990.101064"},{"key":"9666_CR42","unstructured":"ISO 19011. (2018). Guidelines for auditing management systems (Standard ed.). Beuth Verlag."},{"key":"9666_CR43","doi-asserted-by":"crossref","unstructured":"Itkonen, J., & Rautiainen, K. (2005). Exploratory testing: a multiple case study. In 2005 International symposium on empirical software engineering, 2005 (p. 10). IEEE.","DOI":"10.1109\/ISESE.2005.1541817"},{"key":"9666_CR44","doi-asserted-by":"crossref","unstructured":"Kanewala, U., & Bieman, J. M. (2013). Using machine learning techniques to detect metamorphic relations for programs without test oracles. In 2013 IEEE 24th International symposium on software reliability engineering (ISSRE) (pp. 1\u201310). IEEE.","DOI":"10.1109\/ISSRE.2013.6698899"},{"key":"9666_CR45","doi-asserted-by":"crossref","unstructured":"Kim, J., Feldt, R., & Yoo, S. (2019). Guiding deep learning system testing using surprise adequacy. In 2019 IEEE\/ACM 41st international conference on software engineering (ICSE) (pp. 1039\u20131049). 
IEEE.","DOI":"10.1109\/ICSE.2019.00108"},{"key":"9666_CR46","doi-asserted-by":"crossref","unstructured":"Klees, G., Ruef, A., Cooper, B., Wei, S., & Hicks, M. (2018). Evaluating fuzz testing. In Proceedings of the 2018 ACM SIGSAC conference on computer and communications security (pp. 2123\u20132138).","DOI":"10.1145\/3243734.3243804"},{"issue":"1","key":"9666_CR47","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1109\/TSE.1986.6312924","volume":"SE\u201312","author":"JC Knight","year":"1986","unstructured":"Knight, J. C., & Leveson, N. G. (1986). An experimental evaluation of the assumption of independence in multiversion programming. IEEE Transactions on Software Engineering, SE\u201312(1), 96\u2013109.","journal-title":"IEEE Transactions on Software Engineering"},{"issue":"8","key":"9666_CR48","doi-asserted-by":"publisher","first-page":"922","DOI":"10.1007\/978-1-4899-7687-1_891","volume":"7","author":"R Kohavi","year":"2017","unstructured":"Kohavi, R., & Longbotham, R. (2017). Online controlled experiments and a\/b testing. Encyclopedia of Machine Learning and Data Mining 7(8), 922\u2013929.","journal-title":"Encyclopedia of Machine Learning and Data Mining"},{"key":"9666_CR49","doi-asserted-by":"crossref","unstructured":"K\u00f6nig, P. D. (2019). Dissecting the algorithmic leviathan: On the socio-political anatomy of algorithmic governance. Philosophy & Technology, 33(3), 467\u2013485.","DOI":"10.1007\/s13347-019-00363-w"},{"key":"9666_CR50","doi-asserted-by":"crossref","unstructured":"Krafft, T. D., Hauer, M. P., & Zweig, K. A. (2020). Why do we need to be bots? What prevents society from detecting biases in recommendation systems. In International workshop on algorithmic bias in search and recommendation (pp. 27\u201334). 
Springer.","DOI":"10.1007\/978-3-030-52485-2_3"},{"issue":"1","key":"9666_CR51","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1140\/epjds\/s13688-019-0217-5","volume":"8","author":"TD Krafft","year":"2019","unstructured":"Krafft, T. D., Gamer, M., & Zweig, K. A. (2019). What did you see? A study to measure personalization in Google\u2019s search engine. EPJ Data Science, 8(1), 38. https:\/\/doi.org\/10.1140\/epjds\/s13688-019-0217-5","journal-title":"EPJ Data Science"},{"key":"9666_CR52","unstructured":"Kraus, T., Ganschow, L., Eisentr\u00e4ger, M., & Wischmann, S. (2021). Erkl\u00e4rbare KI - Anforderungen, Anwendungsf\u00e4lle und L\u00f6sungen. In K\u00fcnstliche Intelligenz Als Treiber F\u00fcr Volkswirtschaftlich Relevante \u00d6losysteme. Technologieprogramm KI-Innovationswettbewerb des BMWi."},{"key":"9666_CR53","doi-asserted-by":"crossref","unstructured":"Krishnan, S., Franklin, M. J., Goldberg, K., Wang, J., & Wu, E. (2016). Activeclean: An interactive data cleaning framework for modern machine learning. In Proceedings of the 2016 international conference on management of data (pp. 2117\u20132120).","DOI":"10.1145\/2882903.2899409"},{"key":"9666_CR54","volume-title":"Introduction to combinatorial testing","author":"DR Kuhn","year":"2013","unstructured":"Kuhn, D. R., Kacker, R. N., & Lei, Y. (2013). Introduction to combinatorial testing. CRC Press."},{"key":"9666_CR55","unstructured":"Kusner, M. J., Loftus, J.R., Russell, C., & Silva, R. (2017). Counterfactual fairness. arXiv preprint. arXiv:1703.06856"},{"key":"9666_CR56","doi-asserted-by":"publisher","unstructured":"Lucaj, L., van\u00a0der Smagt, P., & Benbouzid, D. (2023). Ai regulation is (not) all you need. In Proceedings of the 2023 ACM conference on fairness, accountability, and transparency. FAccT \u201923 (pp. 1267\u20131279). Association for Computing Machinery. 
https:\/\/doi.org\/10.1145\/3593013.3594079","DOI":"10.1145\/3593013.3594079"},{"key":"9666_CR57","doi-asserted-by":"crossref","unstructured":"Ma, P., Wang, S., & Liu, J. (2020). Metamorphic testing and certified mitigation of fairness violations in nlp models. In IJCAI (pp. 458\u2013465).","DOI":"10.24963\/ijcai.2020\/64"},{"key":"9666_CR58","unstructured":"Makhlouf, K., Zhioua, S., & Palamidessi, C. (2020). On the applicability of ML fairness notions. arXiv preprint. arXiv:2006.16745"},{"key":"9666_CR59","doi-asserted-by":"crossref","unstructured":"Marijan, D., Gotlieb, A., & Ahuja, M. K. (2019). Challenges of testing machine learning based systems. In 2019 IEEE international conference on artificial intelligence testing (AITest) (pp. 101\u2013102). IEEE.","DOI":"10.1109\/AITest.2019.00010"},{"issue":"1","key":"9666_CR60","first-page":"100","volume":"10","author":"WM McKeeman","year":"1998","unstructured":"McKeeman, W. M. (1998). Differential testing for software. Digital Technical Journal, 10(1), 100\u2013107.","journal-title":"Digital Technical Journal"},{"key":"9666_CR61","doi-asserted-by":"publisher","unstructured":"Metaxa, D., Park, J. S., Robertson, R. E., Karahalios, K., Wilson, C., Hancock, J., & Sandvig, C. (2021). Auditing algorithms: Understanding algorithmic systems from the outside in. Foundations and Trends in Human\u2013Computer Interaction, 14(4), 272\u2013344.https:\/\/doi.org\/10.1561\/1100000083","DOI":"10.1561\/1100000083"},{"key":"9666_CR62","doi-asserted-by":"publisher","unstructured":"Mikians, J., Gyarmati, L., Erramilli, V., & Laoutaris, N. (2012). Detecting price and search discrimination on the Internet. In Proceedings of the 11th ACM workshop on hot topics in networks (pp. 79\u201384). Association for Computing Machinery. 
https:\/\/doi.org\/10.1145\/2390231.2390245","DOI":"10.1145\/2390231.2390245"},{"issue":"3","key":"9666_CR63","doi-asserted-by":"publisher","first-page":"411","DOI":"10.2307\/2980740","volume":"115","author":"CA Moser","year":"1952","unstructured":"Moser, C. A. (1952). Quota sampling. Journal of the Royal Statistical Society. Series A (General), 115(3), 411\u2013423.","journal-title":"Journal of the Royal Statistical Society. Series A (General)"},{"issue":"2","key":"9666_CR64","doi-asserted-by":"publisher","first-page":"29","DOI":"10.5121\/ijesa.2012.2204","volume":"2","author":"S Nidhra","year":"2012","unstructured":"Nidhra, S., & Dondeti, J. (2012). Black box and white box testing techniques\u2014a literature review. International Journal of Embedded Systems and Applications (IJESA), 2(2), 29\u201350.","journal-title":"International Journal of Embedded Systems and Applications (IJESA)"},{"issue":"2","key":"9666_CR65","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1883612.1883618","volume":"43","author":"C Nie","year":"2011","unstructured":"Nie, C., & Leung, H. (2011). A survey of combinatorial testing. ACM Computing Surveys (CSUR), 43(2), 1\u201329.","journal-title":"ACM Computing Surveys (CSUR)"},{"key":"9666_CR66","doi-asserted-by":"crossref","unstructured":"Noble, S. (2013). Google search: Hyper-visibility as a means of rendering black women and girls invisible. InVisible Culture.","DOI":"10.47761\/494a02f6.50883fff"},{"issue":"3","key":"9666_CR67","first-page":"1356","volume":"10","author":"E Ntoutsi","year":"2020","unstructured":"Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, M.-E., Ruggieri, S., Turini, F., Papadopoulos, S., Krasanakis, E., et al. (2020). Bias in data-driven artificial intelligence systems\u2014an introductory survey. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(3), 1356.","journal-title":"Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery"},{"key":"9666_CR68","unstructured":"O\u2019Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Crown."},{"key":"9666_CR69","unstructured":"Orwat, C. (2019). Diskriminierungsrisiken Durch Verwendung Von Algorithmen. Antidiskriminierungsstelle des Bundes."},{"key":"9666_CR70","doi-asserted-by":"crossref","unstructured":"Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2017). Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia conference on computer and communications security (pp. 506\u2013519).","DOI":"10.1145\/3052973.3053009"},{"key":"9666_CR72","doi-asserted-by":"publisher","DOI":"10.4159\/harvard.9780674736061","volume-title":"The black box society","author":"F Pasquale","year":"2015","unstructured":"Pasquale, F. (2015). The black box society. Harvard University Press."},{"key":"9666_CR73","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1214\/09-SS057","volume":"3","author":"J Pearl","year":"2009","unstructured":"Pearl, J., et al. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3, 96\u2013146.","journal-title":"Statistics Surveys"},{"key":"9666_CR74","doi-asserted-by":"crossref","unstructured":"Pei, K., Cao, Y., Yang, J., & Jana, S. (2017). Deepxplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th symposium on operating systems principles (pp. 1\u201318).","DOI":"10.1145\/3132747.3132785"},{"key":"9666_CR75","doi-asserted-by":"crossref","unstructured":"Petsios, T., Tang, A., Stolfo, S., Keromytis, A. D., & Jana, S. (2017). Nezha: Efficient domain-independent differential testing. In 2017 IEEE symposium on security and privacy (SP) (pp. 615\u2013632). 
IEEE.","DOI":"10.1109\/SP.2017.27"},{"key":"9666_CR77","doi-asserted-by":"crossref","unstructured":"Polyzotis, N., Roy, S., Whang, S. E., & Zinkevich, M. (2017). Data management challenges in production machine learning. In Proceedings of the 2017 ACM international conference on management of data (pp. 1723\u20131726).","DOI":"10.1145\/3035918.3054782"},{"issue":"5","key":"9666_CR78","doi-asserted-by":"publisher","first-page":"909","DOI":"10.3390\/app9050909","volume":"9","author":"S Qiu","year":"2019","unstructured":"Qiu, S., Liu, Q., Zhou, S., & Wu, C. (2019). Review of artificial intelligence adversarial attack and defense technologies. Applied Sciences, 9(5), 909.","journal-title":"Applied Sciences"},{"key":"9666_CR79","unstructured":"Raghunathan, A., Steinhardt, J., & Liang, P. (2018). Certified defenses against adversarial examples. arXiv preprint. arXiv:1801.09344"},{"key":"9666_CR80","doi-asserted-by":"crossref","unstructured":"Reber, M., Krafft, T. D., Krafft, R., Zweig, K. A., & Couturier, A. (2020). Data donations for mapping risk in google search of health queries: A case study of unproven stem cell treatments in SEM. In 2020 IEEE symposium series on computational intelligence (SSCI) (pp. 2985\u20132992). IEEE.","DOI":"10.1109\/SSCI47803.2020.9308420"},{"issue":"2","key":"9666_CR81","doi-asserted-by":"publisher","first-page":"358","DOI":"10.1090\/S0002-9947-1953-0053041-6","volume":"74","author":"HG Rice","year":"1953","unstructured":"Rice, H. G. (1953). Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society, 74(2), 358\u2013366.","journal-title":"Transactions of the American Mathematical Society"},{"issue":"5","key":"9666_CR82","doi-asserted-by":"publisher","first-page":"206","DOI":"10.1038\/s42256-019-0048-x","volume":"1","author":"C Rudin","year":"2019","unstructured":"Rudin, C. (2019). 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206\u2013215.","journal-title":"Nature Machine Intelligence"},{"issue":"12","key":"9666_CR83","doi-asserted-by":"publisher","first-page":"781","DOI":"10.1016\/j.infsof.2003.10.008","volume":"46","author":"AM Salem","year":"2004","unstructured":"Salem, A. M., Rekab, K., & Whittaker, J. A. (2004). Prediction of software failures through logistic regression. Information and Software Technology, 46(12), 781\u2013789.","journal-title":"Information and Software Technology"},{"key":"9666_CR84","unstructured":"Sandvig, C., Hamilton, K., Karahalios, K., & Langbort, C. (2014). Auditing algorithms: Research methods for detecting discrimination on internet platforms. Proceedings of data and discrimination: Converting critical concerns into productive inquiry (Vol. 22, pp. 4349\u20134357)."},{"key":"9666_CR85","doi-asserted-by":"crossref","unstructured":"Saurwein, F., Just, N., & Latzer, M. (2015). Governance of algorithms: Options and limitations. Info, 17(6), 35\u201349.","DOI":"10.1108\/info-05-2015-0025"},{"issue":"9","key":"9666_CR86","doi-asserted-by":"publisher","first-page":"805","DOI":"10.1109\/TSE.2016.2532875","volume":"42","author":"S Segura","year":"2016","unstructured":"Segura, S., Fraser, G., Sanchez, A. B., & Ruiz-Cort\u00e9s, A. (2016). A survey on metamorphic testing. IEEE Transactions on Software Engineering, 42(9), 805\u2013824.","journal-title":"IEEE Transactions on Software Engineering"},{"key":"9666_CR87","volume-title":"A\/B testing: The most powerful way to turn clicks into customers","author":"D Siroker","year":"2013","unstructured":"Siroker, D., & Koomen, P. (2013). A\/B testing: The most powerful way to turn clicks into customers. Wiley."},{"key":"9666_CR88","doi-asserted-by":"crossref","unstructured":"Steineck, G., & Ahlbom, A. (1992). A definition of bias founded on the concept of the study base. 
Epidemiology, 3, 477\u2013482.","DOI":"10.1097\/00001648-199211000-00003"},{"key":"9666_CR89","unstructured":"Sun, Y., Huang, X., Kroening, D., Sharp, J., Hill, M., & Ashmore, R. (2018). Testing deep neural networks. arXiv preprint arXiv:1803.04792."},{"key":"9666_CR90","doi-asserted-by":"crossref","unstructured":"Taskesen, B., Blanchet, J., Kuhn, D., & Nguyen, V. A. (2021). A statistical test for probabilistic fairness. In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency (pp. 648\u2013665).","DOI":"10.1145\/3442188.3445927"},{"issue":"3","key":"9666_CR91","doi-asserted-by":"publisher","first-page":"632","DOI":"10.1525\/aa.1998.100.3.632","volume":"100","author":"AR Templeton","year":"1998","unstructured":"Templeton, A. R. (1998). Human races: a genetic and evolutionary perspective. American Anthropologist, 100(3), 632\u2013650.","journal-title":"American Anthropologist"},{"key":"9666_CR92","doi-asserted-by":"crossref","unstructured":"Tramer, F., Atlidakis, V., Geambasu, R., Hsu, D., Hubaux, J.-P., Humbert, M., Juels, A., & Lin, H. (2017). Fairtest: Discovering unwarranted associations in data-driven applications. In 2017 IEEE European symposium on security and privacy (EuroS&P) (pp. 401\u2013416). IEEE.","DOI":"10.1109\/EuroSP.2017.29"},{"key":"9666_CR93","doi-asserted-by":"crossref","unstructured":"Udeshi, S., Arora, P., & Chattopadhyay, S. (2018). Automated directed fairness testing. In Proceedings of the 33rd ACM\/IEEE international conference on automated software engineering (pp. 98\u2013108).","DOI":"10.1145\/3238147.3238165"},{"key":"9666_CR94","doi-asserted-by":"crossref","unstructured":"Vouk, M. A. (1988). On back-to-back testing. In Computer assurance, 1988. COMPASS\u201988 (pp. 84\u201391). 
IEEE.","DOI":"10.1109\/CMPASS.1988.9641"},{"key":"9666_CR95","doi-asserted-by":"publisher","DOI":"10.1017\/9781108690935","volume-title":"Machine learning refined: Foundations, algorithms, and applications","author":"J Watt","year":"2020","unstructured":"Watt, J., Borhani, R., & Katsaggelos, A. K. (2020). Machine learning refined: Foundations, algorithms, and applications. Cambridge University Press."},{"key":"9666_CR96","doi-asserted-by":"crossref","unstructured":"Wenzelburger, G., & Hartmann, K. (2021). Policy formation, termination and the multiple streams framework: The case of introducing and abolishing automated university admission in France. Policy Studies, 43(5), 1075\u20131095.","DOI":"10.1080\/01442872.2021.1922661"},{"key":"9666_CR97","doi-asserted-by":"crossref","unstructured":"Wu, Y., Zhang, L., & Wu, X. (2019). Counterfactual fairness: Unidentification, bound and algorithm. In Proceedings of the 28th international joint conference on artificial intelligence.","DOI":"10.24963\/ijcai.2019\/199"},{"key":"9666_CR98","doi-asserted-by":"crossref","unstructured":"Xu, Y., Chen, N., Fernandez, A., Sinno, O., & Bhasin, A. (2015). From infrastructure to culture: A\/B testing challenges in large scale social networks. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 2227\u20132236).","DOI":"10.1145\/2783258.2788602"},{"key":"9666_CR99","doi-asserted-by":"crossref","unstructured":"Young, S. W. (2014). Improving library user experience with A\/B testing: Principles and process. Weave: Journal of Library User Experience, 1(1), 75.","DOI":"10.3998\/weave.12535642.0001.101"},{"key":"9666_CR100","doi-asserted-by":"crossref","unstructured":"Zhu, H. (2015). JFuzz: A tool for automated Java unit testing based on data mutation and metamorphic testing methods. In 2015 2nd International conference on trustworthy systems and their applications (pp. 8\u201315). 
IEEE.","DOI":"10.1109\/TSA.2015.13"},{"issue":"2","key":"9666_CR101","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1007\/s10506-016-9182-5","volume":"24","author":"I \u017dliobait\u0117","year":"2016","unstructured":"\u017dliobait\u0117, I., & Custers, B. (2016). Using sensitive personal data may be necessary for avoiding discrimination in data-driven decision models. Artificial Intelligence and Law, 24(2), 183\u2013201.","journal-title":"Artificial Intelligence and Law"},{"key":"9666_CR102","doi-asserted-by":"crossref","unstructured":"Zweig, K. A., Krafft, T. D., Klingel, A., & Park, E. (2021). Sozioinformatik: Ein Neuer Blick Auf Informatik und Gesellschaft. Carl Hanser Verlag GmbH Co KG.","DOI":"10.3139\/9783446468030.fm"}],"container-title":["Minds and Machines"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11023-024-09666-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11023-024-09666-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11023-024-09666-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,4]],"date-time":"2024-06-04T12:02:25Z","timestamp":1717502545000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11023-024-09666-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,25]]},"references-count":99,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["9666"],"URL":"https:\/\/doi.org\/10.1007\/s11023-024-09666-0","relation":{},"ISSN":["1572-8641"],"issn-type":[{"value":"1572-8641","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,25]]},"assertion":[{"value":"
1 December 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 February 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"15"}}