Survey Paper | Open access
A survey on artificial intelligence assurance
Journal of Big Data volume 8, Article number: 60 (2021)
Abstract
Artificial Intelligence (AI) algorithms are increasingly providing decision-making and operational support across multiple domains. AI includes a wide (and growing) library of algorithms that can be applied to different problems. One important notion for the adoption of AI algorithms into operational decision processes is the concept of assurance. The literature on assurance, unfortunately, conceals its outcomes within a tangled landscape of conflicting approaches, driven by contradicting motivations, assumptions, and intuitions. Accordingly, although assurance is a rising and novel area, this manuscript provides a systematic review of research works relevant to AI assurance published between 1985 and 2021, and aims to provide a structured alternative to that landscape. A new AI assurance definition is adopted and presented, and assurance methods are contrasted and tabulated. Additionally, a ten-metric scoring system is developed and introduced to evaluate and compare existing methods. Lastly, in this manuscript, we provide foundational insights, discussions, future directions, a roadmap, and applicable recommendations for the development and deployment of AI assurance.
Introduction and survey structure
The recent rise of big data gave birth to a new promise for AI based on statistical learning, and this time, contrary to previous AI winters, it seems that statistical-learning-enabled AI has survived the hype, in that it has been able to surpass human-level performance in certain domains. Similar to any other engineering deployment, building AI systems requires evaluation, which may be called assurance, validation, verification, or another name. We address this terminology debate in the next section.
Defining the scope of AI assurance is worth studying. AI is currently deployed across multiple domains: it is forecasting revenue, guiding robots on the battlefield, driving cars, recommending policies to government officials, predicting pregnancies, and classifying customers. AI has multiple subareas, such as machine learning, computer vision, knowledge-based systems, and many more; therefore, we pose the question: is it possible to provide a generic assurance solution across all subareas and domains? This review sheds light on existing works in AI assurance, provides a comprehensive overview of the state-of-the-science, and discusses patterns in AI assurance publishing. This section sets the stage for the manuscript by presenting the motivation, clear definitions and distinctions, as well as the inclusion/exclusion criteria of reviewed articles.
Relevant terminology and definitions
Because all AI systems require assurance, it is important to distinguish between different terms that have been used interchangeably in the literature. We acknowledge the following relevant terms: (1) validation, (2) verification, (3) testing, and (4) assurance. This paper is concerned with all of the mentioned terms. The following definitions are adopted in our manuscript for the purposes of clarity and to avoid ambiguity in upcoming theoretical discussions:
Verification: “The process of evaluating a system or component to determine whether the products of a given development phase satisfy the conditions imposed at the start of that phase”. Validation: “The process of evaluating a system or component during or at the end of the development process to determine whether it satisfies specified requirements” (Gonzalez and Barr, 2000). Another definition of V&V comes from the Department of Defense, which applied testing practices to simulation systems; it states that Verification is the “process of determining that a model implementation accurately represents the developer’s conceptual descriptions and specifications”, and Validation is the process of “determining the degree to which a model is an accurate representation” [60].
Testing: according to the American Software Testing Qualifications Board, testing is “the process consisting of all lifecycle activities, both static and dynamic, concerned with planning, preparation and evaluation of software products and related work products to determine that they satisfy specified requirements, to demonstrate that they are fit for purpose and to detect defects”. Based on that (and other reviewed definitions), testing includes both validation and verification.
Assurance: this term has rarely been applied to conventional software engineering; rather, it is used in the context of AI and learning algorithms. In this manuscript, based on prior definitions and recent AI challenges, we propose the following definition for AI assurance:
A process that is applied at all stages of the AI engineering lifecycle ensuring that any intelligent system is producing outcomes that are valid, verified, data-driven, trustworthy and explainable to a layman, ethical in the context of its deployment, unbiased in its learning, and fair to its users.
Our definition is by design generic and therefore applicable to all AI domains and subareas. Additionally, based on our review of a wide variety of existing definitions of assurance, it is evident that the two main AI components of interest are the data and the algorithm; accordingly, those are the two main pillars of our definition. Additionally, we highlight that the outcomes of the AI-enabled system (the intelligent system) are evaluated at the system level, where the decision or action is being taken.
The remainder of this paper is focused on a review of existing AI assurance methods, and it is structured as follows: the next section presents the inclusion/exclusion criteria, "AI assurance landscape" section provides a historical perspective as well as the entire assurance landscape, "The review and scoring of methods" section includes an exhaustive list of papers relevant to AI assurance (as well as the scoring system), "Recommendations and the future of AI assurance" section presents overall insights and discussions of the survey, and lastly, "Conclusions" section presents conclusions.
Description of included articles
Articles included in this paper were found using the following search terms: assurance, validation, verification, and testing. Additionally, as AI has many subareas, the following subareas were included in the search: machine learning, data science, deep learning, reinforcement learning, genetic algorithms, agent-based systems, computer vision, natural language processing, and knowledge-based systems (expert systems). We looked for papers in conference proceedings, journals, books and book chapters, dissertations, as well as industry white papers. The search yielded results from 1985 to 2021. Besides university libraries, multiple online repositories were searched (the most commonplace AI peer-reviewed venues). Additionally, areas of research such as data bias, data incompleteness, Fair AI, Explainable AI (XAI), and Ethical AI were used to widen the net of the search. The next section presents an executive summary of the history of AI assurance.
AI assurance landscape
The history and current state of AI assurance is certainly a debatable matter. In this section, multiple methods are discussed, critiqued, and aggregated by AI subarea. The goal is to illuminate the need for an organized system for evaluating and presenting assurance methods, which is presented in the next sections of this manuscript.
A historical perspective (analysis of the state-of-the-science)
As a starting point for AI assurance and testing, there is nowhere more suitable to begin than the Turing test [219]. In his famous manuscript, Computing Machinery and Intelligence, Turing introduced the imitation game, which was later popularized as the Turing test. Turing states: “The object of the game for the interrogator is to determine which of the other two is the man and which is the woman”. Based on a series of questions, the intelligent agent “learns” how to make such a distinction. If we consider the different types of intelligence, it becomes evident that different paradigms have different expectations. A genetic algorithm aims to optimize, while a classification algorithm aims to classify (choose between yes and no, for instance). As Turing stated in his paper: “We are of course supposing for the present that the questions are of the kind to which an answer: Yes or No is appropriate, rather than questions such as: What do you think of Picasso?” Comparing predictions (or classifications) to actual outputs is one way of evaluating whether the results of an algorithm match what the real world created.
There were a dominating number of validation and verification methods in the seventies, eighties, and nineties for two forms of intelligence: knowledge-based systems (i.e., expert systems) and simulation systems (mostly for defense and military applications). One of the first times AI turned towards data-driven methods was apparent in 1996 at the Third International Math and Science Study (TIMSS), which focused on quality assurance in data collection (Martin and Mullis, 1996). Data from forty-five countries were included in the analysis. In a very deliberate process, the data collectors were faced with challenges relevant to the internationalization of data. For example, data from Indonesia had errors in translation; data collection processes in Korea, Germany, and Kuwait deviated from the standard process due to funding and timing issues. Such real-world issues in data collection certainly pose a challenge to the assurance of statistical learning AI that requires addressing.
In the 1990s, AI testing and assurance were largely inspired by the large research archive on software testing (i.e., within software engineering) [23]. However, a slim body of literature explored algorithms such as genetic algorithms [104], reinforcement learning (Hailu and Sommer, 1997), and neural networks (Paladini, 1999). It was not until the 2000s that there was a serious surge in data-driven assurance and the testing of AI methods.
In the early 2000s, mostly manual methods of assurance were developed; for example, CommonKADS was a popular and commonplace method used to incrementally develop and test an intelligent system. Other domain-specific works were published in areas such as healthcare [27], along with algorithm-specific assurance such as Crisp Clustering for k-means clustering [85].
It was not until the 2010s that a spike in AI assurance for big data occurred. Validation of data analytics and other new areas, such as XAI and Trustworthy AI, has dominated the AI assurance field in recent years. Figure 1 illustrates that areas including XAI, computer vision, deep learning, and reinforcement learning have had a recent spike in assurance methods, and the trend is expected to remain on the rise (as shown in Fig. 2). The figure also illustrates that knowledge-based systems were the focus until the early nineties, and shows a shift towards the statistical-learning-based subareas in the 2010s. A version of the dashboard is available in a public repository (with instructions on how to run it): https://github.com/ferasbatarseh/AI-Assurance-Review.
The p-values for the trend lines presented in Fig. 2 are as follows: Data Science (DS): 0.87, Genetic Algorithms (GA): 0.50, Reinforcement Learning (RL): 0.15, Knowledge-Based Systems (KBS): 0.97, Computer Vision (CV): 0.22, Natural Language Processing (NLP): 0.17, Generic AI: 0.95, Agent-Based Systems (ABS): 0.33, Machine Learning (ML): 0.72, Deep Learning (DL): 0.37, and XAI: 0.44.
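For readers who wish to reproduce this kind of trend analysis, a minimal sketch is shown below. It assumes yearly paper counts per subarea (the counts here are illustrative placeholders, not the survey's underlying data) and reports the slope p-value of an ordinary least-squares trend line.

```python
# Hypothetical sketch of computing trend p-values per subarea: regress yearly
# paper counts on the year and report the slope's p-value. The counts below are
# illustrative placeholders, not the data behind Fig. 2.
from scipy.stats import linregress

yearly_counts = {
    "XAI": {2015: 2, 2016: 4, 2017: 7, 2018: 12, 2019: 20, 2020: 31},
    "KBS": {2015: 3, 2016: 2, 2017: 3, 2018: 2, 2019: 3, 2020: 2},
}

for subarea, counts in yearly_counts.items():
    years, n_papers = zip(*sorted(counts.items()))
    fit = linregress(years, n_papers)  # ordinary least-squares trend line
    print(f"{subarea}: slope={fit.slope:.2f}, p-value={fit.pvalue:.2f}")
```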
It is undeniable that there is a rise in the research of AI, and especially in the area of assurance. The next section "The state of AI assurance" provides further details on the state-of-the-art, and "The review and scoring of methods" section presents an exhaustive review of all AI assurance methods found under the predefined search criteria.
The state of AI assurance
This section introduces some milestone methods and discussions in AI assurance. Many of the discussed works rely on standard software validation and verification methods. Such methods are inadequate for AI systems, because AI systems have a dimension of intelligence, learning, and re-learning, as well as adaptability to certain contexts. Therefore, errors in AI systems “may manifest themselves because of autonomous changes” [211], and among other scenarios would require extensive assurance. For instance, in expert systems, the inference engine component creates rules and new logic based on forward and backward propagation [20]. Such processes require extensive assurance of the process as well as the outcome rules. Alternatively, for other AI areas such as neural networks, while propagation is used, taxonomic evaluations and adversarial targeting are more critical to their assurance [145]. For other subareas such as machine learning, the structure of data, data collection decisions, and other data-relevant properties need step-wise assurance to evaluate the resulting predictions and forecasts. For instance, several types of bias can occur in any phase of the data science lifecycle or while extracting outcomes. Bias can begin during data collection, data wrangling, modeling, or any other phase. Biases and variances that arise in the data are independent of the sample size or statistical significance, and they can directly affect the context, the results, or the model. Other issues such as incompleteness, data skewness, or lack of structure have a negative influence on the quality of outcomes of any AI model and require data assurance [117].
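As a minimal illustration of such data-level assurance, the sketch below (our own, with illustrative column names and no prescribed thresholds, not a method from the reviewed works) flags incompleteness, class imbalance, and skewness before modeling.

```python
# A minimal, assumption-laden sketch of data-level assurance checks of the kind
# discussed above: missingness, class imbalance, and skewness are summarized
# before any model is trained. Column names and data are illustrative only.
import pandas as pd

def data_assurance_report(df: pd.DataFrame, target: str) -> dict:
    report = {}
    # Incompleteness: fraction of missing values per column
    report["missing_fraction"] = df.isna().mean().to_dict()
    # Bias proxy: share of the most frequent class in the target variable
    class_shares = df[target].value_counts(normalize=True)
    report["max_class_share"] = float(class_shares.max())
    # Skewness of numeric features (large absolute values suggest a transform)
    report["skewness"] = df.select_dtypes("number").skew().to_dict()
    return report

# Example usage with a toy dataset
df = pd.DataFrame({"age": [23, 45, 31, None, 52], "label": [1, 1, 1, 0, 1]})
print(data_assurance_report(df, target="label"))
```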
While the historical majority of methods for knowledge-based systems and expert systems (as well as neural networks) aimed at finding generic solutions for their assurance [21, 218], and [166], other “more recent” methods were focused on one AI subarea and one domain. For instance, in Mason et al. [142, 144], assurance was applied to reinforcement learning methods for safety-critical systems. Prentzas et al. [174] presented an assurance method for machine learning as it is applied to stroke predictions, similar to Pawar et al.’s [167] XAI framework for healthcare. Pepe et al. [169] and Chittajallu et al. [42] developed methods for surgical video detection. Moreover, domains such as law and society would generally benefit from AI subareas such as natural language processing for analyzing legal contracts [135], but also require assurance.
Another major aspect (for most domains) that was evident in the papers reviewed was the need for explainability (i.e., XAI) of the learning algorithm, defined as identifying how the outcomes were arrived at (transforming the black box into a white box) [193]. A few papers, without substantial formal methods, were found for Fair AI, Safe AI [68], Transparent AI [1], or Trustworthy AI [6], but XAI [83] has been central (as the previous figures in this paper also suggest). For instance, in Lee et al. [121], layer-wise relevance propagation was introduced to obtain the effects of every neural layer and each neuron on the outcome of the algorithm. Those observations are then presented for better understanding of the model and its inner workings. Additionally, Arrieta et al. [16] presented a model for XAI that is tailored for road traffic forecasting, and Guo [82] presented the same, albeit for 5G and wireless networks [200]. Similarly, Kuppa and Le-Khac [118] presented a method focused on cyber security using gradient maps and bots. Go and Lee [76] presented an AI assurance method for the trustworthiness of security systems. Lastly, Guo [82] developed a framework for 6G testing using deep neural networks.
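To make the notion of attribution concrete, the hedged sketch below uses permutation importance, a common model-agnostic technique; it is not the layer-wise relevance propagation method cited above, but it pursues the same goal of attributing a model's outcomes to its inputs.

```python
# Sketch of one model-agnostic explainability technique (permutation importance):
# shuffle each input feature in turn and measure how much the model's accuracy
# drops. Data and model here are synthetic and illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: mean importance {score:.3f}")
```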
Multi-agent AI is another aspect that requires a specific kind of assurance, by validating every agent and verifying the integration of agents [163]. The challenges of AI algorithms and their assurance are evident and consistent across many of the manuscripts, such as in Janssen and Kuk’s [100] study of the limitations of AI for government; on the other hand, Batarseh et al. [22] presented multiple methods for applying data science in government (with assurance using knowledge-based systems). Assurance is especially difficult when it has to be performed in real time; timeliness is very important in critical systems and other defense-relevant environments [54, 105, 124], and (Laat, 2017). Other less “time-constrained” activities, such as decisions at organizations [186] and time series decision support systems, could utilize slower methods such as genetic algorithms [214], but they require a different take on assurance. The authors suggested that “by no means we have a definitive answer, what we do here is intended to be suggestive” [214] when addressing the validation part of their work. A recent publication by Raji et al. [180] shows a study from the Google team claiming that they are “aiming to close the accountability gap of AI” using an internal audit system (at Google). IBM research also proposed a few solutions to manage the bias of AI services [202, 225]. As expected, the relevance and success of assurance methods varied, and so we developed a scoring system to evaluate existing methods. We were able to identify 200+ relevant manuscripts with methods. The next section presents the exhaustive list of the works presented in this section, in addition to multiple others, with our derived scores.
The review and scoring of assurance methods
The scoring of each AI assurance method/paper was based on the sum of the scores of ten metrics. The objective of the metrics is to provide readers with a meaningful strategy for sorting through the vast literature on AI assurance. The scoring metric is based on the authors’ review of what makes a useful reference paper for AI assurance. Each elemental metric is allocated one point, and each method is either given that point or not (0 or 1), as follows (a small illustrative sketch of the scoring appears after the list):
I. Specificity to AI: some assurance methods are generically tailored to many systems, while others are deployable only to intelligent systems; one point was assigned to methods that focus specifically on the inner workings of AI systems.

II. The existence of a formal method: this metric indicates whether the manuscript under review presented a formal (quantitative and qualitative) description of its method (1 point) or not (0 points).

III. Declared successful results: in the experimental work of a method under review, some authors declared success and presented success rates; if that is present, we gave that method a point.

IV. Datasets provided: whether the method has a big dataset associated with it for testing (1) or not (0). This is an important factor for reproducibility and research evaluation purposes.

V. AI system size: some methods were applied to a small AI system, while others were applied to bigger systems; we gave a point to methods that could be applied to big real-world systems rather than ones with theoretical deployments.

VI. Declared success: whether the authors declared success of their method in reaching an assured AI system (1) or not (0).

VII. Mentioned limitations: whether there are obvious method limitations (0) or not (1).

VIII. Generalized to other AI deployments: some methods are broad and can be generalized to multiple AI systems (1), while others are “narrow” (0) and more specific to one application or one system.

IX. A real-world application: if the method presented is applied to a real-world application, it is granted one point.

X. Contrasted with other methods: if the method reviewed is compared, contrasted, or measured against other methods, or if it proves its superiority over other methods, then it is granted a point.
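The small sketch referenced above illustrates how the ten binary metrics combine into a paper's score; the field names are ours, chosen to mirror the metric list, and are not taken from the survey's repository.

```python
# Minimal sketch of the ten-metric scoring scheme: each metric is a 0/1 flag and
# a paper's score is their sum. Field names are illustrative, not official.
from dataclasses import dataclass, fields

@dataclass
class AssuranceMethodScore:
    ai_specific: int = 0       # I.    Specificity to AI
    formal_method: int = 0     # II.   Existence of a formal method
    results_declared: int = 0  # III.  Declared successful results
    dataset_provided: int = 0  # IV.   Datasets provided
    system_size: int = 0       # V.    Applicable to big real-world systems
    success_declared: int = 0  # VI.   Declared success
    no_limitations: int = 0    # VII.  No obvious limitations mentioned
    generalizable: int = 0     # VIII. Generalized to other AI deployments
    real_world_app: int = 0    # IX.   A real-world application
    compared: int = 0          # X.    Contrasted with other methods

    def total(self) -> int:
        return sum(getattr(self, f.name) for f in fields(self))

paper = AssuranceMethodScore(ai_specific=1, formal_method=1, dataset_provided=1)
print(paper.total())  # -> 3
```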
Table 1 presents the methods reviewed, along with the first author’s last name, publishing venue, AI subarea, and the score (the sum of the ten metrics).
Other aspects, such as the domain of application, were missing from many papers or inconsistent; therefore, we did not include them in the table. Additionally, we considered citations per paper. However, the data on citations (for a 250+ paper study) were incomplete and difficult to find in many cases. For many of the papers, we did not have information on how many times they were cited, because many publishers failed to index their papers across consistent venues (e.g., Scopus, MedLine, Web of Science, and others). Additionally, the issue of self-citation is in some cases considered in scoring but in other cases is not. Due to these citation inconsistencies (which are believed to be a challenge that reaches all areas of science), we deemed that using citations would raise more questions than answers compared with our subject-matter-expert-based metrics.
The Appendix presents a list of all reviewed manuscripts and their detailed scores (for the ten metrics) by ranking category, with ten columns matching the presented scoring method (plus the AI subarea), as follows: AI subarea: AI.s; Relevance: R; Method: M; Results: Rs; Dataset: Ds; Size: Sz; Success: Sc; Limitations: L; General: G; Application: A; and Comparison: C. The papers, data, dashboard, and lists are on a public GitHub repository: https://github.com/ferasbatarseh/AI-Assurance-Review.
In 2018, AI papers accounted for 3% of all peer-reviewed papers published worldwide [181]. The share of AI papers has grown three-fold over twenty years. Moreover, between 2010 and 2019, the total number of AI papers on arXiv increased over 20-fold [181]. As of 2019, machine learning papers have increased most dramatically, followed by computer vision and pattern recognition. While machine learning was the most active research area in AI, its subarea, DL, has become increasingly popular in the past few years. According to GitHub, TensorFlow is the most popular free and open-source software library for AI. TensorFlow is a corporate-backed research framework, and it has been shown that, in recent years, there is a noticeable trend of such corporate-backed research frameworks emerging. Since 2005, attendance at large AI conferences has grown significantly; NeurIPS and ICML (the two fastest-growing conferences) have seen over an eight-fold increase. Attendance at small AI conferences has also grown over 15-fold since 2014, and the increase is highly related to the emergence of deep and reinforcement learning [181]. As the field of AI continues to grow, assurance of AI has become a more important and timely topic.
Recommendations and the future of AI assurance
The need for AI assurance
The emergence of complex, opaque, and invisible algorithms that learn from data motivated a variety of investigations, including: algorithm awareness, clarity, variance, and bias [94]. Algorithmic bias, for instance, whether it occurs in an unintentional or intentional manner, is found to severely limit the performance of an AI model. Given that AI systems provide recommendations based on data, users’ faith that the recommended outcomes are trustworthy, fair, and not biased is another critical challenge for AI assurance.
Applications of AI such as facial recognition using deep learning have become commonplace. Deep learning models are often exposed to adversarial inputs (such as deep-fakes), thus limiting their adoption and increasing their threat [145]. Unlike conventional software, aspects such as explainability (unveiling the black box of AI models) dictate how assurance is performed and what is needed to accomplish it. Unfortunately, however, similar to the software engineering community’s experience with testing, ensuring a valid and verified system is often an afterthought. Some of the classical engineering approaches would prove useful to the AI assurance community; for instance, performing testing in an incremental manner, involving users, and allocating time and budget specifically to testing are some main lessons that ought to be considered. A worthy recent trend that might aid greatly in assurance is using AI for testing AI (i.e., deploying intelligent methods for the testing and assurance of AI methods). Additionally, from a user’s perspective, recent growing questions in research that are relevant to assurance pose the following concerns: how is learning performed inside the black box? How is the algorithm creating its outcomes? Which dependent variables are the most influential? Is the AI algorithm dependable, safe, secure, and ethical? Besides all the previously mentioned assurance aspects, we deem the following foundational concepts as highly connected, worthy of consideration by developers and AI engineers, and essential to all forms of AI assurance (an illustrative sketch follows this paragraph): (1) Context: refers to the scope of the system, which could be associated with a timeframe, a geographical area, a specific set of users, and any other system environmental specifications; (2) Correlation: the amount of relevance between the variables; this is usually part of exploratory analysis, however, it is key to understand which dependent variables are correlated and which ones are not; (3) Causation: the study of cause and effect, i.e., which variables directly cause the outcome to change (increase or decrease) in any fashion; (4) Distribution: whether a normal distribution is assumed or not; the data distribution of the inputted dependent variables can dictate which models are best suited for the problem at hand; and (5) Attribution: aims at identifying the variables in the dataset that have the strongest influence on the outcomes of the AI algorithm.
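As an illustration of how three of these concepts (correlation, distribution, and attribution) can be examined in an exploratory assurance pass, the sketch below uses synthetic data; it is our own example, not a procedure prescribed by the works reviewed.

```python
# Illustrative exploratory checks on synthetic data: correlation between
# variables, a normality test per variable, and a crude attribution proxy via
# standardized linear coefficients. All names and data are ours.
import numpy as np
import pandas as pd
from scipy.stats import shapiro
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.exponential(size=200)})
df["y"] = 2.0 * df["x1"] + 0.3 * df["x2"] + rng.normal(scale=0.5, size=200)

# Correlation: which variables move together (and with the outcome)
print(df.corr().round(2))

# Distribution: Shapiro-Wilk normality test per input variable
for col in ("x1", "x2"):
    stat, p = shapiro(df[col])
    print(f"{col}: normality p-value = {p:.3f}")

# Attribution (crude): standardized linear coefficients as influence proxies
X_std = df[["x1", "x2"]] / df[["x1", "x2"]].std()
model = LinearRegression().fit(X_std, df["y"])
print(dict(zip(["x1", "x2"], model.coef_.round(2))))
```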
Providing a scoring system to evaluate existing methods supports scholars in evaluating the field, avoiding future mistakes, and creating a system where AI scientific methods are measured and evaluated by others, a practice that is becoming increasingly rare in scientific arenas. More importantly, practitioners in most cases find it difficult to identify the best assurance method relevant to their domain and subarea. We anticipate that this comprehensive review will help in that regard as well. As part of AI assurance, ethical outcomes should be evaluated; while ethical considerations might differ from one context to another, it is evident that requiring outcomes to be ethical, fair, secure, and safe necessitates the involvement of humans, and in most cases, experts from other domains. That notion qualifies AI assurance as a multidisciplinary area of investigation.
Future components of AI assurance research
In some AI subareas, there are known issues to be tackled by AI assurance, such as deep learning’s sensitivity to adversarial attacks, as well as overfitting and underfitting issues in machine learning. Based on that and on the papers reviewed in this survey, it is evident that AI assurance is a necessary pursuit, but a difficult and multi-faceted area to address. However, previous experiences, successes, and failures can point us to what would work well and what is worth pursuing. Accordingly, we suggest performing and developing AI assurance by (1) domain, (2) AI subarea, and (3) AI goal, as a theoretical roadmap, similar to what is shown in Fig. 3.
In some cases, such as in unsupervised learning techniques, it is difficult to know what to validate or assure [86]. In such cases, the outcome is not predefined (contrary to supervised learning). Genetic algorithms and reinforcement learning have the same issue, and so in such cases, feature selection, data bias, and other data-relevant validation measures, as well as hypothesis generation and testing, become more important. Additionally, different domains require different tradeoffs; trustworthiness, for instance, is more important when AI is used in healthcare than when it is used for revenue estimates at a private sector firm; also, AI safety is more critical in defense systems than in systems built for education or energy applications.
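As an example of what can still be validated when the outcome is not predefined, the hedged sketch below applies an internal clustering-validity measure (the silhouette score) across candidate cluster counts; the data and the range of k are illustrative.

```python
# Sketch of internal validation for unsupervised learning, where no ground-truth
# outcome exists: the silhouette score gauges cluster cohesion and separation.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # synthetic data

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette = {silhouette_score(X, labels):.3f}")
```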
Other surveys presented a review of AI validation and verification [71] and [21]; however, none was found that covered the three-dimensional structure presented here (by subarea, goal, and domain) like this review.
Conclusions
In AI assurance, there are other philosophical questions that are also very relevant, such as: what is a valid system? What is a trustworthy outcome? When to stop testing or model learning? When to claim victory on AI safety? When to allow human intervention (and when not to)? And many other similar questions that require close attention and evaluation by the research community. The most successful methods presented in the literature (scored as 8, 9, or 10) are the ones that were specific to an AI subarea and goal, and additionally, ones that had done extensive theoretical and hands-on experimentation. Accordingly, we propose the following five considerations, as they were evident in existing successful works, when defining or applying new AI assurance methods: (1) Data quality: similar to assuring the outcomes, assuring the dataset and its quality mitigates issues that would eventually prevail in the AI algorithm. (2) Specificity: as this review concluded, assurance methods ought to be designed for one goal and subarea of AI. (3) Addressing invisible issues: AI engineers should carry out assurance in a procedural manner, not as an afterthought or a process that is performed only when visible issues are present. (4) Automated assurance: using manual methods for assurance would in many cases defeat the purpose. It is difficult to evaluate the validity of the assurance method itself; hence, automating the assurance process can, if done with best practices in mind, minimize error rates due to human interference. (5) The user: involving the user in an incremental manner is critical in expert-relevant (non-engineering) domains such as healthcare, education, economics, and other areas. Explainability is a relative and subjective matter; hence, users of the AI system can help in defining how explainability ought to be presented.
Based on all discussions presented, we assert it will be beneficial to have multi-disciplinary collaborations in the field of AI assurance. The growth of the field might need not only computer scientists and engineers to develop advanced algorithms, but also economists, physicians, biologists, lawyers, cognitive scientists, and other domain experts to unveil AI deployments to their domains, create a data-driven culture within their organizations, and ultimately enable the wide-scale adoption of assured AI systems.
Availability of data and materials
All data and materials are available under the following link: https://github.com/ferasbatarseh/AI-Assurance-Review.
Abbreviations
- AI: Artificial Intelligence
- TIMSS: Third International Math and Science Study
- DS: Data Science
- GA: Genetic Algorithms
- RL: Reinforcement Learning
- KBS: Knowledge-Based Systems
- CV: Computer Vision
- NLP: Natural Language Processing
- ABS: Agent-Based Systems
- ML: Machine Learning
- DL: Deep Learning
- XAI: Explainable AI
References
Abdollahi B, Nasraoui O. Transparency in fair machine learning: the case of explainable recommender systems. In: Zhou J, Chen F, editors. Human and machine learning: visible, explainable, trustworthy and transparent. Berlin: Springer; 2018 https://doi.org/10.1007/978-3-319-90403-0_2.
Abel T, Gonzalez A (1997). Utilizing Criteria to Reduce a Set of Test Cases for Expert System Validation.
Abel T, Knauf R, Gonzalez A. (1996). Generation of a minimal set of test cases that is functionally equivalent to an exhaustive set, for use in knowledge-based system validation.
Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 2018;6:23.
Agarwal A., Lohia P, Nagar, S, Dey K, Saha D. (2018). Automated Test Generation to Detect Individual Discrimination in AI Models ArXiv:1809.03260 [Cs].
Aitken M. Assured human-autonomy interaction through machine self-confidence. Colorado: University of Colorado; 2016.
Algorithmic Accountability Policy Toolkit. (2018). AI Now.
Ali AL, Schmid F. Data quality assurance for volunteered geographic information. In: Duckham M, Pebesma E, Stewart K, Frank AU, editors. Geographic information science. Berlin: Springer; 2014. p. 126–41.
Alves E, Bhatt D, Hall B, Driscoll K, Murugesan A (2018). Considerations in Assuring Safety of Increasingly Autonomous Systems (NASA Contractor Report NASA/CR–2018–22008; Issue NASA/CR–2018–22008). NASA.
Amodei D, Olah C, Steinhardt J, Christiano P, Schulman J, Mané D. Concrete Problems in AI Safety. ArXiv:1606.06565 [Cs]; 2016.
Anderson A, Dodge J, Sadarangani A, Juozapaitis Z, Newman E, Irvine J, Chattopadhyay S, Olson M, Fern A, Burnett M. Mental models of mere mortals with explanations of reinforcement learning. ACM Trans Interact Intell Syst. 2020;10(2):1–37. https://doi.org/10.1145/3366485.
Andert EP. Integrated knowledge-based system design and validation for solving problems in uncertain environments. Int J Man Mach Stud. 1992;36(2):357–73. https://doi.org/10.1016/0020-7373(92)90023-E.
Antoniou G, Harmelen F, Plant R, Vanthienen J. Verification and validation of knowledge-based systems: report on two 1997 events. AI Mag. 1998;19:123–6.
Antunes N, Balby L, Figueiredo F, Lourenco N, Meira W, Santos W (2018). Fairness and Transparency of Machine Learning for Trustworthy Cloud Services. 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), 188–193. https://doi.org/10.1109/DSN-W.2018.00063
Arifin SMN, Madey GR. Verification, validation, and replication methods for agent-based modeling and simulation: lessons learned the hard way! In: Yilmaz L, editor. Concepts and methodologies for modeling and simulation: a tribute to Tuncer Ören. Berlin: Springer; 2015. p. 217–42.
Arrieta AB, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, García S, Gil-López S, Molina D, Benjamins R, Chatila R, Herrera F. (2019). Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI.ArXiv:1910.10045 [Cs].
Assurance in the age of AI. (2018). EY.
Barr VB, Klavans JL. Verification and validation of language processing systems: is it evaluation? Proc Workshop Eval Lang Dialogue Syst. 2001;9:1–7. https://doi.org/10.3115/1118053.1118058.
Barredo-Arrieta A, Lana I, Del Ser J. What lies beneath: a note on the explainability of black-box machine learning models for road traffic forecasting. IEEE Intell Transp Syst Conf (ITSC). 2019;2019:2232–7. https://doi.org/10.1109/ITSC.2019.8916985.
Batarseh FA, Gonzalez AJ. Incremental lifecycle validation of knowledge-based systems through commonKADS. EEE Trans Syst Man Cybern. 2013;43(3):12.
Batarseh FA, Gonzalez AJ. Validation of knowledge-based systems: a reassessment of the field. Artif Intell Rev. 2015;43(4):485–500. https://doi.org/10.1007/s10462-013-9396-9.
Batarseh AF, Yang R. Transforming Government and Agricultural Policy Using Artificial Intelligence: Federal Data Science; 2017.
Batarseh FA, Mohod R, Kumar A, Bui J. Chapter 10: The application of artificial intelligence in software engineering: a review challenging conventional wisdom. In: Data democracy. Elsevier Academic Press; 2020. p. 179–232.
Batarseh F. A, Kulkarni A. (2019). Context-Driven Data Mining through Bias Removal and Incompleteness Mitigation. 7.
Becker L. A, Green P. G, Bhatnagar J. (1989). Evidence Flow Graph Methods for Validation and Verification of Expert Systems (NASA Contractor Report No. 181810; p. 46). Worcester Polytechnic Institute.
Bellamy RKE, Mojsilovic A, Nagar S, Ramamurthy KN, Richards J, Saha D, Sattigeri P, Singh M, Varshney KR, Zhang Y, Dey K, Hind M, Hoffman SC, Houde S, Kannan K, Lohia P, Martino J, Mehta S. AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM J Res Dev. 2019;63(4):1–4. https://doi.org/10.1147/JRD.2019.2942287.
Berndt DJ, Fisher JW, Hevner AR, Studnicki J. Healthcare data warehousing and quality assurance. Computer. 2001;34(12):56–65. https://doi.org/10.1109/2.970578.
Beyret B, Shafti A, Faisal AA. Dot-to-Dot: explainable hierarchical reinforcement learning for robotic manipulation. IEEE/RSJ Int Conf Intell Robots Syst (IROS). 2019;2019:5014–9. https://doi.org/10.1109/IROS40897.2019.8968488.
Birkenbihl C. Differences in cohort study data affect external validation of artificial intelligence models for predictive diagnostics of dementia—Lessons for translation into clinical practice. EPMA J. 2020;11(3):367–76.
Bone C, Dragićević S. Simulation and validation of a reinforcement learning agent-based model for multi-stakeholder forest management. Comput Environ Urban Syst. 2010;34(2):162–74. https://doi.org/10.1016/j.compenvurbsys.2009.10.001.
Brancovici, G. (2007). Towards Trustworthy Intelligence on the Road: A Flexible Architecture for Safe, Adaptive, Autonomous Applications. 2007 IEEE Congress on Evolutionary Computation, Singapore. https://doi.org/10.1109/CEC.2007.4425023
Breck E, Zinkevich M, Polyzotis N, Whang S, Roy S. (2019). Data Validation for Machine Learning. Proceedings of SysML. https://mlsys.org/Conferences/2019/doc/2019/167.pdf
Brennen, A. (2020). What Do People Really Want When They Say They Want “Explainable AI?” We Asked 60 Stakeholders. Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, 1–7. https://doi.org/10.1145/3334480.3383047
Bride H, Dong J. S, Hóu Z, Mahony B, Oxenham M. (2018). Towards Trustworthy AI for Autonomous Systems. In J. Sun M. Sun (Eds.), Formal Methods and Software Engineering (pp. 407–411). Springer International Publishing. https://doi.org/10.1007/978-3-030-02450-5_24
Cao N, Li G, Zhu P, Sun Q, Wang Y, Li J, Yan M, Zhao Y. Handling the adversarial attacks. J Ambient Intell Humaniz Comput. 2019;10(8):2929–43. https://doi.org/10.1007/s12652-018-0714-6.
Carley K. M. (1996). Validating Computational Models [Work Paper]. Carnegie Mellon University.
Castore G. (1987). A Formal Approach to Validation and Verification for Knowledge-Based Control. Systems. 6.
Celis L. E, Deshpande A, Kathuria T, Vishnoi N. K. (2016). How to be Fair and Diverse? ArXiv:1610.07183 [Cs].
Checco A, Bates J, Demartini G. Adversarial attacks on crowdsourcing quality control. J Artif Intell Res. 2020;67:375–408. https://doi.org/10.1613/jair.1.11332.
Chen H-Y, Lee C-H. Vibration signals analysis by explainable artificial intelligence (XAI) approach: application on bearing faults diagnosis. IEEE Access. 2020;8:134246–56. https://doi.org/10.1109/ACCESS.2020.3006491.
Chen T, Liu J, Xiang Y, Niu W, Tong E, Han Z. Adversarial attack and defense in reinforcement learning-from AI security view. Cybersecurity. 2019;2(1):11. https://doi.org/10.1186/s42400-019-0027-x.
Chittajallu, D. R, Dong B, Tunison P, Collins R, Wells K, Fleshman J, Sankaranarayanan G, Schwaitzberg S, Cavuoto L, Enquobahrie A. (2019). XAI-CBIR: Explainable AI System for Content based Retrieval of Video Frames from Minimally Invasive Surgery Videos. 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), 66–69. https://doi.org/10.1109/ISBI.2019.8759428
Cluzeau J. M, Henriquel X, Rebender G, Soudain G, Dijk L. van, Gronskiy A, Haber D, Perret-Gentil C, Polak R. (2020). Concepts of Design Assurance for Neural Networks (CoDANN) [Public Report Extract]. European Union Aviation Safety Agency.
Coenen F, Bench-Capon T, Boswell R, Dibie-Barthélemy J, Eaglestone B, Gerrits R, Grégoire E, Ligęza A, Laita L, Owoc M, Sellini F, Spreeuwenberg S, Vanthienen J, Vermesan A, Wiratunga N. Validation and verification of knowledge-based systems: report on EUROVAV99. Knowl Eng Rev. 2000;15(2):187–96. https://doi.org/10.1017/S0269888900002010.
Cohen KB, Hunter LE, Palmer M. Assessment of software testing and quality assurance in natural language processing applications and a linguistically inspired approach to improving it. In: Moschitti A, Plank B, editors. Trustworthy eternal systems via evolving software, data and knowledge. Berlin: Springer; 2013. p. 77–90. https://doi.org/10.1007/978-3-642-45260-4_6
Cruz F, Dazeley R, Vamplew P. Memory-Based Explainable Reinforcement Learning. In: Liu J, Bailey J, editors. AI 2019: Advances in Artificial Intelligence, vol. 11919. Berlin: Springer; 2019. p. 66–77.https://doi.org/10.1007/978-3-030-35288-2_6
Cruz F, Dazeley R, Vamplew P. (2020). Explainable robotic systems: Understanding goal-driven actions in a reinforcement learning scenario. ArXiv:2006.13615 [Cs].
Culbert, C, Riley G, Savely R. T. (1987). Approaches to the Verification of Rule-Based Expert Systems. SOAR’87L First Annual Workshop on Space Operation Automation and Robotics, 27–37.
Dağlarli E. (2020). Explainable Artificial Intelligence (xAI) Approaches and Deep Meta-Learning Models. In Advances and Applications in Deep Learning. IntechOpen.
D’Alterio P, Garibaldi JM, John RI. (2020). Constrained Interval Type-2 Fuzzy Classification Systems for Explainable AI (XAI). 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 1–8. https://doi.org/10.1109/FUZZ48607.2020.9177671
Das A, Rad P. (2020). Opportunities and Challenges in Explainable Artificial Intelligence (XAI): A Survey. ArXiv:2006.11371 [Cs].
David, N. (2013). Validating Simulations. In Simulating Social Complexity (pp. 135–171). Springer Berlin Heidelberg.
Davis P. K. (1992). Generalizing concepts and methods of verification, validation, and accreditation (VV&A) for military simulations. Rand.
de Laat PB. Algorithmic decision-making based on machine learning from big data: can transparency restore accountability? Philos Technol. 2018;31(4):525–41. https://doi.org/10.1007/s13347-017-0293-z.
De Raedt L, Sablon G, Bruynooghe M. Using Interactive Concept Learning for Knowledge-base Validation and Verification. In: Validation, verification and test of knowledge-based systems. Hoboken: Wiley; 1991. p. 177–90.
Dghaym D, Turnock S, Butler M, Downes J, Hoang T. S, Pritchard B. (2020). Developing a Framework for Trustworthy Autonomous Maritime Systems. In Proceedings of the International Seminar on Safety and Security of Autonomous Vessels (ISSAV) and European STAMP Workshop and Conference (ESWC) 2019 (pp. 73–82). Sciendo. https://doi.org/10.2478/9788395669606-007
Diallo, A. B, Nakagawa H, Tsuchiya T. (2020). An Explainable Deep Learning Approach for Adaptation Space Reduction. 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), 230–231. https://doi.org/10.1109/ACSOS-C51401.2020.00063
Dibie-Barthelemy J, Haemmerle O, Salvat E. (2006). A semantic validation of conceptual graphs. 13.
Dobson J. Can an algorithm be disturbed?: Machine learning, intrinsic criticism, and the digital humanities. Coll Lit. 2015;42:543–64. https://doi.org/10.1353/lit.2015.0037.
US Department of Defense (DoD) Directive 5000.59. 1995.
Dodge J, Burnett M. (2020). Position: We Can Measure XAI Explanations Better with Templates. ExSS-ATEC@IUI, 1–13.
Dong G, Wu S, Wang G, Guo T, Huang Y. Security assurance with metamorphic testing and genetic algorithm. IEEE/WIC/ACM Int Conf Web Intell Agent Technol. 2010;2010:397–401. https://doi.org/10.1109/WI-IAT.2010.101.
Došilović, F. K, Brcic M, Hlupic N. (2018). Explainable artificial intelligence: A survey. 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), 0210–0215. https://doi.org/10.23919/MIPRO.2018.8400040
Dupuis NK, Verheij DB. An analysis of decompositional rule extraction for explainable neural Networks. Groningen: University of Groningen; 2019.
Edwards D. Data Quality Assurance. In: Ecological data: design, management and processing. Hoboken: Blackwell; 2000. p. 70–91.
El Naqa I, Irrer J, Ritter TA, DeMarco J, Al-Hallaq H, Booth J, Kim G, Alkhatib A, Popple R, Perez M, Farrey K, Moran JM. Machine learning for automated quality assurance in radiotherapy: a proof of principle using EPID data description. Med Phys. 2019;46(4):1914–21. https://doi.org/10.1002/mp.13433.
Elsayed G, Shankar S, Cheung B, Papernot N, Kurakin A, Goodfellow I, Sohl-Dickstein J. (2018). Adversarial Examples that Fool both Computer Vision and Time-Limited Humans. 11.
Everitt T, Lea G, Hutter M. (2018). AGI Safety Literature Review. ArXiv:1805.01109[Cs].
Ferreyra E, Hagras H, Kern M, Owusu G. (2019). Depicting Decision-Making: A Type-2 Fuzzy Logic Based Explainable Artificial Intelligence System for Goal-Driven Simulation in the Workforce Allocation Domain. 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 1–6. https://doi.org/10.1109/FUZZ-IEEE.2019.8858933
Forster D. A. (2006). Validation of individual consciousness in Strong Artificial Intelligence: An African Theological contribution. University of South Africa.
Gao J, Xie C, Tao C. Big data validation and quality assurance—issuses, challenges, and Needs. IEEE Symposium on Service-Oriented System Engineering (SOSE). 2016;2016:433–41. https://doi.org/10.1109/SOSE.2016.63.
Gardiner L-J, Carrieri AP, Wilshaw J, Checkley S, Pyzer-Knapp EO, Krishna R. Using human in vitro transcriptome analysis to build trustworthy machine learning models for prediction of animal drug toxicity. Sci Rep. 2020;10(1):9522. https://doi.org/10.1038/s41598-020-66481-0.
Gilstrap L. Validation and verification of expert systems. Telematics Inform. 1991;8(4):439–48. https://doi.org/10.1016/S0736-5853(05)80064-4.
Ginsberg A, Weiss S. (2001). SEEK2: A Generalized Approach to Automatic Knowledge Base Refinement. 9th International Joint Conference on Artificial Intelligence, 1, 8.
Glomsrud J. A, Ødegårdstuen A, Clair A. L. S, Smogeli Ø. (2020). Trustworthy versus Explainable AI in Autonomous Vessels. In Proceedings of the International Seminar on Safety and Security of Autonomous Vessels (ISSAV) and European STAMP Workshop and Conference (ESWC) 2019 (pp. 37–47). Sciendo. https://doi.org/10.2478/9788395669606-004
Go, W Lee D. (2018). Toward Trustworthy Deep Learning in Security. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2219–2221. https://doi.org/10.1145/3243734.3278526
Gonzalez AJ, Barr V. Validation and verification of intelligent systems—What are they and how are they different? J Exp Theor Artif Intell. 2000;12(4):407–20. https://doi.org/10.1080/095281300454793.
Gonzalez AJ, Gupta UG, Chianese RB. Performance evaluation of a large diagnostic expert system using a heuristic test case generator. Eng Appl Artif Intell. 1996;9(3):275–84. https://doi.org/10.1016/0952-1976(95)00018-6.
Goodfellow IJ, Shlens J, Szegedy C. (2015). Explaining and Harnessing Adversarial Examples. ArXiv:1412.6572 [Cs, Stat].
Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey of methods for explaining black box models. ACM Comput Surv. 2019;51(5):1–42. https://doi.org/10.1145/3236009.
Gulshan V, Peng L, Coram, M, Stumpe M. C, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson P. C, Mega J. L, Webster D. R. (2016). Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. 9.
Guo W. Explainable artificial intelligence for 6G: improving trust between human and machine. IEEE Commun Mag. 2020;58(6):39–45. https://doi.org/10.1109/MCOM.001.2000050.
Hagras H. Toward human-understandable, explainable AI. Computer. 2018;51(9):28–36. https://doi.org/10.1109/MC.2018.3620965.
Hailu G, Sommer G. (1999). On amount and quality of bias in reinforcement learning. IEEE SMC’99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028), 2, 728–733. https://doi.org/10.1109/ICSMC.1999.825352
Halkidi M, Batistakis Y, Vazirgiannis M. On clustering validation techniques. J Intell Inf Syst. 2001;17(2/3):107–45.
Halliwell N, Lecue F. (2020). Trustworthy Convolutional Neural Networks: A Gradient Penalized-based Approach. ArXiv:2009.14260[Cs].
Han S-H, Kwon M-S, Choi H-J. EXplainable AI (XAI) approach to image captioning. J Eng. 2020;2020(13):589–94. https://doi.org/10.1049/joe.2019.1217.
Harmelen F, Teije A. (1997). Validation and Verification of Conceptual Models of Diagnosis. Fourth European Symposium on the Validation and Verification of Knowledge-Based Systems, 117–128.
Haverinen T. (2020). Towards Explainable Artificial Intelligence (XAI) [Master’s Thesis]. University of Jyväskylä.
He C, Xing J, Li J, Yang Q, Wang R, Zhang X. A new optimal sensor placement strategy based on modified modal assurance criterion and improved adaptive genetic algorithm for structural health monitoring. Math Probl Eng. 2015;2015:1–10. https://doi.org/10.1155/2015/626342.
He H, Gray J, Cangelosi A, Meng Q, McGinnity T. M, Mehnen J. (2020). The Challenges and Opportunities of Artificial Intelligence for Trustworthy Robots and Autonomous Systems. 2020 3rd International Conference on Intelligent Robotic and Control Engineering (IRCE), 68–74. https://doi.org/10.1109/IRCE50905.2020.9199244
He Y, Meng G, Chen K, Hu X, He J. (2020). Towards Security Threats of Deep Learning Systems: A Survey. ArXiv:1911.12562[Cs].
Heaney KD, Lermusiaux PFJ, Duda TF, Haley PJ. Validation of genetic algorithm-based optimal sampling for ocean data assimilation. Ocean Dyn. 2016;66(10):1209–29. https://doi.org/10.1007/s10236-016-0976-5.
Heuer H, Breiter A. (2020). More Than Accuracy: Towards Trustworthy Machine Learning Interfaces for Object Recognition. Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, 298–302. https://doi.org/10.1145/3340631.3394873
Heuillet A, Couthouis F, Díaz-Rodríguez N. (2020). Explainability in Deep Reinforcement Learning. ArXiv:2008.06693 [Cs].
Hibbard B. Bias and no free lunch in formal measures of intelligence. J Artif General Intell. 2009;1(1):54–61. https://doi.org/10.2478/v10229-011-0004-6.
Huber T. (2019). Enhancing Explainability of Deep Reinforcement Learning Through Selective Layer-Wise Relevance Propagation. 15.
Islam MA, Anderson DT, Pinar A, Havens TC, Scott G, Keller JM. Enabling explainable fusion in deep learning with fuzzy integral neural Networks. IEEE Trans Fuzzy Syst. 2019. https://doi.org/10.1109/TFUZZ.2019.2917124.
Israelsen B. W, Ahmed N. R. (2019). “Dave...I can assure you ...that it’s going to be all right ...” A Definition, Case for, and Survey of Algorithmic Assurances in Human-Autonomy Trust Relationships. ACM Computing Surveys, 51(6), 1–37. https://doi.org/10.1145/3267338
Janssen M, Kuk G. The challenges and limits of big data algorithms in technocratic governance. Gov Inf Q. 2016;33(3):371–7. https://doi.org/10.1016/j.giq.2016.08.011.
Jha S, Raj S, Fernandes S, Jha S. K, Jha S, Jalaian B, Verma G, Swami A. (2019). Attribution-Based Confidence Metric For Deep Neural Networks. https://openreview.net/forum?id=rkeYFrHgIB
Jiang N, Li L. (2016). Doubly Robust Off-policy Value Evaluation for Reinforcement Learning. 33 Rd International Conference on Machine Learning, 48, 10.
Jilk, D. J. (2018). Limits to Verification and Validation of Agentic Behavior. In Artificial Intelligence Safety and Security (pp. 225–234). Taylor Francis Group. https://doi.org/10.1201/9781351251389-16
Jones G, Willett P, Glen RC, Leach AR, Taylor R. Development and validation of a genetic algorithm for flexible docking. J Mol Biol. 1997;267(3):727–48. https://doi.org/10.1006/jmbi.1996.0897.
Jorge E, Brynte L, Cronrath C, Wigstrom O, Bengtsson K, Gustavsson E, Lennartson B, Jirstrand M. Reinforcement learning in real-time geometry assurance. In: 51st CIRP Proceedings of the Conference on Manufacturing Systems. 2018. p. 1073–8.
Joo H-T, Kim K-J. Visualization of deep reinforcement learning using Grad-CAM: how AI plays atari games? IEEE Conf Games (CoG). 2019;2019:1–2. https://doi.org/10.1109/CIG.2019.8847950.
Katell M, Young M, Dailey D, Herman B, Guetler V, Tam A, Binz C, Raz D, Krafft P. M. (2020). Toward situated interventions for algorithmic equity: Lessons from the field. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 45–55. https://doi.org/10.1145/3351095.3372874
Kaul S. (2018). Speed And Accuracy Are Not Enough! Trustworthy Machine Learning. Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 372–373. https://doi.org/10.1145/3278721.3278796
Kaur D, Uslu S, Durresi A. Trust-based security mechanism for detecting clusters of fake users in social networks. In: Barolli L, Takizawa M, Xhafa F, Enokido T, editors. Web, artificial intelligence and network applications, vol. 927. Berlin: Springer; 2019. p. 641–50. https://doi.org/10.1007/978-3-030-15035-8_62.
Kaur D, Uslu S, Durresi A. Requirements for Trustworthy Artificial Intelligence – A Review. In: Barolli L, Li KF, Enokido T, Takizawa M, editors. Advances in Networked-Based Information Systems, vol. 1264. Berlin: Springer; 2021. p. 105–15; https://doi.org/10.1007/978-3-030-57811-4_11.
Kaur D, Uslu S, Durresi A, Mohler G, Carter JG. Trust-based human-machine collaboration mechanism for predicting crimes. In: Barolli L, Amato F, Moscato F, Enokido T, Takizawa M, editors. Advanced information networking and applications, vol. 1151. Berlin: Springer; 2020. p. 603–16; https://doi.org/10.1007/978-3-030-44041-1_54.
Keneni BM, Kaur D, Al Bataineh A, Devabhaktuni VK, Javaid AY, Zaientz JD, Marinier RP. Evolving rule-based explainable artificial intelligence for unmanned aerial vehicles. IEEE Access. 2019;7:17001–16. https://doi.org/10.1109/ACCESS.2019.2893141.
Kianifar MR. Application of permutation genetic algorithm for sequential model building–model validation design of experiments. Soft Comput. 2016;20:3023–44. https://doi.org/10.1007/s00500-015-1929-5.
Knauf R, Gonzalez AJ, Abel T. A framework for validation of rule-based systems. Cybern PART B. 2002;32(3):15.
Knauf R, Tsuruta S, Gonzalez AJ. Toward reducing human involvement in validation of knowledge-based systems. IEEE Trans Syst Man Cybern Part A. 2007;37(1):120–31. https://doi.org/10.1109/TSMCA.2006.886365.
Kohlbrenner M, Bauer A, Nakajima S, Binder A, Samek W, Lapuschkin S. Towards best practice in explaining neural network decisions with LRP. Int Joint Conf Neural Netw. 2020;2020:1–7. https://doi.org/10.1109/IJCNN48605.2020.9206975.
Kulkarni A, Chong D, Batarseh FA. Foundations of data imbalance and solutions for a data democracy. In: data democracy. Cambridge: Academic Press; 2020. p. 83–106.
Kuppa A, Le-Khac N-A. Black box attacks on explainable artificial intelligence (XAI) methods in cyber security. Int Joint Confer Neural Netw. 2020;2020:1–8. https://doi.org/10.1109/IJCNN48605.2020.9206780.
Kurd Z, Kelly T. Safety lifecycle for developing safety critical artificial neural networks. In: Anderson S, Felici M, Littlewood B, editors. Computer safety, reliability, and security. Berlin: Springer; 2003. p. 77–91.
Kuzlu M, Cali U, Sharma V, Guler O. Gaining insight into solar photovoltaic power generation forecasting utilizing explainable artificial intelligence tools. IEEE Access. 2020;8:187814–23. https://doi.org/10.1109/ACCESS.2020.3031477.
Lee J-H, Shin I-H, Jeong S-G, Lee S-I, Zaheer MZ, Seo B-S. Improvement in deep networks for optimization using eXplainable artificial intelligence. 2019 International Conference on Information and Communication Technology Convergence (ICTC), 2019. p. 525–30. https://doi.org/10.1109/ICTC46691.2019.8939943.
Lee S, O’Keefe RM. Developing a strategy for expert system verification and validation. IEEE Trans Syst Man Cybern. 1994;24(4):643–55. https://doi.org/10.1109/21.286384.
Leibovici D. G, Rosser J. F, Hodges C, Evans B, Jackson M. J, Higgins C. I. (2017). On Data Quality Assurance and Conflation Entanglement in Crowdsourcing for Environmental Studies. 17.
Lepri B, Oliver N, Letouzé E, Pentland A, Vinck P. Fair, transparent, and accountable algorithmic decision-making processes: the premise, the proposed solutions, and the open challenges. Philos Technol. 2018;31(4):611–27. https://doi.org/10.1007/s13347-017-0279-x.
Li X-H, Cao CC, Shi Y, Bai W, Gao H, Qiu L, Wang C, Gao Y, Zhang S, Xue X, Chen L. A survey of data-driven and knowledge-aware eXplainable AI. IEEE Trans Knowl Data Eng. 2020. https://doi.org/10.1109/TKDE.2020.2983930.
Liang X, Zhao J, Shetty S, Li D. (2017). Towards data assurance and resilience in IoT using blockchain. MILCOM 2017 - 2017 IEEE Military Communications Conference (MILCOM), 261–266. https://doi.org/10.1109/MILCOM.2017.8170858
Liu F, Yang M. (2004). Verification and validation of AI simulation systems. Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826), 3100–3105. https://doi.org/10.1109/ICMLC.2004.1378566
Liu F, Yang M. (2005). Verification and Validation of Artificial Neural Network Models AI 2005: Advances in Artificial Intelligence, 3809:1041–1046.
Liu F, Yang M, Shi P. (2008). Verification and validation of fuzzy rules-based human behavior models. 2008 Asia Simulation Conference - 7th International Conference on System Simulation and Scientific Computing, 813–819. https://doi.org/10.1109/ASC-ICSC.2008.4675474
Lockwood S, Chen Z. Knowledge validation of engineering expert systems. Adv Eng Softw. 1995;23(2):97–104. https://doi.org/10.1016/0965-9978(95)00018-R.
Lowry M, Havelund K, Penix J. Verification and validation of AI systems that control deep-space spacecraft. In: Raś ZW, Skowron A, editors. Foundations of Intelligent Systems, vol. 1325. Berlin: Springer; 1997. p. 35–47; https://doi.org/10.1007/3-540-63614-5_3.
Mackowiak R, Ardizzone L, Köthe U, Rother C. (2020). Generative Classifiers as a Basis for Trustworthy Computer Vision. ArXiv:2007.15036 [Cs].
Madumal P, Miller T, Sonenberg L, Vetere F. (2019). Explainable Reinforcement Learning Through a Causal Lens. ArXiv:1905.10958 [Cs, Stat].
Maloca PM, Lee AY, de Carvalho ER, Okada M, Fasler K, Leung I, Hörmann B, Kaiser P, Suter S, Hasler PW, Zarranz-Ventura J, Egan C, Heeren TFC, Balaskas K, Tufail A, Scholl HPN. Validation of automated artificial intelligence segmentation of optical coherence tomography images. PLoS ONE. 2019;14(8):e0220063. https://doi.org/10.1371/journal.pone.0220063.
Magazzeni D, McBurney P, Nash W. Validation and Verification of Smart Contracts: A Research Agenda. Computer. 2017;50(9):50–57. https://doi.org/10.1109/MC.2017.3571045
Malolan B, Parekh A, Kazi F. (2020). Explainable Deep-Fake Detection Using Visual Interpretability Methods. 2020 3rd International Conference on Information and Computer Technologies (ICICT), 289–293. https://doi.org/10.1109/ICICT50521.2020.00051
Marcos M, del Pobil AP, Moisan S. Model-based verification of knowledge-based systems: a case study. IEE Proceedings - Software. 2000;147(5):163. https://doi.org/10.1049/ip-sen:20000896.
Martin M. O, Mullis I. V. S, Bruneforth M, Third International Mathematics and Science Study (Eds.). (1996). Quality assurance in data collection. Center for the Study of Testing, Evaluation, and Educational Policy, Boston College.
Martinez-Balleste, A, Rashwan, H. A, Puig, D, Fullana, A. P. (2012). Towards a trustworthy privacy in pervasive video surveillance systems. 2012 IEEE International Conference on Pervasive Computing and Communications Workshops, 914–919. https://doi.org/10.1109/PerComW.2012.6197644
Martínez-Fernández S, Franch X, Jedlitschka A, Oriol M, Trendowicz A. (2020). Research Directions for Developing and Operating Artificial Intelligence Models in Trustworthy Autonomous Systems. ArXiv:2003.05434 [Cs].
Martín-Guerrero JD, Soria-Olivas E, Martínez-Sober M, Climente-Martí M, De Diego-Santos T, Jiménez-Torres NV. Validation of a reinforcement learning policy for dosage optimization of erythropoietin. In: Orgun MA, Thornton J, editors. AI 2007: Advances in artificial intelligence. Berlin: Springer; 2007. p. 732–8.
Mason G, Calinescu R, Kudenko D, Banks A. (2017a). Assured Reinforcement Learning for Safety-Critical Applications.
Mason G, Calinescu R, Kudenko D, Banks A. Assurance in reinforcement learning using quantitative verification. In: Hatzilygeroudis I, Palade V, editors. Advances in hybridization of intelligent methods, vol. 85. Berlin: Springer; 2018. p. 71–96.
Mason G, Calinescu R, Kudenko D, Banks A. (2017b). Assured Reinforcement Learning with Formally Verified Abstract Policies. Proceedings of the 9th International Conference on Agents and Artificial Intelligence, 105–117. https://doi.org/10.5220/0006156001050117
Massoli FV, Carrara F, Amato G, Falchi F. Detection of face recognition adversarial attacks. Comput Vision Image Understand. 2021;11:103103.
Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A. (2019). A Survey on Bias and Fairness in Machine Learning. ArXiv:1908.09635 [Cs].
Mehri V. A, Ilie D, Tutschku K. (2018). Privacy and DRM Requirements for Collaborative Development of AI Applications. Proceedings of the 13th International Conference on Availability, Reliability and Security - ARES 2018, 1–8. https://doi.org/10.1145/3230833.3233268
Mengshoel OJ. Knowledge validation: principles and practice. IEEE Expert. 1993;8(3):62–8. https://doi.org/10.1109/64.215224.
Menzies T, Pecheur C. Verification and validation and artificial intelligence. In: Advances in computers, vol. 65. Amsterdam: Elsevier; 2005. p. 153–201.
Meskauskas Z, Jasinevicius R, Kazanavicius E, Petrauskas, V. (2020). XAI-Based Fuzzy SWOT Maps for Analysis of Complex Systems. 2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 1–8. https://doi.org/10.1109/FUZZ48607.2020.9177792
Miller J. Active Nonlinear Tests (ANTs) of Complex Simulation Models. Manag Sci. 1998;44(6):482.
Min F, Ma P, Yang M. A knowledge-based method for the validation of military simulation. Winter Simulation Conf. 2007;2007:1395–402. https://doi.org/10.1109/WSC.2007.4419748.
Min F-y, Yang M, Wang Z. (2006). An Intelligent Validation System of Simulation Model. 2006 International Conference on Machine Learning and Cybernetics, 1459–1464. https://doi.org/10.1109/ICMLC.2006.258759
Morell L. J. (1988). Use of metaknowledge in the verification of knowledge-based systems. Proceedings of the 1st International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems - Volume 2, 847–857. https://doi.org/10.1145/55674.55699
Mosqueira-Rey E, Moret-Bonillo V. Validation of intelligent systems: a critical study and a tool. Expert Syst Appl. 2000;16:1–6.
Mueller ST, Hoffman, RR, Clancey W, Emrey A, Klein G. (2019). Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI. ArXiv:1902.01876 [Cs].n
Murray B, Islam M. A, Pinar A. J, Havens, T. C, Anderson D. T, Scott G. (2018). Explainable AI for Understanding Decisions and Data-Driven Optimization of the Choquet Integral. 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 1–8. https://doi.org/10.1109/FUZZ-IEEE.2018.8491501
Murray BJ, Islam MA, Pinar AJ, Anderson DT, Scott GJ, Havens TC, Keller JM. Explainable AI for the Choquet Integral. IEEE Trans Emerg Topics in Comput Intell. 2020. https://doi.org/10.1109/TETCI.2020.3005682.
Murrell S, Plant TR. A survey of tools for the validation and verification of knowledge-based systems: 1985–1995. Decis Support Syst. 1997;21(4):307–23. https://doi.org/10.1016/S0167-9236(97)00047-X.
Mynuddin M, Gao W. Distributed predictive cruise control based on reinforcement learning and validation on microscopic traffic simulation. IET Intel Transport Syst. 2020;14(5):270–7. https://doi.org/10.1049/iet-its.2019.0404.
Nassar M, Salah K, Rehman MH, Svetinovic D. Blockchain for explainable and trustworthy artificial intelligence. WIREs Data Mining Knowl Disc. 2020;10(1):e1340. https://doi.org/10.1002/widm.1340.
Niazi M. A, Siddique Q, Hussain A, Kolberg M. (2010). Verification & validation of an agent-based forest fire simulation model. Proceedings of the 2010 Spring Simulation Multiconference, 1–8. https://doi.org/10.1145/1878537.1878539
Nourani CF. Multi-agent object level AI validation and verification. ACM SIGSOFT Softw Eng Notes. 1996;21(1):70–2. https://doi.org/10.1145/381790.381802.
O’Keefe RM, Balci O, Smith EP. Validating expert system performance. IEEE Expert. 1987;2(4):81–90. https://doi.org/10.1109/MEX.1987.5006538.
On Artificial Intelligence—A European approach to excellence and trust. (2020). European Commission.
Onoyama T, Tsuruta S. Validation method for intelligent systems. J Exp Theor Artif Intell. 2000;12(4):461–72. https://doi.org/10.1080/095281300454838.
Pawar U, O’Shea D, Rea S, O’Reilly R. (2020). Explainable AI in Healthcare. 2020 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), 1–2. https://doi.org/10.1109/CyberSA49311.2020.9139655
Payrovnaziri SN, Chen Z, Rengifo-Moreno P, Miller T, Bian J, Chen JH, Liu X, He Z. Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review. J Am Med Inform Assoc. 2020;27(7):1173–85. https://doi.org/10.1093/jamia/ocaa053.
Pèpe G, Perbost R, Courcambeck J, Jouanna P. Prediction of molecular crystal structures using a genetic algorithm: validation by GenMolTM on energetic compounds. J Cryst Growth. 2009;311(13):3498–510. https://doi.org/10.1016/j.jcrysgro.2009.04.002.
Peppler RA, Long CN, Sisterson DL, Turner DD, Bahrmann CP, Christensen SW, Doty KJ, Eagan RC, Halter TD, Iveyh MD, Keck NN, Kehoe KE, Liljegren JC, Macduff MC, Mather JH, McCord RA, Monroe JW, Moore ST, Nitschke KL, Wagener R. An overview of ARM program climate research facility data quality assurance. Open Atmos Sci J. 2008;2(1):192–216. https://doi.org/10.2174/1874282300802010192.
Pitchforth J. (2013). A proposed validation framework for expert elicited Bayesian Networks. Expert Systems with Applications.
Pocius R, Neal L, Fern A. Strategic tasks for explainable reinforcement learning. Proc AAAI Conf Artif Intell. 2019;33:10007–8. https://doi.org/10.1609/aaai.v33i01.330110007.
Preece AD, Shinghal R, Batarekh A. Verifying expert systems: a logical framework and a practical tool. Expert Syst Appl. 1992;5(3–4):421–36. https://doi.org/10.1016/0957-4174(92)90026-O.
Prentzas, N, Nicolaides, A, Kyriacou, E, Kakas, A, Pattichis, C. (2019). Integrating Machine Learning with Symbolic Reasoning to Build an Explainable AI Model for Stroke Prediction. 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), 817–821. https://doi.org/10.1109/BIBE.2019.00152
Puiutta E, Veith EM. (2020). Explainable Reinforcement Learning: A Survey. ArXiv:2005.06247 [Cs, Stat].
Putzer HJ, Wozniak E. (2020). A Structured Approach to Trustworthy Autonomous/Cognitive Systems. ArXiv:2002.08210 [Cs].
Pynadath DV. Transparency communication for machine learning. In human-automation interaction human and machine learning. Berlin: Springer International Publishing; 2018.
Qiu S, Liu Q, Zhou S, Wu C. Review of artificial intelligence adversarial attack and defense technologies. Appl Sci. 2019;9(5):909. https://doi.org/10.3390/app9050909.
Ragot M, Martin N, Cojean S. (2020). AI-generated vs. Human Artworks. A Perception Bias Towards Artificial Intelligence? Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, 1–10. https://doi.org/10.1145/3334480.3382892
Raji ID, Smart A, White RN, Mitchell M, Gebru T, Hutchinson B, Smith-Loud J, Theron D, Barnes P. (2020). Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. ArXiv:2001.00973 [Cs].
Perrault R, Shoham Y, Brynjolfsson E, Clark J, Etchemendy J, Grosz B, Lyons T, Manyika J, Niebles JC, Mishra S. (2020). Artificial Intelligence Index 2019 Annual Report. Stanford University Human-Centered AI Institute (HAI). Available at: https://hai.stanford.edu/sites/default/files/ai_index_2019_report.pdf
Ren H, Chandrasekar SK, Murugesan A. (2019). Using Quantifier Elimination to Enhance the Safety Assurance of Deep Neural Networks. ArXiv:1909.09142 [Cs, Stat].
Rossi F. (2018). Building Trust in Artificial Intelligence.
Rotman, N. H, Schapira M, Tamar, A. (2020). Online Safety Assurance for Deep Reinforcement Learning. ArXiv:2010.03625 [Cs].
Rovcanin M, De Poorter E, van den Akker D, Moerman I, Demeester P, Blondia C. Experimental validation of a reinforcement learning based approach for a service-wise optimisation of heterogeneous wireless sensor networks. Wireless Netw. 2015;21(3):931–48. https://doi.org/10.1007/s11276-014-0817-8.
Ruan Y, Zhang P, Alfantoukh L, Durresi A. Measurement Theory-Based Trust Management Framework for Online Social Communities. ACM Transactions on Internet Technology. 2017;17(2):1–24. https://doi.org/10.1145/3015771.
Ruan W, Huang X, Kwiatkowska M (2018). Reachability Analysis of Deep Neural Networks with Provable Guarantees. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2651–2659. https://doi.org/10.24963/ijcai.2018/368
Sarathy N, Alsawwaf M, Chaczko Z. (2020). Investigation of an Innovative Approach for Identifying Human Face-Profile Using Explainable Artificial Intelligence. 2020 IEEE 18th International Symposium on Intelligent Systems and Informatics (SISY), 155–160. https://doi.org/10.1109/SISY50555.2020.9217095
Sargent RG. Verification and validation of simulation models. J Simul. 2013;7(1):12–24. https://doi.org/10.1057/jos.2012.20.
Sargent RG (1984). A tutorial on verification and validation of simulation models. Proceedings of the 16th Conference on Winter Simulation, 114–121.
Sargent RG (2004). Validation and Verification of Simulation Models. Proceedings of the 2004 Winter Simulation Conference, 2004, 1, 13–24. https://doi.org/10.1109/WSC.2004.1371298
Sargent RG. (2010). Verification and validation of simulation models. Proceedings of the 2010 Winter Simulation Conference, 166–183. https://doi.org/10.1109/WSC.2010.5679166
Schlegel U, Arnout H, El-Assady M, Oelke D, Keim D. A. (2019). Towards a Rigorous Evaluation of XAI Methods on Time Series. ArXiv:1909.07082 [Cs].
Schumann J, Gupta P, Liu Y. Application of neural networks in high assurance systems: a survey. In: Schumann J, Liu Y, editors. Applications of neural networks in high assurance systems, vol. 268. Berlin: Springer; 2010. p. 1–19; https://doi.org/10.1007/978-3-642-10690-3_1.
Schumann J, Gupta P, Nelson S. (2003). On verification & validation of neural network based controllers.
Sequeira P, Gervasio M. Interestingness elements for explainable reinforcement learning: understanding agents’ capabilities and limitations. Artif Intell. 2020;288:103367. https://doi.org/10.1016/j.artint.2020.103367.
Sileno G, Boer A, van Engers T. (2018). The Role of Normware in Trustworthy and Explainable AI. ArXiv:1812.02471 [Cs].
Singer E, Thurn DRV, Miller ER. Confidentiality assurances and response: a quantitative review of the experimental literature. Public Opin Q. 1995;59(1):66. https://doi.org/10.1086/269458.
Sivamani KS, Sahay R, Gamal AE. Non-intrusive detection of adversarial deep learning attacks via observer networks. IEEE Lett Comput Soc. 2020;3(1):25–8. https://doi.org/10.1109/LOCS.2020.2990897.
Spada M. R, Vincentini A. (2019). Trustworthy AI for 5G: Telco Experience and Impact in the 5G ESSENCE. In J. MacIntyre, I. Maglogiannis, L. Iliadis, E. Pimenidis (Eds.), Artificial Intelligence Applications and Innovations (pp. 103–110). Springer International Publishing. https://doi.org/10.1007/978-3-030-19909-8_9
Spinner T, Schlegel U, Schafer H, El-Assady M. explAIner: a visual analytics framework for interactive and explainable machine learning. IEEE Trans Visual Comput Graph. 2019. https://doi.org/10.1109/TVCG.2019.2934629.
Srivastava B, Rossi F. (2019). Towards Composable Bias Rating of AI Services. ArXiv:1808.00089 [Cs].
Stock P, Cisse M. ConvNets and Imagenet beyond accuracy: understanding mistakes and uncovering biases. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y, editors. Computer Vision – ECCV 2018, vol. 11210. Berlin: Springer; 2018. p. 504–19.
Suen CY, Grogono PD, Shinghal R, Coallier F. Verifying, validating, and measuring the performance of expert systems. Expert Syst Appl. 1990;1(2):93–102. https://doi.org/10.1016/0957-4174(90)90019-Q.
Sun SC, Guo, W. (2020). Approximate Symbolic Explanation for Neural Network Enabled Water-Filling Power Allocation. 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), 1–4. https://doi.org/10.1109/VTC2020-Spring48590.2020.9129447
Tadj C. Dynamic verification of an object-rule knowledge base using colored Petri nets. System Cybern Inf. 2005;4(3):9.
Tan R, Khan N, Guan L. Locality guided neural networks for explainable artificial intelligence. International Joint Conference on Neural Networks. 2020;2020:1–8. https://doi.org/10.1109/IJCNN48605.2020.9207559.
Tao C, Gao J, Wang T. Testing and quality validation for AI software–perspectives, issues, and practices. IEEE Access. 2019;7:12.
Tao, J, Xiong, Y, Zhao, S, Xu, Y, Lin, J, Wu, R, Fan, C. (2020). XAI-Driven Explainable Multi-view Game Cheating Detection. 2020 IEEE Conference on Games (CoG), 144–151. https://doi.org/10.1109/CoG47356.2020.9231843
Taylor BJ, Darrah MA (2005). Rule extraction as a formal method for the verification and validation of neural networks. Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005, 5, 2915–2920. https://doi.org/10.1109/IJCNN.2005.1556388
Taylor BJ, editor. Methods and Procedures for the Verification and Validation of Artificial Neural Networks. Springer US; 2006. https://doi.org/10.1007/0-387-29485-6.
Taylor BJ, Darrah MA, Moats CD (2003). Verification and validation of neural networks: A sampling of research in progress (K. L. Priddy & P. J. Angeline, Eds.; p. 8). https://doi.org/10.1117/12.487527
Taylor, E, Shekhar, S, Taylor, G. W. (2020). Response Time Analysis for Explainability of Visual Processing in CNNs. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1555–1558. https://doi.org/10.1109/CVPRW50498.2020.00199
Thomas JD, Sycara K. (1999). The Importance of Simplicity and Validation in Genetic Programming for Data Mining in Financial Data. AAAI Technical Report, 5.
Tjoa E, Guan C. A survey on explainable artificial intelligence (XAI): towards medical XAI. IEEE Trans Neural Netw Learn Syst. 2020. https://doi.org/10.1109/TNNLS.2020.3027314.
Toreini E, Aitken M, Coopamootoo K, Elliott K, Zelaya CG (2020). The relationship between trust in AI and trustworthy machine learning technologies.
Toreini E, Aitken M, Coopamootoo KPL, Elliott K, Zelaya VG, Missier P, Ng M, van Moorsel A (2020). Technologies for Trustworthy Machine Learning: A Survey in a Socio-Technical Context. ArXiv:2007.08911 [Cs, Stat].
Tsai W-T, Vishnuvajjala R, Zhang D. Verification and validation of knowledge-based systems. IEEE Trans Knowl Data Eng. 1999;11(1):11.
Turing A. Computing machinery and intelligence. Mind. 1950;59(236):433–60.
Uslu S, Kaur D, Rivera SJ, Durresi A, Babbar-Sebens M. Trust-Based game-theoretical decision making for food-energy-water management. In: Barolli L, Hellinckx P, Enokido T, editors. Advances on broad-band wireless computing, communication and applications, vol. 97. Berlin: Springer; 2020. p. 125–36; https://doi.org/10.1007/978-3-030-33506-9_12.
Uslu S, Kaur D, Rivera SJ, Durresi A, Babbar-Sebens M. Trust-based decision making for food-energy-water actors. In: Barolli L, Amato F, Moscato F, Enokido T, Takizawa M, editors. Advanced information networking and applications. Berlin: Springer International Publishing; 2020. p. 591–602.
Vabalas A, Gowen E, Poliakoff E, Casson AJ. Machine learning algorithm validation with a limited sample size. PLoS ONE. 2019;14(11):e0224365. https://doi.org/10.1371/journal.pone.0224365.
Validation of Machine Learning Models: Challenges and Alternatives. (2017). Protiviti.
Varshney KR. Trustworthy machine learning and artificial intelligence. XRDS. 2019;25(3):26–9. https://doi.org/10.1145/3313109.
Varshney KR (2020). On Mismatched Detection and Safe, Trustworthy Machine Learning. 2020 54th Annual Conference on Information Sciences and Systems (CISS), 1–4. https://doi.org/10.1109/CISS48834.2020.1570627767
Veeramachaneni K, Arnaldo I, Korrapati V, Bassias C, Li K. (2016). AI2: Training a Big Data Machine to Defend. 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), 49–54. https://doi.org/10.1109/BigDataSecurity-HPSC-IDS.2016.79
Vinze AS, Vogel DR, Nunamaker JF. Performance evaluation of a knowledge-based system. Inf Manag. 1991;21(4):225–35. https://doi.org/10.1016/0378-7206(91)90068-D.
Volz V, Majchrzak K, Preuss M. (2018). A Social Science-based Approach to Explanations for (Game) AI. 2018 IEEE Conference on Computational Intelligence and Games (CIG), 1–2. https://doi.org/10.1109/CIG.2018.8490361
Wang D, Yang Q, Abdul A, Lim BY (2019). Designing Theory-Driven User-Centric Explainable AI. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems - CHI ’19, 1–15. https://doi.org/10.1145/3290605.3300831
Wei S, Zou Y, Zhang T, Zhang X, Wang W. Design and experimental validation of a cooperative adaptive cruise control system based on supervised reinforcement learning. Appl Sci. 2018;22:1014.
Welch ML, McIntosh C, Traverso A, Wee L, Purdie TG, Dekker A, Haibe-Kains B, Jaffray DA. External validation and transfer learning of convolutional neural networks for computed tomography dental artifact classification. Phys Med Biol. 2020;65(3):035017. https://doi.org/10.1088/1361-6560/ab63ba.
Wells SA (1993). The VIVA Method: A Life-cycle Independent Approach to KBS Validation. AAAI Technical Report WS-93–05, 5.
Wickramage N (2016). Quality assurance for data science: Making data science more scientific through engaging scientific method. 2016 Future Technologies Conference (FTC). https://doi.org/10.1109/FTC.2016.7821627
Wieringa M (2020). What to account for when accounting for algorithms: A systematic literature review on algorithmic accountability. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 1–18. https://doi.org/10.1145/3351095.3372833
Wing JM (2020). Trustworthy AI. ArXiv:2002.06276 [Cs].
Winkel DJ. Validation of a fully automated liver segmentation algorithm using multi-scale deep reinforcement learning and comparison versus manual segmentation. Eur J Radiol. 2020;7:108918.
Winkler T, Rinner B. (2010). User-Based Attestation for Trustworthy Visual Sensor Networks. 2010 IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing, 74–81. https://doi.org/10.1109/SUTC.2010.20
Wu C-H, Lee S-J. KJ3—A tool assisting formal validation of knowledge-based systems. Int J Hum Comput Stud. 2002;56(5):495–524. https://doi.org/10.1006/ijhc.2002.1007.
Xiao Y, Pun C-M, Liu B. Adversarial example generation with adaptive gradient search for single and ensemble deep neural network. Inf Sci. 2020;528:147–67. https://doi.org/10.1016/j.ins.2020.04.022.
Xu W, Evans D, Qi Y. (2018). Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. Proceedings 2018 Network and Distributed System Security Symposium. https://doi.org/10.14722/ndss.2018.23198
Yilmaz L. Validation and verification of social processes within agent-based computational organization models. Comput Math Organ Theory. 2006;12(4):283–312. https://doi.org/10.1007/s10588-006-8873-y.
Yoon J, Kim K, Jang J. (2019). Propagated Perturbation of Adversarial Attack for well-known CNNs: Empirical Study and its Explanation. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 4226–4234. https://doi.org/10.1109/ICCVW.2019.00520
Zaidi AK, Levis AH. Validation and verification of decision making rules. Automatica. 1997;33(2):155–69. https://doi.org/10.1016/S0005-1098(96)00165-3.
Zeigler BP, Nutaro JJ. Towards a framework for more robust validation and verification of simulation models for systems of systems. J Def Model Simul . 2016;13(1):3–16. https://doi.org/10.1177/1548512914568657.
Zhou J, Chen F (2019). Towards Trustworthy Human-AI Teaming under Uncertainty.
Zhu, H, Xiong, Z, Magill, S, Jagannathan, S. (2019). An inductive synthesis framework for verifiable reinforcement learning. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 686–701. https://doi.org/10.1145/3314221.3314638
Zhu J, Liapis A, Risi S, Bidarra R, Youngblood GM (2018). Explainable AI for Designers: A Human-Centered Perspective on Mixed-Initiative Co-Creation. 2018 IEEE Conference on Computational Intelligence and Games (CIG), 1–8. https://doi.org/10.1109/CIG.2018.8490433
Zlatareva NP (1998). Knowledge Refinement during Developmental and Field Validation of Expert Systems.
Zlatareva N, Preece A. State of the art in automated validation of knowledge-based systems. Expert Syst Appl. 1994;7(2):151–67. https://doi.org/10.1016/0957-4174(94)90034-5.
Acknowledgements
Not applicable.
Funding
This work was supported by the Commonwealth Cyber Initiative (CCI).
Author information
Contributions
FB designed the study, provided the new assurance definition, developed the visualizations, and led the effort in writing the paper; LF reviewed the paper and provided consultation on the topic; CH developed the tables and the scoring system, and worked on finding, arranging, and managing the papers used in the review. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: All manuscripts and their detailed scores by ranking category
Year | Author | AI.s | R | M | Rs | Ds | Sz | Sc | L | G | A | C |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1985 | Ginsberg | KBS | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
1987 | Castore | KBS | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
1987 | Culbert | KBS | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
1987 | O'Keefe | KBS | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
1988 | Morell | KBS | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
1988 | Sargent | ABS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
1989 | Becker | KBS | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
1990 | Suen | KBS | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1991 | Vinze | KBS | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1991 | Gilstrap | KBS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
1991 | Raedt | KBS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
1992 | Andert | KBS | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
1992 | Preece | KBS | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
1992 | Davis | ABS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
1993 | Wells | KBS | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
1993 | Mengshoel | KBS | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1994 | Zlatareva | KBS | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
1994 | Lee | KBS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
1995 | Lockwood | KBS | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
1995 | Singer | DS | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
1996 | Martin | DS | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
1996 | Carley | KBS | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
1996 | Gonzalez | KBS | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1996 | Abel | KBS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1996 | Nourani | Generic | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1997 | Jones | GA | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
1997 | Abel | KBS | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1997 | Harmelen | KBS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
1997 | Lowry | Generic | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
1997 | Murrell | KBS | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
1997 | Zaidi | KBS | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1998 | Miller | GA | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1998 | Zlatareva | KBS | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
1998 | Antoniou | KBS | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
1999 | Thomas | GA | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
1999 | Tsai | KBS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
1999 | Hailu | RL | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2000 | Mosqueira-Rey | KBS | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
2000 | Marcos | KBS | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
2000 | Onoyama | KBS | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
2000 | Edwards | DS | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2000 | Coenen | KBS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2000 | Gonzalez | Generic | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2001 | Berndt | DS | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
2001 | Halkidi | ML | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
2001 | Barr | NLP | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
2002 | Wu | KBS | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
2002 | Knauf | KBS | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2003 | Schumann | DL | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
2003 | Taylor | DL | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2003 | Kurd | DL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2004 | Liu | ABS | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2004 | Sargent | ABS | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2005 | Liu | DL | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
2005 | Menzies | Generic | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2005 | Taylor | DL | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2006 | Forster | AGI | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
2006 | Taylor | DL | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2006 | Yilmaz | ABS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2006 | Min | KBS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2006 | Dibie-Barthélemy | KBS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2007 | Martín-Guerrero | RL | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
2007 | Knauf | KBS | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
2007 | Brancovici | XAI | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
2007 | Min | KBS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
2008 | Peppler | DS | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
2008 | Liu | ABS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2009 | Tadj | KBS | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
2009 | Hibbard | AGI | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2009 | Pèpe | GA | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2010 | Bone | RL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
2010 | Winkler | CV | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
2010 | Dong | GA | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
2010 | Niazi | ABS | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
2010 | Sargent | ABS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2010 | Schumann | DL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2012 | Cohen | NLP | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
2012 | Martinez-Balleste | CV | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
2013 | Batarseh | KBS | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
2013 | Sargent | ABS | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
2013 | David | ABS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2013 | Pitchforth | DL | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2014 | Ali | DS | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
2015 | He | GA | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
2015 | Rovcanin | RL | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
2015 | Goodfellow | ML | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
2015 | Arifin | ABS | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2015 | Batarseh | KBS | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2015 | Dobson | ML | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2016 | Veeramachaneni | DS | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
2016 | Gao | DS | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
2016 | Gulshan | CV | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2016 | Heaney | GA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
2016 | Aitken | ABS | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
2016 | Celis | ML | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
2016 | Jiang | RL | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2016 | Kianifar | GA | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
2016 | Jilk | ABS | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
2016 | Amodei | ML | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2016 | Aitken | ABS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2016 | Janssen | DS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2016 | Zeigler | ABS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2016 | Wickramage | DS | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2017 | Liang | DS | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
2017 | Xu | DL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
2017 | Mason | RL | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
2017 | Leibovici | DS | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2017 | Laat | ML | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2017 | Mason | RL | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2017 | Lepri | ML | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2018 | Wei | RL | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
2018 | Alves | ABS | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2018 | Elsayed | CV | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
2018 | Go | DL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
2018 | Mason | RL | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 |
2018 | Murray | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
2018 | Pynadath | ML | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
2018 | Stock | CV | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
2018 | Cao | ML | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
2018 | Ruan | DL | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
2018 | Antunes | ML | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
2018 | Volz | XAI | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
2018 | AI Now | XAI | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
2018 | Došilović | ML | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2018 | EY | ML | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2018 | Guidotti | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2018 | Zhu | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2018 | Abdollahi | ML | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2018 | Adadi | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2018 | Agarwal | Generic | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2018 | Everitt | AGI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2018 | Bride | XAI | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2018 | Hagras | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2018 | Kaul | ML | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2018 | Mehri | DL | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2018 | Sileno | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2019 | Tao | Generic | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2019 | Kaur | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
2019 | Batarseh | DS | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2019 | Huber | RL | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
2019 | Keneni | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
2019 | Maloca | DL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2019 | Barredo-Arrieta | XAI | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
2019 | Chittajallu | XAI | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
2019 | Ferreyra | XAI | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
2019 | Lee | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
2019 | Naqa | ML | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
2019 | Prentzas | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
2019 | Bellamy | XAI | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
2019 | Beyret | RL | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
2019 | Madumal | RL | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
2019 | Schlegel | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
2019 | Vabalas | ML | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
2019 | Zhu | RL | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
2019 | Chen | RL | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2019 | Cruz | RL | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
2019 | Dupuis | XAI | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
2019 | Joo | RL | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
2019 | Ren | DL | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
2019 | Srivastava | NLP | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
2019 | Uslu | XAI | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
2019 | Yoon | XAI | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
2019 | Zhou | ML | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
2019 | Mehrabi | ML | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2019 | Meskauskas | XAI | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2019 | Nassar | XAI | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
2019 | Qiu | Generic | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
2019 | Wang | XAI | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2019 | Breck | ML | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
2019 | Glomsrud | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
2019 | He | DL | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2019 | Israelsen | Generic | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2019 | Jha | DL | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2019 | Sun | XAI | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
2019 | Dghaym | XAI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
2019 | Mueller | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2019 | Protiviti | ML | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2019 | Spada | XAI | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
2019 | Pocius | RL | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2019 | Rossi | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2019 | Varshney | ML | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2020 | D'Alterio | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
2020 | Anderson | RL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
2020 | Birkenbihl | ML | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
2020 | Checco | DS | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
2020 | Chen | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
2020 | EASA | DL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
2020 | Kulkarni | DS | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
2020 | Kuppa | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
2020 | Kuzlu | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
2020 | Spinner | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
2020 | Winkel | RL | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
2020 | Gardiner | ML | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2020 | Guo | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
2020 | Han | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2020 | Kohlbrenner | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 |
2020 | Malolan | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
2020 | Payrovnaziri | ML | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
2020 | Sequeira | RL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2020 | Sivamani | DL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2020 | Tan | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2020 | Tao | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
2020 | Welch | DL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2020 | Xiao | DL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2020 | Halliwell | DL | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
2020 | Heuer | ML | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 |
2020 | Kaur | XAI | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
2020 | Mackowiak | CV | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
2020 | Ragot | ML | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
2020 | Rotman | RL | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
2020 | Sarathy | XAI | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 |
2020 | Uslu | XAI | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
2020 | Cruz | RL | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
2020 | He | RL | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 |
2020 | Islam | XAI | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
2020 | Mynuddin | RL | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
2020 | Puiutta | RL | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
2020 | Toreini | ML | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
2020 | Toreini | ML | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 |
2020 | Diallo | XAI | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
2020 | Guo | XAI | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
2020 | Haverinen | XAI | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
2020 | Katell | XAI | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
2020 | Murray | XAI | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 |
2020 | Taylor | XAI | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
2020 | Tjoa | ML | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
2020 | Varshney | ML | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
2020 | Wieringa | XAI | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
2020 | Wing | ML | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
2020 | Das | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
2020 | Li | XAI | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 |
2020 | Dağlarli | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2020 | Dodge | XAI | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2020 | Heuillet | RL | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
2020 | Martinez-Fernandez | XAI | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
2020 | Putzer | XAI | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
2020 | Raji | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
2020 | Arrieta | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2020 | He | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2020 | Kaur | XAI | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2020 | Pawar | XAI | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
2020 | Brennen | XAI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2020 | European Commission | XAI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
2021 | Massoli | DL | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
Columns: AI subarea: AI.s; Relevance: R; Method: M; Results: Rs; Dataset: Ds; Size: Sz; Success: Sc; Limitations: L; General: G; Application: A; Comparison: C
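For readers who want to work with the appendix programmatically, the short Python sketch below parses rows in the pipe-delimited format used above and totals the ten binary metric columns per paper. It is a minimal illustration only: the two sample rows are copied from the first and last table entries, and the unweighted sum is an assumption for demonstration, not the survey's published scoring code.

```python
# Illustrative sketch (assumption): total a paper's score as the unweighted
# sum of the ten binary metrics (R, M, Rs, Ds, Sz, Sc, L, G, A, C).
from io import StringIO
import csv

APPENDIX_ROWS = """\
1985 | Ginsberg | KBS | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0
2021 | Massoli | DL | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1
"""

METRICS = ["R", "M", "Rs", "Ds", "Sz", "Sc", "L", "G", "A", "C"]

def score_rows(text):
    """Yield (year, author, AI subarea, total score) for each table row."""
    for row in csv.reader(StringIO(text), delimiter="|"):
        cells = [c.strip() for c in row if c.strip()]
        year, author, subarea, *flags = cells
        yield year, author, subarea, sum(int(f) for f in flags)

for year, author, subarea, total in score_rows(APPENDIX_ROWS):
    print(f"{year} {author} ({subarea}): {total}/{len(METRICS)}")
```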
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Batarseh, F.A., Freeman, L. & Huang, CH. A survey on artificial intelligence assurance. J Big Data 8, 60 (2021). https://doi.org/10.1186/s40537-021-00445-7