
Framing AI Audits

As more organizations implement artificial intelligence, internal auditors need a framework for reviewing these systems.


Artificial intelligence (AI) is transforming business operations in myriad ways, from helping companies set product prices to extending credit based on customer behavior. Although the technology is still in its nascent stage, organizations are using AI to rank money-laundering schemes by degree of risk based on the nature of the transaction, according to a July EY analytics article. Others are leveraging AI to predict employee expense abuse based on the expense type and vendors involved. Small wonder that McKinsey & Company estimates that the technology could add $13 trillion per year in economic output worldwide by 2030.

If AI is not on internal audit's risk assessment radar now, it will be soon. As AI transitions from experimental to operational, organizations will increasingly use it to predict outcomes supporting management decision-making. Internal audit departments will need to provide management with assurance that the predicted outcomes are reasonable by assessing AI risks and testing system controls.

Evolving Technology

AI uses two types of technologies for predictive analytics — static systems and machine learning. Static systems are relatively straightforward to audit, because with each system iteration, the predicted outcome will be consistent based on the datasets processed and the algorithm involved. If an algorithm is designed to add a column of numbers, it remains the same regardless of the number of rows in the column. Internal auditors normally test static systems by comparing the expected result to the actual result.
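This deterministic behavior can be illustrated with a short Python sketch; the column-summing function and the figures are hypothetical, but they show why a static system can be audited by simple reperformance:

```python
# Hypothetical sketch: auditing a static system amounts to reperforming
# the calculation and comparing the expected result to the actual result.
def column_total(rows):
    """Static algorithm: sum a column of numbers; same input, same output."""
    return sum(rows)

expense_column = [120.50, 89.99, 305.00]
expected = 515.49
actual = column_total(expense_column)
assert abs(actual - expected) < 0.01  # audit test: expected vs. actual result
```

Because the algorithm never changes, the same inputs always reproduce the same output, no matter how many times the test is run.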

By contrast, there is no such thing as an expected result in machine learning systems. Results are based on probability rather than absolute correctness. For example, the results of a Google search that float to the top of the list are those that are most often selected in prior searches, reflecting the most-clicked links but not necessarily the preferred choice. Because the prediction is based on millions of previous searches, the probability is high — though not necessarily certain — that one of those top links is an acceptable choice.

Unlike a static system, the Google algorithm itself may evolve, potentially producing different outcomes for the same question asked at different times. In machine learning, the system "learns" what the best prediction should be, and that prediction will be used in the next system iteration to establish a new set of outcome probabilities. The very unpredictability of the system output increases audit risk absent effective controls over the validity of the prediction. For that reason, internal auditors should consider a range of issues, risks, controls, and tests when providing assurance for an AI business system that uses machine learning for its predictions.
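The shifting, probability-based nature of such predictions can be shown with a toy Python sketch; the link names and click counts are invented, and the model is deliberately simplistic:

```python
from collections import Counter

# Hypothetical sketch: a learning system ranks outcomes by probability,
# and those probabilities shift as each new observation is folded back in.
clicks = Counter({"link_a": 700, "link_b": 250, "link_c": 50})

def ranked_probabilities(counts):
    """Rank each link by its share of all clicks, highest first."""
    total = sum(counts.values())
    return sorted(((link, n / total) for link, n in counts.items()),
                  key=lambda pair: pair[1], reverse=True)

ranked_probabilities(clicks)      # link_a leads with p = 0.70
clicks["link_b"] += 500           # new user behavior arrives...
ranked_probabilities(clicks)      # ...and the ranking can change
```

The same query, run at two different times, yields two different rankings, which is exactly the audit challenge the text describes.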

AI System Development

The proficiency and due professional care standards of the International Professional Practices Framework require internal auditors to understand AI concepts and terms, as well as the phases of development, when planning an AI audit (see "Three Phases of Development," right). Because data fuels these systems, auditors must understand AI approaches to data analysis, including their effect on the system algorithm and its precision in generating outcome probabilities.

Features define the kinds of data for a system that would generate the best outcome. If the system objective is to flag employee expense reports for review, the features selected would be those that help predict the highest payment risk. These could include the nature of the business expense, vendors and dollar amounts involved, day and time reported, employee position, prior transactions, management authorization, and budget impact. A data scientist with expertise in this business problem would set the confidence level and predictive values and then let the system learn which features best determine the expense reports to flag.

Labels represent data points that a system would use to name a past outcome. For instance, based on historical data, one of the labels for entertainment expenses might be "New York dinner theater on Saturday night." The system then would know such expenses were incurred for this purpose on that night in the past and would use this data point to predict likely expense reports that might require close review before payment.

Feature engineering delimits the features selected to a critical few. Rather than provide a correct solution to a given problem, such as which business expense reports contain errors or fraud, machine learning calculates the probability that a given outcome is correct. In this case, the system would calculate which expense reports are likely to contain the highest probability of errors or fraud based on the features selected. The system then would rank the outcomes in descending order of probability.

Machine learning involves merging selected features and outcome labels from diverse datasets to train a system to generate a model that will predict a relationship between a set of features and a given label. The resulting algorithm and model are then refined in the testing phase using additional datasets. This phase may consider hundreds of features at once to discover which features yield the highest outcome probability based on the assigned labels.
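A minimal sketch of this training idea, using a toy frequency-based scorer rather than any particular system's method; the feature names, helper functions, and expense data are all hypothetical:

```python
from collections import defaultdict

# Hypothetical sketch: merge features and labels from historical expense
# reports, learn per-feature flag rates, then score and rank new reports
# by the probability that they should be flagged for review.
def train(history):
    """history: list of (features, flagged) pairs; returns per-feature flag rates."""
    counts = defaultdict(lambda: [0, 0])          # feature -> [flagged, total]
    for features, flagged in history:
        for f in features:
            counts[f][0] += int(flagged)
            counts[f][1] += 1
    return {f: flagged / total for f, (flagged, total) in counts.items()}

def score(model, features):
    """Average the learned rates of a report's features (naive but illustrative)."""
    rates = [model.get(f, 0.0) for f in features]
    return sum(rates) / len(rates) if rates else 0.0

history = [({"dinner", "weekend"}, True), ({"taxi", "weekday"}, False),
           ({"dinner", "weekday"}, True), ({"hotel", "weekday"}, False)]
model = train(history)
new_reports = [{"dinner", "weekend"}, {"taxi", "weekday"}]
ranked = sorted(new_reports, key=lambda r: score(model, r), reverse=True)
```

Real systems use far richer models, but the shape is the same: historical features and labels in, ranked outcome probabilities out.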

Feature engineering then reduces the number of system features to enhance the precision of the outcome probabilities. Based on the testing phase, for example, the nature of the expense, the dollar amounts involved, and the level of the employee's position may best indicate high-risk business expense reports requiring close review. During the production phase, as the system calculates the risk of errors and fraud in actual expense reports, it may modify the algorithm based on actual output probabilities to improve the accuracy of future predictions. Doing so would create continuous system learning not seen in static systems.
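The pruning step might be sketched as follows, assuming per-feature flag rates learned during training; the feature names, rates, and base rate are illustrative assumptions:

```python
# Hypothetical sketch: feature engineering keeps only the features whose
# learned flag rates deviate most from the base rate, pruning the rest.
def select_features(rates, base_rate, k=3):
    """Keep the k features whose rates diverge most from the overall rate."""
    return sorted(rates, key=lambda f: abs(rates[f] - base_rate), reverse=True)[:k]

rates = {"expense_type": 0.62, "dollar_amount": 0.55, "employee_level": 0.48,
         "day_reported": 0.21, "vendor": 0.20}
kept = select_features(rates, base_rate=0.20, k=3)
# expense_type, dollar_amount, and employee_level survive; the rest are pruned
```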

In AI system development, it is important for organizations to establish an effective control environment, including accountability for compliance with corporate policies. This environment also should comprise safeguards over user access to proprietary or sensitive data, and performance metrics to measure the quality of the system output and user acceptance of system results.

A Risk/Control Audit Framework

Training Phase

Considerations for adjusting the assessed level of AI audit risk include:

  • If system reviews are in place to evaluate training data modifications, deletions, or trimming, this condition should help prevent overfitting the training dataset to generate a desired result, reducing audit risk.
  • New AI systems may use datasets of existing systems for reasons of time and cost. Such datasets, however, may contain bias and not include the kinds of data needed to generate the best system outcomes, increasing audit risk.
  • AI datasets that consist of numerous data records will naturally contain some errors. In fact, an error-free dataset would indicate a bad dataset, because the occurrence of errors should match the natural rate. For example, if 5% of employee expense reports are filled in incorrectly and are missing key data, then the training dataset should contain a similar frequency. If not, then audit risk increases.
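As one illustration of the last point, an auditor could script a simple reasonableness check on the training data's error frequency; the rates and tolerance below are assumptions, not prescribed values:

```python
# Hypothetical sketch: flag a training dataset whose error frequency strays
# too far from the natural rate observed in the population (e.g., 5%).
def error_rate_reasonable(n_errors, n_records, natural_rate=0.05, tolerance=0.02):
    """True when the observed error rate is within tolerance of the natural rate."""
    observed = n_errors / n_records
    return abs(observed - natural_rate) <= tolerance

error_rate_reasonable(48, 1000)   # ~4.8% errors: plausible
error_rate_reasonable(0, 1000)    # 0% errors: suspiciously clean, fails the check
```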

Nine procedures frame the audit of an AI system during the training, testing, and production phases of development. The framework provides a point of departure for AI audit planning and execution. Assessed risk drives the controls expected and subsequent internal auditor testing.

Internal auditors may need to adjust the procedures based on their preliminary survey of the AI system under audit, including a documented understanding of the system development process and an analysis of the relevant system risks and controls. Moreover, as auditors complete and document more of these audits, it may be necessary to adjust the framework.

Normally, internal auditors adjust their assessment of risk and their resulting audit project plans based on observations made in the preliminary audit survey. The boxes, at right, depict conditions that may alter assessed risk as well as modify expected AI system controls and subsequent audit testing during specific phases of development.

Data Bias (Training Phase) Use of datasets that are not representative of the true population may create bias in the system predictions. Bias risk also can result from failing to provide appropriate examples for the system application.

A control for data bias is to establish a system review and approval process to ensure there are verifiable datasets and system probabilities that represent the actual data conditions expected over the life of the system. Audit tests of control include ensuring that:

  • Qualified data scientists have judged the datasets.
  • The confidence level and predictive values are reasonable given the data domain.
  • Overfitting has not biased system predictions.
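A common proxy test for the last bullet compares model performance on training data against held-out data; the sketch below is hypothetical, and the 10-point threshold is an assumption rather than an established standard:

```python
# Hypothetical sketch: a large gap between accuracy on the training data and
# on held-out data suggests the model memorized its training set (overfit)
# rather than learning the underlying pattern.
def overfit_gap(train_accuracy, holdout_accuracy, threshold=0.10):
    """Return True when the train/holdout gap exceeds the audit threshold."""
    return (train_accuracy - holdout_accuracy) > threshold

overfit_gap(0.99, 0.71)   # 28-point gap: likely overfit, investigate
overfit_gap(0.88, 0.85)   # 3-point gap: within expectations
```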

Data Recycling (Training) This risk arises when developers recycle the wrong datasets for a new application, or when using those datasets to create or update a new application impairs the performance or maintenance of the existing systems that share them.

One control for data recycling risk is independently examining repurposed data for compliance with contractual or other requirements. In addition, organizations can determine whether adjustments in the repurposed data have been made without impacting other applications.

Examples of control tests are:

  • Evaluating the nature, timing, and extent of the independent examinations.
  • Testing the records of other applications for performance or maintenance issues that stem from the mutually shared datasets.

Data Origin (Training) Unauthorized or inappropriately sourced datasets can increase the risk of irrelevant, inaccurate, or incomplete system predictions during the production phase.

To control this risk, the organization should inspect datasets for origin and relevance, as well as compliance with contractual agreements, company protocols, or usage restrictions. The results of these inspections should be documented.

To test controls, auditors should:

  • Review data source agreements to ensure use of datasets is consistent with contract terms and company policy.
  • Examine the quality of the inspection reports, focusing on the propriety of data trimmed from the datasets.

Testing Phase

Considerations for adjusting the assessed level of AI audit risk include:

  • If independent, third-party judges tested the system data, but no process is in place to reconcile differences in test results between judges, then audit risk increases.
  • Because system predictions are based on probability, perfect test results are not possible. If third-party judges evaluating the test results find no issues, then data overfit may have occurred, increasing audit risk.
  • If the system has not been validated to prevent user misinterpretations caused by incorrect data relationships, such as flagging business expense reports based on employee gender, then audit risk increases. Alternatively, if user interpretations based on system predictions have not been validated to ensure system data supports the interpretation, then audit risk also increases.
  • If data scientists fail to use representative datasets with examples involving critical scenarios to train the system, then audit risk increases.
  • If the datasets are not locked during testing, then the data scientist may adjust the algorithm in ways that inadvertently process the data in a biased manner, increasing audit risk.
  • If the datasets are locked during testing, but the data scientist fails to review the actual system prediction for integrity, then audit risk increases.

Data Conclusion (Testing Phase) Inappropriately tested data relationships could result in improper system conclusions that are based on incorrect assumptions about the data. These conclusions could create bias in management decisions.

The control for this risk is to ensure each feature of the system contains only data whose purpose has been approved for use. Developers should assess the results of such data for misinterpretation and correct them, as appropriate.

Testing this control involves reviewing user interpretations and subsequent management decisions based on system predictions. By performing this test, organizations can ensure that the data supports the conclusions reached and decisions made by management.

Data Overfit (Testing) With this issue, the risk is that datasets may not reflect the actual data domain. Specifically, data outliers may have been trimmed during system testing, leading to a condition that overfits the algorithm to a biased dataset. That could cause the system to respond poorly during the production phase.

Organizations can control for this risk by validating datasets in system testing to ensure that the samples used represent all possible scenarios and that the datasets were modified appropriately to obtain the currently desired system outcome.

To test this control, internal auditors should review all outlier, rejected, or trimmed data to ensure that:

  • Relevant data has not been trimmed from datasets.
  • Datasets remain locked throughout testing.
  • The algorithm has processed the data in an unbiased way.
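The first of these tests can be partly automated by reconciling record identifiers between the source data and the dataset used in testing; a hypothetical sketch with invented record IDs:

```python
# Hypothetical sketch: reconcile the tested dataset against its source to
# surface records trimmed during testing, then review whether each removal
# was justified rather than a convenient way to bias the algorithm.
def trimmed_records(source_ids, tested_ids):
    """IDs present at the source but absent from the dataset used in testing."""
    return sorted(set(source_ids) - set(tested_ids))

source = ["r1", "r2", "r3", "r4", "r5"]
tested = ["r1", "r2", "r4"]
trimmed_records(source, tested)   # ['r3', 'r5'] require justification
```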

Data Validation (Testing) Failure to validate datasets for integrity through automated systems or independent, third-party judges can lead to unsupported management decisions or regulatory violations. An example would be allowing the personal data of European Union (EU) citizens to be accessed outside of the EU in violation of Europe's General Data Protection Regulation.

Organizations can control for this risk by implementing a validation process that compares datasets to the underlying source data. If the organization uses automated systems, it should ensure the process reveals all underlying issues affecting the quality of the system output. If the organization uses independent, third-party judges, it should ensure the process allows judges the access they need to the raw data inputs and outputs.

To test these controls, internal auditors should:

  • Assess the process and conditions under which the validation took place, assuring that all high-risk datasets used in the system were validated.
  • Confirm randomly selected datasets with underlying source data.
  • When datasets are based on current system data, validate such data is correct to avert a flawed assessment of actual system data.
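The second bullet, confirming randomly selected records against underlying source data, might be scripted as follows; the record keys and values are invented, and a fixed seed keeps the sample reproducible for audit workpapers:

```python
import random

# Hypothetical sketch: confirm a random sample of dataset records against
# the underlying source system, recording any mismatches for follow-up.
def confirm_sample(dataset, source, sample_size=2, seed=42):
    """Return the sampled keys whose dataset values disagree with the source."""
    keys = random.Random(seed).sample(sorted(dataset), sample_size)
    return [k for k in keys if dataset[k] != source.get(k)]

dataset = {"r1": 100, "r2": 250, "r3": 75}
source  = {"r1": 100, "r2": 250, "r3": 80}   # r3 disagrees with the source
mismatches = confirm_sample(dataset, source, sample_size=3)
```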

Data Processing (Production Phase) Failing to validate internal system processing can cause inconsistent, incomplete, or incorrect reporting output and user decisions. However, periodically reviewing and validating input and output data at critical points in the data pipeline can mitigate this risk and ensure processing is in accordance with the system design.

Auditors can test this control by:

  • Reconstructing selected data output from the same data input to validate system outcomes.
  • Performing the system operation again.
  • Using the results to reassess system risk.
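Reperformance of this kind can be sketched as follows; the per-diem calculation is a stand-in for whatever the production system actually computes, and the figures are invented:

```python
# Hypothetical sketch: reperform the system operation on the same inputs
# and compare the reconstructed output to what the system reported.
def reperform(system_fn, inputs, reported_outputs):
    """Return inputs whose recomputed output disagrees with the system's report."""
    return [x for x, reported in zip(inputs, reported_outputs)
            if system_fn(x) != reported]

def per_diem(days):            # stand-in for the production calculation
    return days * 75

inputs = [2, 3, 5]
reported = [150, 225, 380]     # the 5-day claim was reported as 380, not 375
reperform(per_diem, inputs, reported)   # [5] flags the discrepancy
```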

Data Performance (Production) Without performance metrics to assess the quality of system output, the organization may fail to detect issues that diminish user acceptance of system results. For example, an AI system could fail to address government tax or environmental regulations over business activity.

Controlling data performance risk requires organizations to establish metrics to evaluate system performance in both the training and production phases. Such metrics should include the nature and extent of false positives, false negatives, and missed items. In addition, developers should implement a feedback loop for users to report system errors directly, among other performance measures.
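The metrics described here can be computed directly from counts of true and false positives and negatives; the example numbers below are invented, and they also show how high accuracy can mask poor precision when at-risk reports are rare:

```python
# Hypothetical sketch: compute the performance metrics the text describes
# from counts of flagged expense reports.
#   tp: at-risk reports correctly flagged      fp: clean reports wrongly flagged
#   fn: at-risk reports missed                 tn: clean reports correctly passed
def performance_metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return {
        "accuracy":  (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # share of flags that were right
        "recall":    tp / (tp + fn) if tp + fn else 0.0,  # share of at-risk reports caught
    }

# 95% accuracy, yet only 10% of flagged reports were truly at risk.
performance_metrics(tp=5, fp=45, fn=5, tn=945)
```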

Production Phase

Considerations for adjusting the assessed level of AI audit risk include:

  • Systems that leverage the datasets of existing systems already audited should lower overall audit risk and not require as much audit testing as new systems using datasets not previously audited.
  • Systems that process inputs and outputs at all stages of the data pipeline should facilitate validation of system-supported user decisions and lower overall audit risk. However, if data inputs and outputs are processed in a black-box environment, confirming internal system operations may not be possible. That would increase the audit risk of drawing the wrong conclusion about the reasonableness of the system output.
  • If performance metrics are used to measure the quality of the data output, user acceptance of system results, and system compliance with government regulations, then audit risk decreases.
  • If performance metrics monitor both system training and production data, then audit risk decreases.
  • If performance metrics measure system accuracy but not precision, overlooking a possible system performance issue, then audit risk increases.
  • Well-designed systems prevent unauthorized access to system data based on company protocols and regulatory requirements and routinely monitor access for security breaches, decreasing audit risk.

To test these controls, internal auditors should:

  • Examine reported variances from established performance measures.
  • Test a representative sample of performance variances to confirm whether management's follow-up or corrective action was appropriate.
  • Determine whether such action has enhanced user acceptance of system results.

Data Sensitivity (Production) With this issue, the risk is unauthorized access to personally identifiable information or other sensitive data that violates regulatory requirements. Controls include ensuring documented procedures are in place that restrict system access to authorized users. Additionally, ongoing monitoring for compliance is needed. Control testing includes:

  • Comparing system access logs to a documented list of authorized users.
  • Notifying management about audit exceptions.
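These two tests can be combined in a simple log-reconciliation script; the user names and log format below are invented for illustration:

```python
# Hypothetical sketch: compare system access logs to the documented list of
# authorized users; any user in the log but not on the list is an exception
# to report to management.
def access_exceptions(access_log, authorized_users):
    """User IDs that appear in the log but not on the authorized list."""
    return sorted({entry["user"] for entry in access_log} - set(authorized_users))

log = [{"user": "alice", "action": "read"},
       {"user": "bob", "action": "export"},
       {"user": "mallory", "action": "read"}]
access_exceptions(log, authorized_users={"alice", "bob"})   # ['mallory']
```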

Algorithmic Accountability

As AI technology matures, algorithmic bias in AI systems and lack of consumer privacy have raised ethical concerns for business leaders, politicians, and regulators. Nearly one-third of CEO respondents ranked AI ethics risk as one of their top three AI concerns, according to Deloitte's 2018 State of AI and Intelligent Automation in Business Survey.

What's more, the U.S. Federal Trade Commission (FTC) addressed hidden bias in training datasets and algorithms and its effect on consumers in a 2016 report, Big Data: A Tool for Inclusion or Exclusion? Such bias could have unintended consequences on consumer access to credit, insurance, and employment, the report notes. A recent U.S. Senate bill, the Algorithmic Accountability Act of 2019, would direct the FTC to require large companies to audit their AI algorithms for bias and their datasets for privacy issues, as well as correct them. If enacted, this legislation would impact the way in which such systems are developed and validated.

Given these developments, the master audit plan of many organizations could go beyond rendering assurance on AI system integrity to evaluating compliance with new regulations. Internal auditors also may need to provide the ethical conscience to the business leaders responsible for detecting and eliminating AI system bias, much as they do for the governance of financial reporting controls.

These responsibilities may make it harder for internal audit to navigate the path to effective AI system auditing. Yet, those departments that embark on the journey may be rewarded by improved AI system integrity and enhanced professional responsibility.

To learn more about AI, read "Getting to Know Common AI Terms."

Dennis Applegate
Mike Koenig

About the Authors



Dennis Applegate, CIA, CPA, CMA, CFE, is a lecturer in internal auditing and accounting at Seattle University and served on the management team of Boeing Corporate Audit for 20 years.



Mike Koenig is a lecturer in computer science at Seattle University and was a software engineering and AI leader at Microsoft for 25 years, with 19 patents.

