By now, most internal audit functions have likely implemented rule-based analytics capabilities to evaluate controls or identify data irregularities. While these tools have served the profession well, providing useful insights and enhanced stakeholder assurance, emerging technologies can deliver even greater value and increase audit effectiveness. With the proliferation of digitization and wealth of data generated by modern business processes, now is an opportune time to extend beyond our well-worn approaches.
In particular, machine learning (ML) algorithms represent a natural evolution beyond rule-based analysis. Internal audit functions that incorporate ML beyond their existing toolkit can expect to develop new capabilities to predict potential outcomes, identify patterns within data, and generate insight difficult to achieve through rudimentary data analysis. Those looking to get started should first understand common ML concepts, how ML can be applied to audit work, and the challenges likely to arise along the way.
What Is Machine Learning?
ML is a branch of artificial intelligence (AI) featuring algorithms that learn from past patterns and examples to perform a specific task. How does an ML algorithm "learn," and how is this different from rule-based systems? Rule-based systems generate an outcome by evaluating specific conditions — for example, "If it is raining, carry an umbrella." These systems can be automated — such as through the use of robotic process automation — but they are still considered "dumb" and incapable of processing inputs unless provided explicit instructions.
By contrast, an ML model generates probable outcomes for "Should I carry an umbrella?" by taking into account inputs such as temperature, humidity, and wind and combining these with data on prior outcomes from when it rained and when it did not. Machine learning can even consider the user's schedule for the day to determine if he or she will likely be outdoors when rain is predicted. With ML models, the best predictor of future behavior is past behavior. Such systems can generate useful real-world insights and predictions by inferring from past examples.
As an analogy, most people who have built objects using a Lego set, such as a car, follow a series of rules — a step-by-step instruction manual included with the construction toys. After building the same Lego car many times, even without written instructions, an individual would acquire a reasonable sense of how to build a similar car given the Lego parts. Likewise, an ML algorithm with sufficient training — prior practice assembling the Lego car — can provide useful outcomes (build the same car) and identify patterns (relationships between the Lego parts) given an unknown set of inputs (previously unseen Lego parts) even without instructions.
The outcomes and accuracy of ML algorithms are highly dependent on the inputs provided to them. A conceptual grasp of ML processes hinges on understanding these inputs and how they impact algorithm effectiveness.
Feature Put simply, a feature is an input to a model. In an Excel table populated with data, one data column represents a single feature. The number of features, also referred to as the dimensionality of the data, varies depending on the problem and can range up to the hundreds. If a model is developed to predict the weather, data such as temperature, pressure, humidity, types of clouds, and wind conditions comprise the model's features. ML algorithms are well-suited to such multidimensional analysis of data.
Feature Engineering In a rule-based system, an expert will create rules to determine the outcome. In an ML model, an expert selects the specific features from which the model will learn. This selection process is known as feature engineering, and it represents an important step toward increasing the algorithm's precision and efficiency. The expert also can refine the selection of inputs by comparing the outcomes of different input combinations. Effective feature engineering should reduce the number of features within the training data to just those that are important. This process will allow the model to generalize better, with fewer assumptions and reduced bias.
Label An ML model can be trained using past outcomes from historical data. These outcomes are identified as labels. For instance, in a weather prediction model, one of the labels for a historical input date might be "rained with high humidity." The ML model will then know that it rained in the past, based on the various temperature, pressure, humidity, cloud, and wind conditions on a particular day, and it will use this as a data point to help predict the future.
Ensemble Learning One common way to improve model accuracy is to incorporate the results of multiple algorithms. This "ensemble model" combines the predicted outcomes from the selected algorithms and calculates the final outcome using the relative weight assigned to each one.
Learning Categories The way in which an ML algorithm learns can generally be separated into two broad categories — supervised and unsupervised. Which type might work best depends on the problem at hand and the availability of labels.
supervised learning algorithm learns by analyzing defined features and labels in what is commonly called the training dataset. By analyzing the training dataset, the model learns the relationship between the defined features and past outcomes (labels). The resulting supervised learning model can then be applied to new datasets to obtain predicted results. To assess its precision, the algorithm will be used to predict the outcomes from a testing dataset that is distinct from the training dataset. Based on the results of this training and testing regime, the model can be fine-tuned through feature engineering until it achieves an acceptable level of accuracy.
- Unlike supervised learning,
unsupervised learning algorithms do not have past outcomes from which to learn. Instead, an unsupervised learning algorithm tries to group inputs according to the similarities, patterns, and differences in their features without the assistance of labels. Unsupervised learning can be useful when labeled data is expensive or unavailable; it is effective at identifying patterns and outliers in multidimensional data that, to a person, may not be obvious.
An ML model's capacity to provide stronger assurance, compared to rule-based analysis, can be illustrated using an example of the technology's ability to identify anomalies in payment transactions. "Overview of ML Payment Analytics" (right) shows the phases of this process.
Developing an ML model to analyze payment transactions will first require access to diverse data sources, such as historical payment transactions for the last three years, details of external risk events (e.g., fraudulent payments), human resource (HR) data (e.g., terminations and staff movements), and details of payment counterparties. Before feature engineering work can start, the data needs to be combined and then reviewed to verify it is free of errors — commonly called the extract, transform, and load phase. During this phase, data is extracted from various source systems, converted (transformed) into a format that can be analyzed, and stored (loaded) in a data warehouse.
Next, the user performs feature engineering to shortlist the critical features — such as payment date, counterparty, and amount — the model will analyze. To refine the results, specific risk weights, ranging from 0 to 1, are assigned to each feature based on its relative importance. From experience, a real-world payment analytics model may use more than 150 features. The ability to perform such multidimensional analysis of features represents a key reason to use ML algorithms instead of simple rule-based systems.
To begin the analysis, internal auditors could apply an unsupervised learning algorithm that identifies payment patterns to specific counterparties, potentially fraudulent transactions, or payments with unusual attributes that warrant attention. The algorithm performs its analysis by identifying the combination of features that fit most payments and producing an anomaly score for each payment, depending on how its features differ from all others. It then derives a risk score for each payment from the risk weight and the anomaly score. This risk score indicates the probability of an irregular payment.
"Payment Outliers" (below right) illustrates a simple model using only three features, with two transactions identified as outliers. The unsupervised learning model generates a set of potential payment exceptions. These exceptions are followed up to determine if they are true or false. The results can then be used as labels to incorporate supervised learning into the ML model, enabling identification of improper payments with a significantly higher degree of precision.
Supervised learning models can also be used to predict the likelihood of specific outcomes. By training an algorithm using labels on historical payment errors, the model can help identify potential errors before they occur. For example, based on past events a model may learn that the frequency of erroneous payments is highly correlated with specific features, such as high frequency of payment, specific time of day, or staff attrition rates. A supervised learning model trained with these labels can be applied to future payments to provide an early warning for potential payment errors.
This anomaly detection model can be applied to datasets with clear groups, though it should not contain significant transactions that differ greatly from most of the data. For instance, the model can be extended to detect irregularities in almost any area, including expenses, procurement, and access granted to employees.
Continuing with the payment example, an ML model developed to analyze payment transactions can be used to uncover hidden patterns or unknown insights. Examples include:
- Identify overpayment for services by comparing the mean and typical variance in payment amounts for each product type — such as air tickets or IT services — and highlighting all payments that deviate significantly from the mean.
- Identify prior unknown emerging needs — such as different departments paying for a new service at significantly different prices — or client types by highlighting payment outliers. This insight could allow executives to optimize the cost for acquired products and services.
- Identify multiple consecutive payments to a single counterparty below a specific threshold. This analysis would help identify suspicious payments that have been split into smaller ones to potentially escape detection.
- Identify potential favoritism shown to specific vendors by pinpointing significant groups of payments made to these vendors or related entities.
Internal auditors are likely to encounter numerous challenges when applying ML technology. Input quality, biases and poor performance, and lack of experience with the technology are among the most common.
Availability of Clean, Labeled Data For any ML algorithm to provide meaningful results, a significant amount of high-quality data must be available for analysis. For instance, developing an effective payment anomaly detection model requires at least a year of transactional, HR, and counterparty information. Data cleansing, which involves correcting and removing erroneous or inaccurate input data, is often required before the algorithm can be trained effectively. Experience shows that data exploration and data preparation often consume the greatest amount of time in ML projects. Biases in the training data that are not representative of the actual environment will adversely impact the model's output. Also, without good labels — such as labels on actual cyber intrusions — and feature engineering, a supervised learning model will be biased toward certain outcomes and may generate noisy, or meaningless, results.
Poor Model Performance and Biases Most internal audit functions that embark on ML projects will initially receive disappointing or inaccurate results from at least some of their models. Potential sources of failure may include trained models that do not generalize well, poor feature engineering, use of algorithms that are ill-suited to the underlying data, or scarcity of good quality data.
Overfitting is another potential cause of poor model performance — and one that data scientists encounter often. An ML model that overfits generates outcomes that are biased toward the training dataset. To reduce such biases, internal audit functions use testing data independent of the training dataset to validate the model's accuracy.
Auditors should also be cognizant of each algorithm's inherent limitations. For example, unsupervised learning algorithms may produce noisy results if the data elements are unrelated and have few or no common characteristics (i.e., no natural groups). Some algorithms work well with inputs that are relatively independent of one another but would be poor predictors otherwise.
Lack of Experience Organizations new to ML may not have examples of successful ML projects to learn from. Inexperienced practitioners can acquire confidence in their fledging capabilities by first applying simple ML models to achieve better outcomes from existing solutions. After these initial successes, algorithms to improve the outcomes of these models can be progressively implemented in stages. For instance, an ensemble learning approach can be used to improve on the first model. If successful, more advanced ML methods should then be considered. This progressive approach can also alleviate the initial skepticism often present in the adoption of new technology.
The Future of Audit
Machine learning technology holds great promise for internal audit practitioners. Its adoption enables audit functions to provide continuous assurance by enhancing their automated detection capabilities and achieving 100% coverage of risk areas — a potential game changer for the audit profession. The internal audit function of the future is likely to be a data-driven enterprise that augments its capabilities through automation and machine intelligence.