The vast amount of data generated by businesses, together with the growth of data warehouses and legacy systems, has created a treasure trove of information that can be mined for meaningful insights into fraud indicators, emerging risks, and business performance. Companies such as Amazon, Facebook, Google, and Netflix are built on foundations of data exploration and mining.
Data mining, which includes text mining, is the discovery of information without a previously formulated hypothesis: it uncovers relationships, patterns, and trends hidden in large data sets. It draws on methods at the convergence of artificial intelligence, machine learning, statistics, and database systems. Developed in the 1980s as a niche research discipline, data mining has become a powerful tool with the advent of big data.
There are no roadmaps or set directions in data mining. Instead, it requires thinking outside the box to come up with a range of scenarios. Questions such as “What are the risks?” “What opportunities exist for business improvement?” “How can this data be leveraged?” and “What fraudulent activities can occur?” can guide the development of algorithms.
Data Mining Techniques
Examples of Data Mining
Data mining can detect a range of fraud indicators such as bogus vendors, kickbacks, money laundering, insider trading, and claims fraud.
In a telecommunications audit, for example, a model can be built to capture patterns of call destinations, duration, frequency, and time of day. Over time, when actual calls depart from the expected patterns, the model can alert internal audit to possible fraud.
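A minimal Python sketch of the telecommunications idea, assuming a hypothetical `Call` record with destination, hour, and duration fields. It profiles historical calls and lists the ways a new call departs from that profile. Frequency is left out for brevity, and the three-standard-deviation cutoff is an illustrative assumption, not a standard.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Call:
    destination: str  # hypothetical field, e.g. an area or country code
    hour: int         # hour of day the call started, 0-23
    minutes: float    # call duration


def build_profile(history):
    """Summarize past calls: mean and standard deviation of duration per
    destination, plus the set of hours in which calls normally occur."""
    durations = {}
    for call in history:
        durations.setdefault(call.destination, []).append(call.minutes)
    stats = {d: (mean(v), pstdev(v)) for d, v in durations.items()}
    hours = {call.hour for call in history}
    return stats, hours


def alerts(call, profile, z_limit=3.0):
    """Return the reasons a new call departs from the expected pattern."""
    stats, hours = profile
    reasons = []
    if call.destination not in stats:
        reasons.append("new destination")
    else:
        mu, sigma = stats[call.destination]
        if sigma > 0 and abs(call.minutes - mu) / sigma > z_limit:
            reasons.append("unusual duration")
    if call.hour not in hours:
        reasons.append("unusual time of day")
    return reasons
```

An empty list means the call fits the historical pattern. Any reasons returned are prompts for follow-up, not findings.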
Outcomes also can indicate cost-saving opportunities, potential irregularities, and patterns worthy of further investigation. For example, in a procurement audit, text mining that identifies commonly purchased products and services may reveal that ordering cleaning supplies from one vendor instead of several would yield an annual savings or volume discount.
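The following is a rough sketch of the procurement example. It assumes purchase records are (vendor, description, amount) tuples and that a hypothetical keyword list defines the “cleaning” category. It totals spend by vendor within each category and keeps only the categories bought from more than one vendor, since those are the candidates for consolidation.

```python
import re
from collections import defaultdict

# Hypothetical keyword list defining the product category of interest.
CATEGORIES = {"cleaning": ["cleaner", "detergent", "disinfectant", "mop", "bleach"]}


def categorize(description):
    """Map a free-text purchase description to a category by keyword."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    for category, keywords in CATEGORIES.items():
        if words & set(keywords):
            return category
    return None


def multi_vendor_categories(purchases):
    """purchases: iterable of (vendor, description, amount).
    Return {category: {vendor: total spend}} for categories bought from
    more than one vendor."""
    spend = defaultdict(lambda: defaultdict(float))
    for vendor, description, amount in purchases:
        category = categorize(description)
        if category:
            spend[category][vendor] += amount
    return {c: dict(v) for c, v in spend.items() if len(v) > 1}
```

Exact keyword matching misses variants such as “mops.” A fuller text-mining pass would normalize word forms before grouping.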
In a retail audit of a bank branch, a review of customer accounts can show single accounts converted to joint accounts, which may indicate a marriage. Internal audit may recommend cross-selling mortgages and consumer loans to the joint account owners, which can increase branch profitability.
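The account-conversion check can be sketched as a comparison of two snapshots of account ownership. The snapshot format, an account ID mapped to a list of owner names, is an assumption. A flagged conversion is only a lead, because it may not reflect a marriage.

```python
def converted_to_joint(before, after):
    """before/after: {account_id: [owner names]} snapshots taken at two dates.
    Return IDs of accounts with exactly one owner before and two or more after."""
    return sorted(
        account for account, owners in after.items()
        if len(before.get(account, [])) == 1 and len(owners) >= 2
    )
```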
In a loan audit, nonperforming loans can be segmented to show different factors for loan failures. This can help guide the revamping of credit models and tightening of lending practices, which can reduce the number of nonperforming loans.
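One way to segment loans is sketched below. It assumes each loan is a dictionary with a boolean `nonperforming` field plus descriptive attributes; the `purpose` field used in the usage is hypothetical. The function computes the share of nonperforming loans for each value of a chosen factor, so segments with unusually high failure rates stand out.

```python
from collections import Counter


def failure_rate_by(loans, factor):
    """loans: list of dicts with a boolean 'nonperforming' key plus attributes.
    Return {segment value: share of loans in that segment that are nonperforming}."""
    totals, failures = Counter(), Counter()
    for loan in loans:
        segment = loan[factor]
        totals[segment] += 1
        failures[segment] += loan["nonperforming"]  # True counts as 1
    return {segment: failures[segment] / totals[segment] for segment in totals}
```

Running the function once per candidate factor (purpose, collateral type, branch, and so on) gives a quick view of which factors separate failing loans from performing ones.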
The most common techniques used in data mining are predictive modeling, data segmentation, neural networks, link analysis, and deviation detection.
Predictive modeling builds algorithms that predict an outcome, often expressed as “if-then” rules. For example, during a loan audit, auditors can create rules to show which customers in a specific age range (18-25, for instance) with balances exceeding US$5,000 are likely to default.
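The age-and-balance rule can be written directly as an if-then test. This sketch hard-codes the example thresholds as defaults. It also adds a hypothetical helper that checks the rule against historical outcomes. In practice the thresholds would be derived from, and validated on, the organization's own default data.

```python
def likely_default(customer, min_age=18, max_age=25, balance_limit=5_000):
    """IF min_age <= age <= max_age AND balance > balance_limit THEN predict default."""
    return min_age <= customer["age"] <= max_age and customer["balance"] > balance_limit


def flag_customers(customers, **rule):
    """Return the IDs of customers the rule predicts will default."""
    return [c["id"] for c in customers if likely_default(c, **rule)]


def rule_default_rate(history, **rule):
    """Share of past customers matching the rule who actually defaulted
    (None if nobody matched); a quick check of how predictive the rule is."""
    matched = [c for c in history if likely_default(c, **rule)]
    if not matched:
        return None
    return sum(c["defaulted"] for c in matched) / len(matched)
```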
Data segmentation, also called clustering, partitions data into segments or clusters of similar records so that auditors can see the common factors underlying each segment. For example, a marketing audit can segment customers into groups such as residents of urban neighborhoods and residents of affluent areas where wealthier, older people live.
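Clustering is often done with k-means. The article does not name an algorithm, so k-means is a swapped-in choice here. The pure-Python sketch below uses hypothetical (age, income in US$ thousands) points. Real features should be scaled first, because k-means works on raw distances and a variable with a larger range dominates.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, move each
    centroid to the mean of its points, and repeat until nothing moves.
    Returns the centroids and the clusters from the last assignment step."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters
```

Initializing with the first k points keeps the example deterministic. Random or k-means++ initialization, run several times, is the more usual choice.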
Neural networks are a type of artificial intelligence that uses pattern recognition loosely modeled on the way the brain processes, stores, and learns information. In fraud detection, a neural network trained on historical data can learn the characteristics of fraud schemes and then flag new data that matches those hidden patterns.
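The simplest neural network is a single perceptron. This sketch trains one on hypothetical features: purchase amount as a multiple of the cardholder's average, and 1 if the purchase was made outside business hours. Each example is labeled 1 for fraud. The sketch only illustrates learning a pattern from labeled examples. Production fraud models use multilayer networks, far more data, and dedicated libraries.

```python
def predict(weights, bias, features):
    """Fire (1 = suspected fraud) when the weighted sum exceeds zero."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=50, learning_rate=0.1):
    """Classic perceptron learning rule: nudge the weights toward each
    misclassified example, stopping early once an epoch has no errors."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            if error:
                errors += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if errors == 0:
            break
    return weights, bias
```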
Link analysis establishes links, called associations, between records or sets of records. Examples include a customer buying one product at a specific time and a different product a few hours later, or a vendor that supplies a raw material and also purchases a byproduct. In a money laundering audit, link analysis can identify addresses that have many wire transfers attached to them.
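The money laundering example reduces to linking transfers through a shared attribute. This sketch assumes transfers are (transfer ID, beneficiary name, address) tuples. It applies only light address normalization (case and whitespace); real address matching usually needs standardization or fuzzy matching. The threshold is the auditor's choice, not a fixed rule.

```python
from collections import defaultdict


def addresses_with_many_transfers(transfers, threshold):
    """transfers: iterable of (transfer_id, beneficiary_name, address).
    Link transfers through shared addresses and return, for each address
    linked to at least `threshold` transfers, the distinct names at it."""
    by_address = defaultdict(list)
    for transfer_id, name, address in transfers:
        key = " ".join(address.lower().split())  # normalize case and whitespace
        by_address[key].append((transfer_id, name))
    return {
        address: sorted({name for _, name in links})
        for address, links in by_address.items()
        if len(links) >= threshold
    }
```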
Deviation detection pinpoints observations that deviate from the norm or from a model's expectations and are worthy of further investigation. An example is an unusual transaction on a credit or purchase card that does not fit the cardholder's typical spending pattern, such as buying a refrigerator or booking a vacation on a company purchase card.
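The following is a small sketch of deviation detection on purchase cards. It assumes past purchases are stored per cardholder as hypothetical (category, amount) pairs. A new purchase is flagged if it falls in a category the cardholder has never used, or if it is more than three standard deviations above the cardholder's mean spend. Both the rule and the cutoff are illustrative assumptions.

```python
from statistics import mean, stdev


def unusual_card_transactions(history, new_transactions, z_limit=3.0):
    """history: {cardholder: [(category, amount), ...]} of past purchases.
    new_transactions: list of (cardholder, category, amount).
    Return (cardholder, category, amount, reasons) for each flagged purchase."""
    flagged = []
    for holder, category, amount in new_transactions:
        past = history.get(holder, [])
        amounts = [a for _, a in past]
        categories = {c for c, _ in past}
        reasons = []
        if category not in categories:
            reasons.append("new category")
        if len(amounts) >= 2:  # stdev needs at least two data points
            mu, sd = mean(amounts), stdev(amounts)
            if sd > 0 and (amount - mu) / sd > z_limit:
                reasons.append("amount far above normal")
        if reasons:
            flagged.append((holder, category, amount, reasons))
    return flagged
```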
The rapid evolution of techniques for mining unstructured and semi-structured textual data now provides opportunities for audit analysis. Mining this vast body of text is a key tool in the internal auditor’s arsenal for fraud prevention and detection. Word searches for “kickback,” “bank account,” “funds,” “money,” and “override” could uncover fraud, while words such as “flowers,” “anniversary,” “chocolate,” “gift,” “bar,” and “drink” could indicate office romances that breach a company’s code of conduct.
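A keyword scan of this kind can be sketched with Python's standard `re` module, using the article's term lists. Matching on word boundaries keeps “bar” from matching “barrier,” and multi-word terms tolerate extra whitespace. Plurals and other word forms are not handled. A hit is a lead for follow-up, not evidence.

```python
import re

# Term lists taken from the examples in the text.
FRAUD_TERMS = ["kickback", "bank account", "funds", "money", "override"]
CONDUCT_TERMS = ["flowers", "anniversary", "chocolate", "gift", "bar", "drink"]


def keyword_hits(text, terms):
    """Return the terms found in text as whole words or phrases, ignoring case."""
    hits = []
    for term in terms:
        # Escape each word, allow any whitespace between words, anchor on word boundaries.
        pattern = r"\b" + r"\s+".join(re.escape(word) for word in term.split()) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(term)
    return hits
```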
Analysis of email logs can uncover key information about employees’ interests, activities, and behaviors. Email contents might include potential evidence of fraud and other issues of audit concern. For instance, emails from an employee to customers would be a red flag if the employee’s position does not normally involve communicating with customers.
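The role-based red flag can be expressed as a filter over an email log. This sketch makes three assumptions: the log is a list of (sender, recipient) address pairs, employee roles come from HR data, and customer email domains are known. The role names and domains shown are hypothetical.

```python
# Hypothetical set of roles expected to communicate with customers.
CUSTOMER_FACING_ROLES = {"sales", "customer service", "account manager"}


def red_flag_emails(email_log, employee_roles, customer_domains):
    """email_log: iterable of (sender, recipient) addresses.
    employee_roles: {employee address: role}.
    customer_domains: set of email domains belonging to customers.
    Return (sender, role, recipient) for messages from employees in
    non-customer-facing roles to customer addresses."""
    flags = []
    for sender, recipient in email_log:
        role = employee_roles.get(sender)
        domain = recipient.rsplit("@", 1)[-1].lower()
        if role is not None and role not in CUSTOMER_FACING_ROLES and domain in customer_domains:
            flags.append((sender, role, recipient))
    return flags
```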
Emails might contain exchanges of information between parties that provide evidence of a wide range of management fraud. Email contents might also reveal breaches of compliance requirements and their cover-ups, privacy issues, and theft of intellectual property. Because emails pass through gateways, they are easy to archive, index, categorize, and monitor for keywords.
Social Network Analysis
Social network analysis explores the relationships or networks between email senders and recipients, and it can extend to employees’ Facebook, LinkedIn, and Twitter accounts. Social network relationships may presage kickbacks or collusion between employees and third parties, which makes social media analytics a tremendous tool. However, key risks should be considered, such as security, privacy and confidentiality, loss or theft of intellectual property and trade secrets, and legal and compliance issues.
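One simple form of social network analysis on email measures how strongly each employee is tied to each outside party. This sketch assumes a log of (sender, recipient) pairs and a single internal domain, both hypothetical. It counts messages in both directions for each employee/outsider pair and returns the busiest ties, which an auditor might then review for signs of collusion.

```python
from collections import Counter


def external_ties(email_log, internal_domain, min_messages=3):
    """Count messages in both directions between each internal employee and
    each external address; return ((employee, outsider), count) pairs at or
    above min_messages, busiest first."""
    ties = Counter()
    for sender, recipient in email_log:
        sender_internal = sender.endswith("@" + internal_domain)
        recipient_internal = recipient.endswith("@" + internal_domain)
        if sender_internal != recipient_internal:  # exactly one side is external
            if sender_internal:
                employee, outsider = sender, recipient
            else:
                employee, outsider = recipient, sender
            ties[(employee, outsider)] += 1
    return [(pair, n) for pair, n in ties.most_common() if n >= min_messages]
```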
Data Mining Tools
Data mining can be performed with comparatively modest database systems and simple tools or off-the-shelf software packages. Microsoft Excel has a wide range of functions that can be used in data mining without the hours of training required for other programs. Generalized audit software and server database software also are formidable data mining tools.
Raising the Bar
Data mining demands considerable time, serious commitment, a new mind-set, and new skills. Additional challenges include delays in obtaining the data, uncooperative management, and the time spent understanding and scrubbing the data. Data mining raises the bar on what can be achieved by addressing issues beyond the reach of traditional analysis techniques; it is more than running complex queries on large data sets. Internal auditors must work with the data to have it reorganized and cleansed, and they must determine the format the information needs for the technique or analysis they want to use. Data mining increases audit coverage, and with the internet and computer-assisted audit tools, auditors should be limited only by their imaginations.