Robust audit risk assessments — a key building block of high-impact audits — are, by nature, a challenge for any internal audit department, and even more so in today’s dynamic audit environment. Especially in public sector organizations, where limited resources, competing priorities, and lack of subject matter expertise impede risk identification, auditors are increasingly looking to technology for solutions. Specifically, internal auditors can augment their risk management activities by using automated solutions that assess the literature in the field of interest to predict industry trends.
Cognitive technology — intelligent computer systems designed to perform human tasks — has long been used to enhance research and knowledge collection. This technology has potential to transform the internal audit profession, particularly in performing risk assessments.
Auditors at the New York State Office of the State Comptroller (OSC) have developed a tool set that leverages cognitive technology to extract and analyze text from audit reports, creating a search vehicle capable of identifying meaningful data within documents that collectively can help auditors identify high-risk areas. The tool set gives auditors immediate access to a wealth of publicly available but, until recently, elusive audit-critical information. It minimizes the time-consuming manual processes needed to identify themes and risks and ultimately improves the effectiveness of risk management, control, and governance processes within the agencies and organizations the OSC audits.
Distilling the Facts
Audit reports represent a source of untapped data. Extracting value from that data through manual searches is slow and difficult. Computers, by contrast, can extract value efficiently and at scale, as long as document text is prepared in a uniform format that both humans and machines can understand. Natural language processing (NLP) is a field of artificial intelligence that enables computers to understand human language. For example, NLP enables the iPhone’s Siri personal assistant to answer users’ questions. Applied to report text, NLP allows internal auditors to tap into each sentence of every report, generating large volumes of new information, and can transform audit reports into a powerful source of insights for more targeted risk assessments and audits.
Searching for relevant audits requires varying amounts of information to be communicated through a web browser. Audit reports are available on the web in a range of formats — from the simple PDF to the more sophisticated HTML — each with varying levels of interoperability, depending on how the back-end information is organized. Data can be “structured” text containing additional coded information that facilitates machine reading, or “unstructured” text that lacks the required detail to enable efficient machine reading. The more structured the documents are, the more relevant the document retrieval can be.
The OSC’s tool set creates a process to derive machine-readable data from audit reports by: 1) converting text to a standardized structure, 2) adding layers of meaning to the text, and 3) teaching computers to use the information to recognize and understand common audit language, concepts, and themes, as well as to analyze associations. Although the OSC’s work to date has involved performance audit reports only, the tool set can be applied to any report type.
The OSC’s tool set uses optical character recognition engines to extract only the plain text from each document. NLP is then applied to the plain text, creating additional layers of linguistic information that allow computers to put words into context and derive meaning.
NLP uses grammar rules to identify and classify parts of speech, and codes them using annotation tags. Likewise, it locates proper nouns, and classifies and tags them according to predefined named entity categories. For example, take the sentence, “For the two fiscal years ended June 30, 2010, the Mill Neck School claimed approximately $16.7 million in reimbursable expenses.” NLP identifies and tags “Mill,” “Neck,” and “School” each as a proper noun singular and then, based on their proximity, classifies and tags the proper nouns collectively as the named entity “organization.”
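The grouping step in the Mill Neck School example can be sketched in a few lines of Python. This is a toy heuristic for illustration only, not the OSC's implementation and not a real part-of-speech tagger: it treats capitalized words as proper-noun candidates, joins adjacent ones, and labels the group an organization when its last word appears in a small cue list. The function name `tag_entities` and the `ORG_CUES` list are assumptions made for this sketch.

```python
import re

# Hypothetical cue words; a real named-entity recognizer learns such patterns from data.
ORG_CUES = {"School", "Department", "Office", "Agency", "University", "Authority"}


def tag_entities(sentence):
    """Group adjacent capitalized words and give each group a coarse label."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    entities, run = [], []
    # Skip the first token: sentence-initial capitalization says nothing about
    # whether the word is a proper noun. The empty sentinel flushes the last run.
    for token in tokens[1:] + [""]:
        if token[:1].isupper():
            run.append(token)
            continue
        if run:
            label = "organization" if run[-1] in ORG_CUES else "unclassified"
            entities.append((" ".join(run), label))
            run = []
    return entities


sentence = (
    "For the two fiscal years ended June 30, 2010, the Mill Neck School "
    "claimed approximately $16.7 million in reimbursable expenses."
)
print(tag_entities(sentence))
# [('June', 'unclassified'), ('Mill Neck School', 'organization')]
```

The three adjacent proper nouns are merged because of their proximity, as in the example above. A production NLP library would also recognize "June 30, 2010" as a date, which this sketch does not attempt.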
Based on the NLP annotations, additional information extraction techniques detect and tag audit-specific elements such as “auditee” and “finding.”
New information derived from NLP annotations allows auditors to data mine every sentence within a collection of documents using a variety of pre-set text recognition “rules” to identify high-relevance themes and risks. These rules, which interact with the computer in the form of user queries, act as filters to guide the computer’s recognition of text. Rules can vary in complexity, depending on the type of information the user seeks. For example, users can filter documents based on the frequency of a certain word or word combination occurring within them (visually represented as a word cloud) or on a cluster of specific words that are commonly associated with a given audit concept such as a finding.
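The simplest kind of rule, a word-frequency filter, might look like the sketch below. The counts it produces are the kind of data a word cloud visualizes. The function name `term_frequencies` and the short stop-word list are assumptions for illustration; the source does not describe how the OSC computes frequencies.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "was", "were"}


def term_frequencies(text, top_n=5):
    """Return the top_n most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up sample text, not taken from an actual audit report.
sample = "Medicaid overpayments were identified. Medicaid claims were paid twice."
print(term_frequencies(sample, top_n=1))
# [('medicaid', 2)]
```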
Using the criteria of a given rule, the computer can search a database of annotated documents and identify text that fits the rule. For example, if auditors are interested in identifying areas within the Medicaid program at risk for overpayment, they can query the database for the word combination “overpayment–Medicaid.” The tool set then analyzes all the reports in the database, identifies those that contain the “overpayment–Medicaid” word combination, and ranks them by frequency of word combination occurrence.
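A minimal sketch of the "overpayment–Medicaid" query follows. It assumes that a word combination "occurs" whenever both terms appear in the same sentence, and that reports are held in a plain dictionary. Both are assumptions of this sketch; the source does not specify the matching rule or the storage format. The report identifiers and text are invented.

```python
import re


def count_cooccurrences(text, term_a, term_b):
    """Count sentences that mention both terms (case-insensitive substring match)."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    a, b = term_a.lower(), term_b.lower()
    return sum(1 for s in sentences if a in s.lower() and b in s.lower())


def rank_reports(reports, term_a, term_b):
    """Return (report_id, count) pairs for matching reports, most frequent first."""
    scored = [
        (report_id, count_cooccurrences(text, term_a, term_b))
        for report_id, text in reports.items()
    ]
    matches = [(report_id, n) for report_id, n in scored if n > 0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Invented sample reports for illustration only.
reports = {
    "report-A": "Medicaid overpayments totaled $2 million. Controls were weak.",
    "report-B": (
        "The audit found an overpayment to a Medicaid provider. "
        "A second Medicaid overpayment followed. Staff were retrained."
    ),
    "report-C": "Bridge inspections were late.",
}
print(rank_reports(reports, "overpayment", "Medicaid"))
# [('report-B', 2), ('report-A', 1)]
```

Substring matching lets "overpayment" also match "overpayments". A fuller implementation would more likely query the lemma annotations produced by the NLP step.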
After auditors select the reports that are of interest, the computer can automatically extract audit concept information from each. For example, certain words such as “ensure,” “need,” “reveal,” and “discover” are frequently used in reports’ findings sections. The computer searches for these words and extracts sections from the reports that contain them. Information can be retrieved in source list or text display views. As the computer’s knowledge bank grows (by learning new queries, understanding them in the context of existing queries, and thus creating new knowledge), the technology will become increasingly attuned to the user’s intent.
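The cue-word extraction described above can be approximated with a regular expression. The sketch below assumes sentence-level extraction and crude prefix matching, so that "revealed" and "needs" also count. The names `CUE_PATTERN` and `extract_candidate_findings` are hypothetical, and the sample text is invented.

```python
import re

# Prefixes of the four cue words named in the article. Prefix matching is crude:
# it catches "revealed" and "ensuring", but also false positives such as "needle".
CUE_PATTERN = re.compile(r"\b(ensur|need|reveal|discover)\w*", re.IGNORECASE)


def extract_candidate_findings(text):
    """Return the sentences that contain at least one findings cue word."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if CUE_PATTERN.search(s)]


# Invented sample text for illustration only.
sample = (
    "Our review revealed duplicate payments. "
    "The agency was founded in 1990. "
    "Officials need to ensure claims are verified."
)
for sentence in extract_candidate_findings(sample):
    print(sentence)
# Our review revealed duplicate payments.
# Officials need to ensure claims are verified.
```

Matching against the part-of-speech and lemma annotations from the NLP step, rather than raw prefixes, would reduce false positives.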
Risk Assessment Transformed
Applied to the OSC’s growing database of audit reports, the tool set has transformed the office’s risk assessments by:
- Unlocking new insights from raw information in existing work, which expands the scope of risk assessment.
- Speeding data collection.
- Enabling auditors to assess the quality of data faster and determine which are most useful.
- Allowing auditors to leverage real-time data to continuously monitor trends and more quickly identify new risks.
As a result, the OSC’s auditors are better equipped to identify threats to a program’s or an organization’s success and sustainability, conduct more productive audits, make meaningful recommendations, and ultimately deliver on their professional commitment to improve governance, operations, risk management, and control processes.
Adapting to Changing Risk
Today’s audit environment is highly dynamic: Risks are increasing in number and complexity, as is the number of regulations being created to control them. The OSC’s tool set is a critical resource to help auditors adapt to these changes, while supporting the profession’s advocacy of good governance. It’s an example of how internal auditors globally could leverage the benefits of technologies such as artificial intelligence to address risk in real time.