IRS AI Audit Selection: Innovation and Risk

Jan. 3, 2025 /Mpelembe Media/ —  The Internal Revenue Service (IRS) is using artificial intelligence (AI) to improve the efficiency and effectiveness of its audit selection process, particularly within the National Research Program (NRP). The NRP is used to develop estimates of the tax gap, which is the difference between taxes owed and taxes paid. The IRS is implementing AI models to select tax returns for audit in a way that is intended to be more targeted and efficient than traditional methods.

Here’s how AI is being used in audit selection:

Traditional Sampling: The traditional NRP sampling approach sorted returns into groups and subgroups based on characteristics such as income level, self-employment status, and refundable tax credits claimed on Form 1040, and then selected returns at random from those groups.
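The grouping-then-random-selection idea described above is a form of stratified random sampling. The following is a minimal Python sketch of that general technique, not the IRS's actual procedure. The field names (income_band, self_employed, claims_refundable_credit), the function stratified_sample, and the toy data are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(returns, per_stratum, seed=0):
    """Draw up to `per_stratum` returns at random from each stratum.

    A stratum is one combination of the (hypothetical) grouping fields.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for r in returns:
        key = (r["income_band"], r["self_employed"], r["claims_refundable_credit"])
        strata[key].append(r)

    sample = []
    for key in sorted(strata):  # deterministic stratum order
        members = strata[key]
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))  # random draw without replacement
    return sample

# Toy population of 60 made-up returns (illustrative only).
returns = [
    {
        "id": i,
        "income_band": "high" if i % 3 == 0 else "low",
        "self_employed": i % 2 == 0,
        "claims_refundable_credit": i % 5 == 0,
    }
    for i in range(60)
]

sample = stratified_sample(returns, per_stratum=2)
print(len(sample))  # 8 strata x 2 returns each = 16
```

Every return inside a stratum has the same chance of selection, which is what allows results from the sample to be scaled up to population-level estimates such as the tax gap.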

AI-Driven Sampling: The redesigned process uses AI models to assign risk scores to returns, predicting the likelihood of noncompliance and the potential tax change.

    • Returns are first categorised by high-level characteristics, as in the traditional approach.
    • AI models then score returns based on predicted risk and potential tax change.
    • The returns are divided into two samples:

AI Sample 1 aims to identify emerging types of noncompliance, selecting returns randomly from subgroups based on the potential to provide new information, similar to the traditional process.

AI Sample 2 focuses on returns with a higher likelihood of a large tax change after an audit, aligning with the IRS’s strategic goals.
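The two-sample design can be sketched in Python as follows. This is an illustration under stated assumptions, not the IRS's implementation: the article does not say how the scores are combined, so ranking AI Sample 2 by risk multiplied by predicted tax change is an assumption, as are the function select_samples, the field names (group, risk, predicted_change), and the randomly generated toy scores.

```python
import random
from collections import defaultdict

def select_samples(scored_returns, n_random_per_group, n_targeted, seed=0):
    """Split scored returns into a random sample and a targeted sample."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in scored_returns:
        groups[r["group"]].append(r)

    # Sample 1: random draw within each subgroup, as in the traditional process.
    sample_1 = []
    for name in sorted(groups):
        members = groups[name]
        k = min(n_random_per_group, len(members))
        sample_1.extend(rng.sample(members, k))

    # Sample 2: from the returns not already chosen, take those with the
    # highest expected change (ASSUMPTION: risk x predicted tax change).
    chosen = {r["id"] for r in sample_1}
    remaining = [r for r in scored_returns if r["id"] not in chosen]
    remaining.sort(key=lambda r: r["risk"] * r["predicted_change"], reverse=True)
    sample_2 = remaining[:n_targeted]
    return sample_1, sample_2

# Toy scored returns; the scores here are random numbers, not model output.
data_rng = random.Random(42)
scored = [
    {
        "id": i,
        "group": "A" if i < 10 else "B",
        "risk": data_rng.random(),                       # hypothetical P(noncompliance)
        "predicted_change": data_rng.uniform(0, 10_000),  # hypothetical dollars
    }
    for i in range(20)
]

sample_1, sample_2 = select_samples(scored, n_random_per_group=3, n_targeted=4)
print(len(sample_1), len(sample_2))  # 6 4
```

Keeping a random component alongside the targeted one matters because a purely score-driven sample would only revisit patterns the models already know, while the random draw can surface new types of noncompliance.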

Goals of AI Implementation: According to the IRS, the redesigned sample should increase efficiency by reducing the operational costs of measuring noncompliance, making better use of all available compliance data (including historical data), and increasing the frequency of updates to the models and estimates. The use of AI is also intended to speed up the learning and improvement of the risk models.

Documentation: The IRS has multiple documents detailing the AI models, but they are not yet complete. For example, documents do not specify how data should be divided before being run through the models. Additionally, documentation is lacking on how and when to update the AI models, and how to assess model performance. The lack of complete documentation poses risks to the consistent and appropriate implementation of the models, and makes it more difficult to retain knowledge if staff turnover occurs.

Evaluation: The IRS plans to evaluate the new AI sample selection process by monitoring metrics like the no-change rate of audits, average tax adjustment by sample group, and audit case outcomes by sample group. However, the IRS has not developed specific performance targets for these metrics.
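As a rough illustration of how such metrics could be computed, the Python sketch below calculates the no-change rate and the average tax adjustment for each sample group. The function audit_metrics, the field names, and the audit outcomes are all hypothetical; the figures are invented for the example and are not IRS results.

```python
from collections import defaultdict

def audit_metrics(outcomes):
    """Compute the no-change rate and average adjustment per sample group."""
    by_group = defaultdict(list)
    for o in outcomes:
        by_group[o["sample_group"]].append(o["tax_adjustment"])

    metrics = {}
    for group, adjustments in by_group.items():
        # A "no-change" audit is one that ends with a zero tax adjustment.
        no_change = sum(1 for a in adjustments if a == 0)
        metrics[group] = {
            "audits": len(adjustments),
            "no_change_rate": no_change / len(adjustments),
            "avg_adjustment": sum(adjustments) / len(adjustments),
        }
    return metrics

# Made-up audit outcomes, in dollars (illustrative only).
outcomes = [
    {"sample_group": "AI Sample 1", "tax_adjustment": 0},
    {"sample_group": "AI Sample 1", "tax_adjustment": 1200},
    {"sample_group": "AI Sample 1", "tax_adjustment": 0},
    {"sample_group": "AI Sample 1", "tax_adjustment": 400},
    {"sample_group": "AI Sample 2", "tax_adjustment": 5000},
    {"sample_group": "AI Sample 2", "tax_adjustment": 0},
    {"sample_group": "AI Sample 2", "tax_adjustment": 3000},
    {"sample_group": "AI Sample 2", "tax_adjustment": 8000},
]

metrics = audit_metrics(outcomes)
print(metrics["AI Sample 1"]["no_change_rate"])  # 0.5
print(metrics["AI Sample 2"]["avg_adjustment"])  # 4000.0
```

Numbers like these are only interpretable against a target, for example a maximum acceptable no-change rate for the targeted sample, which is why the absence of defined performance targets limits what the monitoring can show.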

Transparency and Accountability: The use of AI in audit selection makes accountability harder to promote because the inputs and operations of the models are not visible to the user, which makes documentation especially important. The IRS has established governance policies to create trust in the use of AI through responsible practices, including a formal AI governance process and an AI use case inventory.

Minimum Practices: The IRS must also ensure that the use of AI follows specific minimum practices, especially for systems considered ‘safety-impacting’ or ‘rights-impacting’. The IRS’s AI governance policies are designed to incorporate these minimum practices, including AI impact assessments, real-world performance testing, and ongoing monitoring.

Redesign Goals: The redesign of the NRP includes goals such as reducing the number of NRP cases each year by improving sample selection, automating some administrative processes, and leveraging historical data to train AI models.

The IRS is working to balance AI innovation with risk mitigation by establishing a robust governance framework, maintaining an inventory of AI use cases, and implementing minimum practices for the use of AI. Despite these efforts, the IRS faces challenges in fully documenting and evaluating its AI models.