AI Classification for Google Drive Security

Nov. 2, 2025 /Mpelembe Media/ — The white paper from Google outlines its AI Classification for Google Drive solution, which uses custom, privacy-preserving AI models to automatically identify and label an organization's sensitive data at scale. The document explains that this capability addresses the challenge of manually classifying the exponentially growing volume of data in Google Workspace, providing a stronger foundation for data protection, auditing, and records management.
The process involves designated users manually applying training labels to files, from which the AI model learns to build a customized classification system. The text details the three phases of implementation, from preparing the training data to automatically applying labels, and explains how to improve model performance by providing diverse, balanced data. Finally, the paper discusses how the system maintains model compliance and respects user actions when files are labeled automatically or manually.

Google’s AI classification approach is designed to overcome the persistent challenge of identifying and classifying data accurately at scale, enabling granular data management and protection.

This strategy centers on giving customers the ability to tailor data protection and governance policies to their specific security preferences and requirements using comprehensive data classification capabilities.

Core Mechanism and Granularity

The AI classification approach uses advancements in artificial intelligence to address the need for classifying data quickly and precisely across the exponentially increasing amount of data being created, shared, and replicated today.

  1. Custom AI Models: AI classification enables customers to develop completely customized AI models. These models are trained exclusively on their data to automatically identify, classify, and apply bespoke labels to files.
  2. Classification Labels as Metadata: The core tool is the classification label, which provides an adaptable framework for identifying and categorizing information. When applied to files in Google Drive, these labels are recorded as metadata.
  3. Enabling Granular Control: This metadata then becomes accessible to various Workspace systems, allowing the labels to act as triggers for specific policies. Granularity is achieved across several areas:
    • Data Protection: Classification labels serve as rule conditions to trigger policy enforcement via Workspace Data Loss Prevention (DLP) capabilities. This allows administrators to control the sharing of sensitive information by applying restrictions on operations like external sharing, downloading, copying, and printing.
    • Auditing: Drive log events are enriched with classification label metadata, allowing administrators to monitor activities (e.g., file access, edits, and sharing of sensitive files) by filtering based on specific label conditions.
    • Records Management: Google Vault supports custom retention rules for Drive based on classification labels, allowing administrators to set file-level policies using configurable label conditions.
    • Search & Discoverability: Classification labels are accessible as an Advanced Search parameter in Drive, allowing end users to search for files based on label values.
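The pattern described above — labels recorded as file metadata, then matched by rule conditions to restrict operations — can be illustrated with a minimal sketch. The label names, rule format, and data structures here are hypothetical stand-ins, not Workspace APIs:

```python
from dataclasses import dataclass, field

@dataclass
class DriveFile:
    """Hypothetical file record: classification labels live in its metadata."""
    name: str
    labels: dict = field(default_factory=dict)  # label name -> applied value

# A DLP-style rule: when a (label, value) condition matches, restrict operations.
DLP_RULES = [
    {"label": "Sensitivity", "value": "Confidential",
     "restrict": {"external_sharing", "download", "copy", "print"}},
]

def restricted_operations(file: DriveFile) -> set:
    """Return the operations blocked for this file by label-based rule conditions."""
    blocked = set()
    for rule in DLP_RULES:
        if file.labels.get(rule["label"]) == rule["value"]:
            blocked |= rule["restrict"]
    return blocked

report = DriveFile("q3-forecast.xlsx", labels={"Sensitivity": "Confidential"})
print(restricted_operations(report))  # the operations a matching rule restricts
```

The same label metadata could equally serve the other consumers listed above — an audit log filter or a retention rule would read `file.labels` in just the same way.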

The AI Classification Process

The AI classification approach for Google Drive involves a structured three-phase process to ensure the model is trained effectively and automatically applies labels with a high degree of precision and scale:

Phase 1: Prepare for Training

  1. Label Creation: A classification label is created in the Label Manager (the label the AI model will apply), along with a separate training label that mirrors it. The training label is used exclusively for training purposes.
  2. Designated Labelers: Users with a strong understanding of data loss prevention policies are identified as “designated labelers” to evaluate sensitive files and participate in the setup and ongoing model development.

Phase 2: Train the Model

  1. Marking Files: Designated labelers classify Drive files using the training label (“Files marked for training”).
  2. Model Development: Using only the files labeled for training, AI classification builds a model that learns to recognize patterns and characteristics specific to the organization’s data.
    • To ensure adequate representation for the machine-learning model, a minimum of one hundred examples per class (file label) is required.
  3. Model Evaluation: Following training, 25% of the training data is automatically withheld and used as a “hold-out” set to test the model’s performance and establish a baseline for how well it generalizes its knowledge to unseen data. The model’s score reflects its recall.
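The training mechanics above can be sketched in a few lines. The 100-example minimum, the 25% hold-out fraction, and the use of recall follow the paper; the function names and data shapes are illustrative assumptions:

```python
import random

MIN_EXAMPLES_PER_CLASS = 100   # per-class training minimum stated in the paper
HOLDOUT_FRACTION = 0.25        # 25% of labeled files withheld for evaluation

def check_training_data(examples):
    """examples: list of (file_id, class_label). Report classes under the minimum."""
    counts = {}
    for _, label in examples:
        counts[label] = counts.get(label, 0) + 1
    return {label: n for label, n in counts.items() if n < MIN_EXAMPLES_PER_CLASS}

def split_holdout(examples, seed=0):
    """Withhold 25% of labeled files to test generalization on unseen data."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - HOLDOUT_FRACTION))
    return shuffled[:cut], shuffled[cut:]   # (training set, hold-out set)

def recall(true_labels, predicted_labels, positive_class):
    """Fraction of files truly in positive_class that the model caught."""
    relevant = sum(1 for t in true_labels if t == positive_class)
    caught = sum(1 for t, p in zip(true_labels, predicted_labels)
                 if t == positive_class and p == positive_class)
    return caught / max(relevant, 1)
```

A recall-based score answers the question administrators care most about here: of the sensitive files that exist, what fraction would the model actually have labeled?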

Phase 3: Turn on Automatic Classification

  1. Auto-apply: If the model attains an acceptable score (e.g., High > 80%), the administrator can enable “auto-apply”.
  2. Granular Scoping: Auto-apply can be scoped to specific audiences within the organization, providing granular control over whose files are evaluated for labeling.
  3. Continuous Evaluation: Once enabled, the model attempts to process all files owned by licensed users and continuously evaluates newly created and edited files against the model’s criteria.
  4. User Review and Feedback: When a label is automatically applied, end users are prompted to review, accept, or modify the selection. If designated labelers identify inaccurate applications, they can apply the correct training label, and the correction is incorporated into the model’s training data in the next iteration.
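Taken together, the auto-apply gates described above reduce to a small decision rule. The 80% score threshold comes from the paper; the function name and parameters are illustrative:

```python
AUTO_APPLY_THRESHOLD = 0.80  # the "High > 80%" score gate from the paper

def should_auto_apply(model_score, owner_in_scoped_audience, user_has_set_label):
    """Decide whether the model may label a file automatically.

    Mirrors the rules described above: the model must exceed the score
    threshold, the file owner must be in the scoped audience, and a label
    already accepted or modified by a user is never overridden.
    """
    if model_score <= AUTO_APPLY_THRESHOLD:
        return False
    if not owner_in_scoped_audience:
        return False
    if user_has_set_label:   # defer to human verdicts
        return False
    return True
```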

Commitment to Precision and Privacy

To maintain precision and align with data security practices, the approach includes specific rules for label modification and strong privacy assurances.

  • User Authority: AI classification generally defers to the judgment of file owners and editors who have applied labels and will not modify those values. If an end user accepts or modifies an AI-applied label, the model will not change that verdict in the future, even if the file is modified.
  • DLP Override: In cases where specific sensitive data is present, Data Loss Prevention (DLP) rules can be configured to override user-applied label values.
  • Bespoke and Private Models: AI classification aligns with Google’s AI Principles and compliance standards. Google Workspace does not use customer data, prompts, or generated responses to train or improve underlying cross-customer AI models. Each AI classification model is unique to the customer and only ever used within their domain.
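The first two bullets imply a simple precedence order when label values conflict. A sketch of that ordering, with hypothetical names (the real resolution logic is internal to Workspace):

```python
def resolve_label(dlp_value, user_value, model_value):
    """Precedence implied above: DLP rules can override user-applied values,
    and user verdicts in turn take priority over the model's suggestion."""
    if dlp_value is not None:
        return dlp_value
    if user_value is not None:
        return user_value
    return model_value
```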

Analogy: Think of Google’s AI classification approach as a custom-built digital librarian service for your organization. Instead of having staff manually read and tag every document (which is slow and hard to scale), you train a specialist librarian (the custom AI model) using examples of your unique document types (the training files). Once trained, this specialist automatically tags millions of files (applies labels) accurately, allowing the organization’s security system (DLP, Auditing) to instantly know which shelves (granularity) the file belongs on and exactly how to protect it.

Download the white paper here