Using Azure Information Protection Scanner to Classify & Protect Your Data

Company data breaches are becoming more common every day.  Social engineering is an age-old practice that malicious hackers use to exploit the weakest link in an organization: human psychology.  Social engineering attacks are among an organization's biggest fears, and for good reason.  Using Azure Information Protection (AIP), organizations can apply a set of classification and protection standards to all of their data, no matter where it lives.  AIP helps organizations implement the necessary security and classification policies, which stay tied to the data.  No matter where the data lives or who gets their hands on it, the classification and protection travel with it, which helps companies keep their data safe and compliant.

Azure AIP How.jpg

Azure Information Protection labeling

One of the biggest challenges in becoming data compliant is implementing a data classification and protection process that accounts for the sensitivity, storage and distribution needs of each individual piece of data.  With the amount of data growing exponentially, this task can feel like a huge uphill battle.  Azure Information Protection lets you discover, classify, label and protect your data.  AIP is a cloud-based solution that can be used to classify, label and protect documents and emails.  AIP protects all file types, whether they are at rest, in use, or in motion.  AIP integrates tightly with Office files and PDFs, where it provides the best end-user experience.  AIP can also protect emails and data stored inside and outside of the Microsoft Cloud, including non-Microsoft cloud and SaaS apps.  Using labels, AIP can be configured for two types of policies: protection and retention.  Labels are used to classify and protect your data across workloads no matter where the files are stored.  Additionally, retention policies can be configured for each label that is created to meet your organization's data compliance requirements.

Azure Information Protection can help companies in their journey to GDPR compliance by discovering sensitive data within the organization.  AIP can automatically scan, classify and protect sensitive files that are discovered during the scanning process.  New files that are currently in use can be labeled manually, but what about the large number of files that are sitting in SharePoint or on an on-premises file share?  This is where the AIP scanner tool comes to the rescue.  The AIP scanner is a tool that can be used to discover, label and protect many files at once, automatically.

Let’s say, for example, you have a large file share in your on-premises environment.  This file share includes a plethora of different files and file types.  You can configure the AIP scanner to scan this file share and label your data according to content that matches a pre-defined condition.  The AIP scanner will label and protect your files in an automated fashion.  Labels apply classification and, optionally, apply or remove protection.  These labels can be used by the AIP scanner tool to automatically classify sensitive data.  For example, if a file contains credit card numbers or employee social security numbers, the scanner will recognize the sensitive data and apply protection, and optionally a retention policy, according to the AIP label that gets applied.  New sensitive information types are being added all the time.  Just this week Microsoft added some additional types to help address classification needs around GDPR, as you can read here:  New GDPR sensitive information types help you manage and protect personal data
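To make the idea of a content-matching condition more concrete, here is a minimal Python sketch.  It is not how Microsoft implements its sensitive information types.  The regular expressions, the function names and the "Highly Confidential" label name are all assumptions made up for this illustration.  It simply looks for card-like numbers that pass the Luhn checksum, or text shaped like a US social security number, and suggests a label when either is found.

```python
import re

# Hypothetical patterns for illustration only; these are NOT Microsoft's
# sensitive information type definitions.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk the digits right to left, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def suggest_label(text: str):
    """Suggest a (hypothetical) label name for the text, or None."""
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        return "Highly Confidential"
    if SSN_RE.search(text):
        return "Highly Confidential"
    return None
```

For instance, `suggest_label("Lunch menu for Friday")` returns `None`, while text containing a Luhn-valid card number returns the assumed label name.  Real detection also weighs supporting keywords and confidence levels, which this sketch leaves out.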

AIP labels are a classification capability provided by the AIP service which can be used to identify and classify the different types of data that exist within your organization.  These labels are what the AIP scanner applies to the files it discovers.  Labels can be categorized by sensitivity levels that range from non-business to highly confidential.  Labels define what type of protection and retention gets applied to your files.

The labels can include visual markings such as a header, footer, or watermark. Metadata is added to files and email headers in clear text. The clear text ensures that other services, such as data loss prevention solutions, can identify the classification and take appropriate action.  This is great for companies that are moving to the cloud and want to make sure their data is classified and protected before the migration from on-premises.

AIP Scanner Overview

The AIP scanner runs as a service on a Windows Server that is a part of your on-premises network.  The scanner currently supports the following data stores for discovery:

  • Local folders on the Windows Server computer that runs the AIP scanner

  • UNC paths for network shares that use the Server Message Block (SMB) protocol

  • Sites and libraries for SharePoint Server 2016 and SharePoint Server 2013

  • Cloud repositories that use Cloud App Security

The AIP scanner is configured locally on the Windows member server and maintains a secure connection to your Office 365 tenant.  The tool continually monitors the automatic labeling conditions that are set up in the Azure Information Protection policies within the Azure portal.  The AIP scanner can inspect any file that Windows can index, using iFilters to open the different file types.

The scanner uses the Office 365 built-in data loss prevention sensitive information types and pattern detection.  This gives AIP the ability to recognize the data inside a file, then label and protect it automatically.  Additionally, the AIP scanner can be run in discovery mode.  In this mode, reports are created that show the labeling changes that would be made to your data, without actually applying the labels.  This mode is especially useful when you want to see the potential impact of applying different labels across your data.  The scanner will systematically crawl the data stores that have been configured.  The scanner can be configured to run on a schedule, so that as new files are added to the data stores they are labeled and protected automatically.  For the first scan cycle of a data store, the scanner will perform a full crawl of every file.  Subsequent scan cycles will only include new and modified files.
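The scan cycle behavior described above (a full crawl first, then only new and modified files, with a report-only discovery mode) can be sketched in a few lines of Python.  This is an illustration of the general technique, not the scanner's actual implementation.  The function name `scan_cycle`, the `seen` dictionary and the use of file modification times are assumptions made for this sketch.

```python
import os


def scan_cycle(root, seen, apply_label=None):
    """Run one hypothetical scan cycle over the folder `root`.

    `seen` maps file path -> modification time recorded when the file was
    last labeled.  If `apply_label` is None, the cycle runs in a
    discovery-style mode: it only reports which files would be processed
    and records nothing.
    """
    changed = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            mtime = os.stat(path).st_mtime_ns
            if seen.get(path) == mtime:
                continue  # unchanged since it was last labeled
            changed.append(path)
            if apply_label is not None:
                apply_label(path)   # enforce: label the file
                seen[path] = mtime  # remember it for the next cycle
    return sorted(changed)
```

On the first call with an empty `seen` dictionary every file is returned, which corresponds to the full crawl.  Later calls in enforce mode return only files whose modification time differs from the recorded one.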

The following file types can be automatically labeled according to the pre-defined conditions:

  • Word: docx, docm, dotm, dotx

  • Excel: xls, xlt, xlsx, xltx, xltm, xlsm, xlsb

  • PowerPoint: ppt, pps, pot, pptx, ppsx, pptm, ppsm, potx, potm

  • Project: mpp, mpt

  • PDF: pdf
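If you want to estimate how many files in a share are candidates for automatic labeling before running the scanner, a small helper like the following Python sketch can classify file names against the list above.  The function and variable names are hypothetical, and the Project entry uses .mpp, the Microsoft Project document extension.

```python
import os

# Extensions taken from the list above, grouped by application.
AUTO_LABEL_EXTENSIONS = {
    "Word": {"docx", "docm", "dotm", "dotx"},
    "Excel": {"xls", "xlt", "xlsx", "xltx", "xltm", "xlsm", "xlsb"},
    "PowerPoint": {"ppt", "pps", "pot", "pptx", "ppsx",
                   "pptm", "ppsm", "potx", "potm"},
    "Project": {"mpp", "mpt"},  # .mpp is the Microsoft Project extension
    "PDF": {"pdf"},
}


def application_for(filename):
    """Return the application group for a file name, or None if unlisted."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    for app, extensions in AUTO_LABEL_EXTENSIONS.items():
        if ext in extensions:
            return app
    return None
```

The lookup is case-insensitive, so `application_for("Budget.XLSX")` falls in the Excel group, while a file such as `notes.txt` returns `None`.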

There are many use cases where labeling all your files automatically is not the best approach.  Applying labels haphazardly will only confuse end users, who will undoubtedly complain and lose faith in AIP.  It is a best practice to start with the AIP “recommended classification” option when configuring your AIP labels.  This classification option allows the user to accept or reject the classification and protection that AIP recommends.  Labels that are configured to be applied when certain conditions are met will trigger the AIP client to recommend a label to the user as shown below:

AIP Office Toolbar.png

If the user decides to dismiss this recommendation, they will be prompted to justify the change as shown here:

AIP Office Classification.png

Recommended classification applies to Word, Excel, and PowerPoint.  When the file is saved, the user will be prompted as shown in the above screenshots.  Unfortunately, the recommended classification feature does not currently work with Outlook.

AIP Scanner Licensing

The AIP scanner is an Azure Information Protection P2/EMS E5 feature.  The AIP P2/EMS E5 license is required to enable automatic labeling using pre-defined custom labels.  This license enables the use of custom labels that can be applied automatically by the AIP scanner.  This includes creating pre-defined conditions for sensitive data that will trigger AIP to apply a label and, optionally, protection.  That said, organizations with AIP P1/EMS E3 licenses can currently still use the AIP scanner tool, although it will only allow the use of one default policy.  The good news for organizations with an E3/AIP P1 license is that they can set the default label for each specific data store (think folder in a file share or document library in SharePoint) to automatically classify their files.  Going back to the file share example above, let’s say there was an HR folder and a Legal folder in the file share.  You can configure the AIP scanner to use a different default label for each folder (one for HR and another for Legal).  Yes, this process is going to be more manual than if you had a P2 license, but it’s not a bad workaround if you ask me!
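The per-data-store default label idea can be pictured as a simple lookup from repository path to label.  The Python sketch below is only a model of that mapping, not scanner configuration.  The server name, share paths and label names are all hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical repositories and default labels, one label per data store.
DEFAULT_LABELS = {
    PureWindowsPath(r"\\fileserver\share\HR"): "HR Confidential",
    PureWindowsPath(r"\\fileserver\share\Legal"): "Legal Confidential",
}


def default_label_for(file_path):
    """Return the default label of the repository containing the file."""
    path = PureWindowsPath(file_path)
    for repository, label in DEFAULT_LABELS.items():
        # A file belongs to a repository if the repository is one of
        # its parent folders.
        if repository in path.parents:
            return label
    return None  # file is outside every configured repository
```

With this mapping, a file under the HR folder gets the HR label and a file under the Legal folder gets the Legal label, regardless of its content.  That is the trade-off of the P1 workaround: the folder structure, rather than content inspection, decides the classification.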

Hosting the AIP scanner configuration requires the use of a SQL Server instance.  Here are some of the key points when planning the SQL Server:

  • The AIP scanner installation requires a SQL Server instance to store the scanner configuration

  • SQL Server 2012 is the minimum supported version

  • The AIP scanner supports the use of a SQL Server Express license

There are two levels of AIP Premium licensing: P1 and P2.  The biggest difference between them is that P2 includes the automated and recommended data classification capabilities.  Here is a link to the official breakdown of each of the different AIP pricing plans:  here.  The AIP scanner can be downloaded and installed as a part of the Microsoft AIP client download found here.  Make sure that you download the full client in order to be able to install the AIP scanner.  Once the AIP client is downloaded and installed, the AIP scanner can be configured using PowerShell.


Azure Information Protection provides a cloud-based solution for classifying, labeling and protecting your data.  Organizations can leverage this solution to apply a consistent classification and protection policy to files throughout the lifecycle of their data.  The AIP scanner adds value by allowing organizations with large amounts of data to automate the labeling and protection of that data.  For organizations looking to not only classify their data but also protect it no matter where the content is stored or how it is moved, I would highly recommend looking into the Azure Information Protection product.

