Retrieve Sensitivity Label Information for SharePoint Online Documents

In July 2022, I wrote about using Graph APIs to create a report of files stored in a SharePoint Online site. It was a fun project, and I learned a lot about navigating SharePoint Online with Graph requests using concepts like drives and drive items. I use the script to generate reports frequently, so I gained something from the exercise.

One thing I couldn’t report was whether the reported files had sensitivity labels. I’ve created scripts in the past to retrieve files with sensitivity labels (this example offers the chance to decrypt files with sensitivity labels using the Unlock-SPOSensitivityLabelEncryptedFile cmdlet) where a request like the one shown below retrieves the necessary information:

$Uri = "https://graph.microsoft.com/v1.0/sites/$($SiteId)/lists/Documents/Drive/Items/$($DriveId)/children?`$select=sensitivitylabel,weburl,name"
[array]$Files = (Invoke-RestMethod -Uri $Uri -Headers $Headers -Method Get -ContentType "application/json")

I could have continued along that path to retrieve the sensitivity label information, but I decided to use the new extractSensitivityLabels API. The big difference with this API is that when you use it, the API updates “the metadata of a drive item with the latest details of the assigned label.” As we’ll see, this aspect of the API can have an unexpected side effect.

In addition, the API can handle the extraction of “one or more sensitivity labels assigned to a drive item.” I didn’t know that SharePoint Online documents can have more than one sensitivity label, but apparently, they can if a document passes from one tenant to another and accrues labels from each tenant. However, once a user applies a label with encryption, the document can store just that label.

Changes Made to Retrieve Sensitivity Label Data

To test the API, I added some code to the UnpackFilesRecursively function in the original script (available from GitHub). If you want to try the script out, download the script and insert the following code to replace the original UnpackFilesRecursively function:

Function UnpackFilesRecursively {
# Unpack set of items (files and folders)
param (
        [parameter(Mandatory = $true)]
        $Items, # Items to unpack

        [parameter(Mandatory = $true)]
        $SiteUri, # Base site URI

        [parameter(Mandatory = $true)]
        $FolderPath, # Folder path

        [parameter(Mandatory = $true)]
        $SiteFiles, # List that collects the report lines

        [parameter(Mandatory = $false)]
        [bool]$IsNextLink # True when processing a continuation page
    )

  # Sensitivity label document types
  [array]$ValidDocumentTypes = "docx", "pptx", "xlsx", "pdf"
  # Find sub-folders that we need to check for files
  [array]$Folders = $Items.Value | Where-Object {$_.Folder.ChildCount -gt 0 }
  # And any files in the folder
  [array]$Files = $Items.Value | Where-Object {$_.Folder.ChildCount -eq $Null}
  
  $before = $SiteFiles.count
  
  # Report the files
  ForEach ($D in $Files) {
    $LabelName = $Null
    $FileSize = FormatFileSize $D.Size
    # Check Sensitivity label
    # Use the last segment so that names with several periods (Report.v2.docx) resolve correctly
    $Type = $D.Name.Split(".")[-1]
    If ($Type -in $ValidDocumentTypes) { 
     # Write-Host "Processing filename:" $D.Name
      $Uri = ("https://graph.microsoft.com/beta/sites/{0}/drive/items/{1}/extractSensitivityLabels" -f $Site.Id, $D.id)
      Try {
         $LabelsInfo = Invoke-MgGraphRequest -Uri $Uri -Method POST }
      Catch {
         Write-Host ("Failure reading data from file {0}" -f $D.Name) 
         $LabelsInfo = $Null
      }
      # Resolve sensitivity label identifier if one is found to find label name
      If ($LabelsInfo.labels.sensitivityLabelId) { 
      #  Write-Host "Label Id" $LabelsInfo.labels.sensitivityLabelId 
        $LabelName = $LabelsHash[$LabelsInfo.labels.sensitivityLabelId ]
     } # End if Label data
    } # End if type
    $ReportLine  = [PSCustomObject] @{   
        FileName            = $D.Name
        Folder              = $FolderPath
        Author              = $D.createdby.user.displayname
        Created             = $D.createdDateTime
        Modified            = $D.lastModifiedDateTime
        Size                = $FileSize
        'Sensitivity Label' = $LabelName
        Uri                 = $D.WebUrl 
        Id                  = $D.Id}
     $SiteFiles.Add($ReportLine) 
  } # End ForEach Files

  # Fetch the next page of items, if any. The recursive call follows any
  # further nextLink values itself, so one call per page is enough; looping
  # here as well would process later pages twice.
  $NextLink = $Items."@odata.nextLink"
  If ($NextLink) {
    $MoreData = Invoke-MgGraphRequest -Uri $NextLink -Method Get
    UnpackFilesRecursively -Items $MoreData -SiteUri $SiteUri -FolderPath $FolderPath -SiteFiles $SiteFiles -IsNextLink $true
  } # End If NextLink
  
  $count = $SiteFiles.count - $before
  if (-Not $IsNextLink) {
    Write-Host "  $FolderPath ($count)"
  }
  
  # Report the files in each sub-folder
  ForEach ($Folder in $Folders) {
    $NewFolderPath = $FolderPath + "/" + $Folder.Name
    $Uri = $SiteUri + "/" + $Folder.parentReference.path + "/" + $Folder.Name + ":/children"
    $SubFolderData = Invoke-MgGraphRequest -Uri $Uri -Method Get
    UnpackFilesRecursively -Items $SubFolderData -SiteUri $SiteUri -FolderPath $NewFolderPath -SiteFiles $SiteFiles -IsNextLink $IsNextLink
  } # End Foreach Folders
}

You can see that the script declares the set of file extensions that it checks for sensitivity labels. I’ve added the extensions for Word, PowerPoint, Excel, and PDF to the array. Over time, as Microsoft Information Protection supports additional file types, the extensions for those files can be added.
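
One detail worth care when matching extensions is that file names can contain more than one period, so the extension is whatever follows the last one. The sketch below shows one way to extract it. Get-FileExtension is a hypothetical helper that is not part of the original script, and the sample file names are made up:

```powershell
# Hypothetical helper (not part of the original script): return a file's
# extension in lower case without the leading period.
Function Get-FileExtension {
    param (
        [parameter(Mandatory = $true)]
        [string]$FileName
    )
    # GetExtension returns the text from the last period onward,
    # or an empty string when the name has no extension
    [System.IO.Path]::GetExtension($FileName).TrimStart(".").ToLower()
}

[array]$ValidDocumentTypes = "docx", "pptx", "xlsx", "pdf"
# Made-up sample names to exercise the helper
$SampleNames = "Budget.xlsx", "Project.Plan.v2.DOCX", "Notes.txt", "README"
ForEach ($Name in $SampleNames) {
    $Type = Get-FileExtension -FileName $Name
    Write-Host ("{0} -> check for label: {1}" -f $Name, ($Type -in $ValidDocumentTypes))
}
```

Only the first two sample names should qualify for a label check. The other two skip the extra Graph request.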

Interpreting Sensitivity Label Data

The information returned for a file looks like this:

Name                           Value
----                           -----
tenantId                       a662313f-14fc-43a2-9a7a-d2e27f4f3478
assignmentMethod               standard
sensitivityLabelId             1b070e6f-4b3c-4534-95c4-08335a5ca610

Not everyone speaks fluent GUID, so to translate a sensitivity label identifier into a label name, the script creates a hash table of label identifiers and names that it can look up. This code must run at the start of the script:

Connect-IPPSSession
[Array]$LabelData = Get-Label | Select-Object ImmutableId, DisplayName
$Global:LabelsHash = @{}
ForEach ($L in $LabelData) {$LabelsHash.Add([string]$L.ImmutableId,[string]$L.DisplayName) }
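
Because the API can return more than one label for a file, and a label that came from another tenant will presumably be missing from the hash table, a lookup that handles several identifiers and unknown values might be useful. This is a minimal sketch under those assumptions. Resolve-LabelNames is a hypothetical helper, and the label name and second GUID in the sample data are made up:

```powershell
# Hypothetical helper: convert the label identifiers returned for a file into
# display names using a lookup table like $LabelsHash.
Function Resolve-LabelNames {
    param (
        [parameter(Mandatory = $true)]
        [hashtable]$Lookup,

        [parameter(Mandatory = $false)]
        [string[]]$LabelIds
    )
    [array]$Names = ForEach ($Id in $LabelIds) {
        If ($Lookup.ContainsKey($Id)) {
            $Lookup[$Id]
        } Else {
            # Identifier not found among the labels loaded into the table
            "Unknown label ($Id)"
        }
    }
    $Names -join ", "
}

# Sample data: the first GUID is the one shown earlier; the name is invented
$SampleHash = @{ "1b070e6f-4b3c-4534-95c4-08335a5ca610" = "Confidential" }
Resolve-LabelNames -Lookup $SampleHash -LabelIds "1b070e6f-4b3c-4534-95c4-08335a5ca610", "00000000-0000-0000-0000-000000000000"
```

In the function above, a call like this could replace the direct $LabelsHash lookup, passing $LabelsInfo.labels.sensitivityLabelId as the identifiers.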

Everything works very nicely (Figure 1) with the caveat that the script now makes an additional Graph request for every document with a supported file type. This will inevitably slow processing down. I’ve run the script against document libraries holding thousands of files, and the performance wasn’t unacceptable. It can take a second or so to fetch the label information for a very large document (for example, a 1,400-page 38 MB Word document), but smaller files don’t cause a large delay.

Figure 1: Reporting Sensitivity Label information for files in a SharePoint Online document library

Document Mismatches

As noted above, there was an unexpected side effect. Because the API updates documents with the latest label metadata, running the script against a large document library caused ten document mismatch notifications to arrive in quick succession. The explanation was simple: every sensitivity label has a priority order. If you put a document with a high-priority label into a site assigned a lower-priority (container management) label, a mismatch occurs. In my case, the priority for sensitivity labels assigned to documents changed since their original assignment, and the documents now had a higher-priority label than the site.
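
The mismatch rule can be expressed as a simple comparison. The sketch below assumes that a larger priority number means a higher priority, which is how I understand the Priority property returned by Get-Label to work. Test-LabelMismatch is a hypothetical helper, and the label names and priority values are invented for illustration:

```powershell
# Hypothetical sample data: label names and priority values are made up.
# In a tenant, Get-Label reports a Priority property for each label.
$SamplePriorities = @{
    "General"      = 1
    "Confidential" = 3
}

# Hypothetical helper: a mismatch exists when the document's label
# outranks the label assigned to the site (assumes larger number = higher priority)
Function Test-LabelMismatch {
    param (
        [parameter(Mandatory = $true)]
        [int]$DocumentPriority,

        [parameter(Mandatory = $true)]
        [int]$SitePriority
    )
    $DocumentPriority -gt $SitePriority
}

# A Confidential document stored in a site labeled General
Test-LabelMismatch -DocumentPriority $SamplePriorities["Confidential"] -SitePriority $SamplePriorities["General"]
```

If label priorities are reordered after documents are labeled, the same comparison can flip from false to true without anyone touching the documents, which matches what I saw.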

Data Governance Reports

One of the benefits of the Microsoft Syntex-SharePoint Advanced Management license is that administrators can access data governance reports through the SharePoint Online admin center (Figure 2).

Figure 2: Sensitivity label reports available for data governance

The data governance reports are nothing special and are certainly not a good reason to buy the advanced management license. You can do a much better job yourself with PowerShell, including output to HTML (with the PSWriteHTML module) or Excel (with the ImportExcel module). If you’re interested in the Microsoft Syntex-SharePoint Advanced Management license, focus on features like blocking downloads for Teams Meeting recordings. You’ll be happier.
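
As an illustration of the do-it-yourself approach, this sketch counts files per sensitivity label using only built-in cmdlets. The sample rows are invented and stand in for the list of report lines that the script builds. The property names match the report line shown earlier:

```powershell
# Made-up rows standing in for the report lines collected by the script
$SampleFiles = @(
    [PSCustomObject]@{ FileName = "Plan.docx";   'Sensitivity Label' = "Confidential" }
    [PSCustomObject]@{ FileName = "Budget.xlsx"; 'Sensitivity Label' = "Confidential" }
    [PSCustomObject]@{ FileName = "Notes.txt";   'Sensitivity Label' = $null }
)

# Count files per label; unlabeled files group under an empty name
$Summary = $SampleFiles |
    Group-Object -Property 'Sensitivity Label' |
    ForEach-Object {
        [PSCustomObject]@{
            Label = If ($_.Name) { $_.Name } Else { "(no label)" }
            Files = $_.Count
        }
    } |
    Sort-Object -Property Files -Descending

$Summary | Format-Table -AutoSize
```

The summary objects can then be piped to whatever output format you prefer, such as Export-Csv or the modules mentioned above.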

Next Stop, Assigning Sensitivity Labels with a Graph API

Along with an API to retrieve sensitivity label information for SharePoint Online documents, the assignSensitivityLabel Graph API is available to assign sensitivity labels to documents. The problem is that this API is metered and protected. Metered means that you pay to use it through an Azure subscription. Protected means that Microsoft must give its consent for an app to use the API. I’ve applied for permission and prepared Azure to accept charges. Once Microsoft gives the OK, I’ll report back on using the API to assign sensitivity labels.

About the Author

Tony Redmond

Tony Redmond has written thousands of articles about Microsoft technology since 1996. He is the lead author for the Office 365 for IT Pros eBook, the only book covering Office 365 that is updated monthly to keep pace with change in the cloud. Apart from contributing to Practical365.com, Tony also writes at Office365itpros.com to support the development of the eBook. He has been a Microsoft MVP since 2004.

Comments

  1. Dan Cecil

    Hi Tony, great article! I’ve noticed that there is a SharePoint property on the ListItem named ‘_IpLabelId’ that seems to contain the Sensitivity Label Id.

    Do you know anything about this, and how robust it may be?

    Am I also right in thinking there can be multiple sensitivity labels applied to a file, for example from different tenants? My understanding is the extractSensitivityLabel command produces an array of objects.

  2. Christopher Iguardia

    Hi Tony, I have a challenge here. Is there a way to create a report where I can see the Sensitivity label (internal, public, confidential, etc.) of the files saved in a folder?

    Can you help me and provide guidance on this?

    1. Tony Redmond

      Did you look at the script? That’s exactly what it does…
