Historical Search Jobs Retrieve Message Trace Data More than 10 Days Old

A seemingly simple question posted in a Facebook group asked how to create an email traffic report detailing inbound messages received by a Microsoft 365 tenant over the last 90 days. The report should include information like a timestamp, the email address of the sender, and whether the message had an attachment. As it turns out, no simple solution exists for this request. Let’s explore why.

Exchange Online only keeps message trace data online for ten days. It’s possible to retrieve message trace data for up to 90 days, but only by running a historical search through the Exchange admin center or PowerShell.

A historical search means that Exchange Online runs a background job to retrieve the data from its message trace repository. This process can take anything from ten minutes to several hours, depending on the current load on the service. A historical search can cover message data for up to 100 email addresses and return a maximum of 100,000 records. Usually, people search for messages sent from or received by mailboxes (user or shared). This article explains how to run a historical search for email sent from shared mailboxes.

Creating a report for an entire organization for the last 90 days likely means that processing must be divided over several jobs to ensure that each job covers no more than 100 addresses and returns no more than 100,000 records. An organization can run up to 250 historical searches daily.

Creating Historical Search Jobs

Any solution depends on running enough historical search jobs to retrieve message trace data for all SMTP addresses within scope of the search. If you want to create a report over the last 90 days for all inbound email, you need to find all the recipient addresses that external people might use and divide the addresses into batches of 100 that are then submitted for processing.

For instance, to check all mailboxes, the first step is to find the mailboxes. If more than 100 exist, you then separate them into sets of 100 or less. For each set, extract their primary SMTP addresses and store them in an array. As an example, these commands find user and shared mailboxes and store the primary SMTP address for each mailbox in an array:

[array]$Mailboxes = Get-ExoMailbox -RecipientTypeDetails UserMailbox, SharedMailbox -ResultSize Unlimited
[array]$RecipientAddresses = $Mailboxes.PrimarySMTPAddress

Exchange Online supports multiple proxy addresses for mail-enabled objects, which can receive email using any SMTP proxy address. If you want to check inbound messages for every possible address, you must extract the set of proxy addresses for each object and store all the addresses in the array. Something like this code extracts all the SMTP proxy addresses for the set of mailboxes (found using the code above) and stores them in an array:

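# Collect every proxy address (primary and secondary) held by the mailboxes
[array]$MailboxProxyAddresses = $Mailboxes.EmailAddresses
[array]$MailboxAddresses = $Null
ForEach ($Address in $MailboxProxyAddresses) {
   # Keep only SMTP proxies; -eq is case-insensitive, so it matches both SMTP: and smtp:
   If (($Address.Split(':')[0]) -eq 'smtp') {
      # Strip the five-character prefix to leave the bare email address
      $SMTPAddress = $Address.SubString(5)
      $MailboxAddresses += $SMTPAddress
   }
}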

Proxy addresses include the MOERA (Microsoft Online Email Routing Address) that each recipient gets for the tenant service domain and any plus addresses assigned by administrators to mailboxes. It’s reasonable to expect that the number of proxy addresses will be between two and three times the number of primary SMTP addresses. Searching for all SMTP proxy addresses rather than primary SMTP addresses increases the number of historical search jobs.
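To get a feel for how the choice of address set affects the number of jobs, a rough calculation along these lines might help. This is only a sketch. The function name Get-HistoricalSearchJobCount is made up for illustration, the 100-address batch size comes from the limits described earlier, and the sample numbers (1,000 mailboxes with 2.5 proxy addresses each) are invented. The result is a lower bound because any batch that would return more than 100,000 records needs to be split further.

```powershell
# Hypothetical helper (not an Exchange Online cmdlet): estimates the minimum
# number of historical search jobs needed to cover a set of addresses.
function Get-HistoricalSearchJobCount {
    param(
        [Parameter(Mandatory)][int]$AddressCount,
        [int]$BatchSize = 100   # Maximum number of addresses per historical search
    )
    # Round up so that a partial batch still counts as a job
    [int][Math]::Ceiling([double]$AddressCount / $BatchSize)
}

# Invented sample numbers, for illustration only
$PrimaryOnly = Get-HistoricalSearchJobCount -AddressCount 1000
$AllProxies  = Get-HistoricalSearchJobCount -AddressCount 2500
"Primary addresses only: {0} jobs; all proxy addresses: {1} jobs" -f $PrimaryOnly, $AllProxies
```

Against real data, the count of the $MailboxAddresses array built above could be passed as the -AddressCount value.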

Now that you know which addresses to search for, you can submit the historical search jobs to retrieve data for those addresses. This code submits a historical search job to find inbound email received over the last 90 days by the addresses in the $RecipientAddresses variable (an array).

[int]$i = 1
$StartDate = (Get-Date).AddDays(-90)
$ReportName = ("Historical Search from {0} Number {1} Submitted {2}" -f $StartDate, $i, (Get-Date -format g))

$Status = Start-HistoricalSearch -RecipientAddress $RecipientAddresses -StartDate $StartDate -EndDate (Get-Date) -ReportType MessageTrace -ReportTitle $ReportName -Direction Received -NotifyAddress Admin@office365itpros.com

You can track the progress of the job with the Get-HistoricalSearch cmdlet:

Get-HistoricalSearch -JobId $Status.JobId | Format-Table JobId, Status, ReportTitle

JobId                                Status     ReportTitle
-----                                ------     -----------
3b9847c0-b1c2-4603-b344-3095b2d6c044 NotStarted Historical Search from 28/07/2023 21:44:51 Number 1 Submitted 26/10/20…

If you add a notification address when submitting a job, Exchange Online sends email to that address when the job finishes. Obviously, you must break up the set of searchable addresses into batches of 100 or less and submit a historical search job for each batch.
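The batching step might be sketched as follows. This is an illustration under stated assumptions rather than a finished script. Split-AddressBatch and Submit-HistoricalSearchBatch are hypothetical helper names, the contoso.com addresses are invented, and the wrapper reuses the Start-HistoricalSearch parameters shown above. The block only demonstrates the batching; no jobs are submitted until the wrapper is called from a connected Exchange Online session.

```powershell
# Hypothetical helper: splits a list of addresses into batches of at most $BatchSize
function Split-AddressBatch {
    param(
        [Parameter(Mandatory)][string[]]$Addresses,
        [int]$BatchSize = 100
    )
    for ($Start = 0; $Start -lt $Addresses.Count; $Start += $BatchSize) {
        $End = [Math]::Min($Start + $BatchSize, $Addresses.Count) - 1
        $Batch = [string[]]@($Addresses[$Start..$End])
        # The leading comma emits the batch as one array instead of unrolling it
        , $Batch
    }
}

# Hypothetical wrapper: submits one historical search job per batch.
# Needs a connected Exchange Online PowerShell session.
function Submit-HistoricalSearchBatch {
    param(
        [Parameter(Mandatory)][string[]]$Addresses,
        [Parameter(Mandatory)][string]$NotifyAddress,
        [int]$DaysBack = 90
    )
    $Batches = @(Split-AddressBatch -Addresses $Addresses)
    If ($Batches.Count -gt 250) {
        Write-Warning ("{0} jobs are needed, which exceeds the daily limit of 250" -f $Batches.Count)
    }
    $StartDate = (Get-Date).AddDays(-$DaysBack)
    $EndDate = Get-Date
    [int]$i = 0
    ForEach ($Batch in $Batches) {
        $i++
        $Parameters = @{
            RecipientAddress = $Batch
            StartDate        = $StartDate
            EndDate          = $EndDate
            ReportType       = 'MessageTrace'
            ReportTitle      = ("Historical Search from {0} Number {1} Submitted {2}" -f $StartDate, $i, (Get-Date -format g))
            Direction        = 'Received'
            NotifyAddress    = $NotifyAddress
        }
        Start-HistoricalSearch @Parameters
    }
}

# Demonstration of the batching step with invented addresses (no jobs are submitted)
$SampleAddresses = 1..250 | ForEach-Object { "user{0}@contoso.com" -f $_ }
$SampleBatches = @(Split-AddressBatch -Addresses $SampleAddresses)
ForEach ($Batch in $SampleBatches) {
    "Batch of {0} addresses starting with {1}" -f $Batch.Count, $Batch[0]
}
```

With a live session, a call such as Submit-HistoricalSearchBatch -Addresses $RecipientAddresses -NotifyAddress Admin@office365itpros.com would submit one job per batch. Each job returns an object with a JobId that can be passed to Get-HistoricalSearch to check progress.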

Downloading the Data for the Email Traffic Report

Eventually, all the historical search jobs will finish and the message trace data extracted by the jobs will be ready. Before you can use the data, you must download it from the Message Trace section of the Exchange admin center. Under the Downloadable reports tab, you’ll find a listing of the historical search jobs and can check details of each (Figure 1). When the job status is Complete, an option appears to download the report. It can take some time to connect to Azure to fetch the data, which downloads as a CSV file in Unicode format.

Figure 1: Details of a historical search job in the Exchange admin center

The notification message sent upon the completion of a job also includes a link to download the data file (Figure 2).

Figure 2: Notification email received after a historical search job finishes

Creating an Email Traffic Report from the Historical Message Trace Data

If you’ve had to split processing over multiple jobs, you must download the file for each job. To make it more convenient to process the files, I moved them to a specific folder. The task is then to write a PowerShell script to loop through the files, extract the message trace data from each file, and combine the data into a single set for analysis.

The script I wrote to process the message trace files is available from GitHub. After processing is complete, a PowerShell list object (called $Report) containing the data extracted from the historical trace files is available for analysis. The original request was to create a report listing the timestamp, sender, and whether messages have attachments. Message trace data doesn’t indicate whether an email has attachments. It might be possible to assume that any message larger than 100,000 bytes has an attachment, but given the size of the embedded graphics that email can contain, that’s a big assumption.
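For readers who want to see the general shape of such processing, here is a minimal sketch. It is not the script from GitHub. The function name Import-HistoricalSearchFile is hypothetical, and the column names (origin_timestamp_utc, sender_address, recipient_status, message_subject, total_bytes) are assumptions about the layout of the downloaded CSV files that should be checked against a real file. The sketch reads every CSV file in a folder as Unicode and emits one object per record.

```powershell
# Hypothetical helper: combines the downloaded historical search CSV files
# found in a folder into one set of report objects.
function Import-HistoricalSearchFile {
    param(
        [Parameter(Mandatory)][string]$FolderPath
    )
    ForEach ($File in (Get-ChildItem -Path $FolderPath -Filter *.csv)) {
        # The downloaded files are in Unicode format
        [array]$Records = Import-Csv -Path $File.FullName -Encoding Unicode
        ForEach ($Record in $Records) {
            # Column names below are assumptions; adjust them to match the real files
            $SenderAddress = $Record.sender_address
            [PSCustomObject]@{
                Timestamp     = $Record.origin_timestamp_utc
                Sender        = $SenderAddress
                Sender_Domain = ($SenderAddress -split '@')[-1]
                Recipients    = $Record.recipient_status
                Subject       = $Record.message_subject
                Bytes         = [int64]$Record.total_bytes
                SourceFile    = $File.Name
            }
        }
    }
}

# Example call (the folder path is invented):
# [array]$Report = Import-HistoricalSearchFile -FolderPath 'C:\Temp\HistoricalSearches'
```

The Sender and Sender_Domain property names match those used in the grouping commands later in this article. The SourceFile property is an extra that makes it easier to trace a record back to the job that found it.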

Apart from attachments, the script can generate a report containing the requested information. Figure 3 shows the information piped through the Out-GridView cmdlet as an example of the script output.

Figure 3: Email Traffic report generated from historical message data

The point is that once the script generates the data, it can be sliced and diced in whatever way you want using whatever tool you think is best. Some would import the data into Power BI to use its visualization capabilities. Others will be happy with simple PowerShell commands to create different statistics. For example, these commands group the sender email addresses and sender domains from the file to report the most common senders and sender domains:

$Report | Group-Object Sender -NoElement | Sort-Object Count -Descending | Select-Object -First 10 | Format-Table Name, count

$Report | Group-Object Sender_Domain -NoElement | Sort-Object Count -Descending | Select-Object -First 10 | Format-Table Name, count

Name                      Count
----                      -----
gmail.com                   730
microsoft.com               620
yandex.com                  508
practical365.com            272
linkedin.com                234
yahoo.com                   224
yammer.com                  205
lists.irishtimes.com        182
quest.com                   174
email.teams.microsoft.com   147

Leveraging the Power of PowerShell

What’s been proven in this journey is that despite Microsoft restrictions, it’s possible to retrieve and analyze large amounts of historical message trace data. All it takes is planning. After that, PowerShell will submit the historical message trace jobs and process the information found by those jobs. What you do with the results is up to you.

To emphasize how useful PowerShell is when dealing with message trace data, here are some other articles to read.


About the Author

Tony Redmond

Tony Redmond has written thousands of articles about Microsoft technology since 1996. He is the lead author for the Office 365 for IT Pros eBook, the only book covering Office 365 that is updated monthly to keep pace with change in the cloud. Apart from contributing to Practical365.com, Tony also writes at Office365itpros.com to support the development of the eBook. He has been a Microsoft MVP since 2004.
