Power Query has more than 80 data sources supported, and Microsoft Exchange is one of those data sources. Although it is not a new data source, however, the power of this data source combined with the data transformation power of Power Query is not clear for everyone. Using this data source, you can get all emails in your inbox (or other folders), and then filter is based on sender or subject, and then expand attachments of emails into a table and visualize it at the end in Power BI. You can do all of that using Power Query in Power BI, you won’t need to do any manual step of saving attachments into a shared folder, and then get data from that folder. In this article, I’m going to show you how it is possible.
Get Data from Email
Mailbox is a data source that can be very beneficial. There are a lot of important data coming through emails. sometimes with attachments, sometimes without. A mailbox also can have other parts such as calendar, tasks, and people which can be very helpful for a data insight solution. In Power BI, or let’s say using Power Query, you can start getting data from your mailbox, using the Get Data option from Microsoft Exchange;
Then you need to enter your email address;
and after that, you will be asked to enter credentials. If you have Office 365 mailbox, then you can use Microsoft tab to login. After successful login, you will see a list of all entities that you can get data from that;
As you can see in the above screenshot, I can get Calendar items, Mails, Meeting requests, People and Tasks from a mailbox. The screenshot above shows an example of a Calendar entity. However, in this example, I’m focusing on Mails. So I choose that to get data from and then click Edit to filter my data before loading it into Power BI;
This will bring up the Power Query Editor with a preview of emails I have in my inbox;
This would be a table including all email properties as well as a couple of fields for HasAttachments and Attachments itself!
In this example, I’m focusing only on emails with specific attachments. You have to find the rules to filter the table based on it. For example, if you are going to filter based on the sender, then you can expand the Sender field (which is a record itself), and then filter based on the email of the sender;
Or if you want to filter based on the subject, then you can do the filtering based on that;
Filtering the Folder
You might also want to filter based on the folder in your email account (in case you have emails coming or going from multiple folders), These are folders in my inbox for example; which from that list I select Inbox only.
Filtering the Subject
In this example, I’m expecting emails to come with the subject line of “sample files for email PQ test”, so I filter it this way;
And I put the subject line I’m expecting there;
As a result, I only have emails with that subject line in that specific folder;
If you scroll further right in the columns list in the table above, you will see two columns for attachments; HasAttachment, and Attachments. HasAttachments will have values of TRUE or FALSE saying that this email has attachments or not (which can be also a good filter added to the previous step too, to make sure you only get emails that have an attachment). If you click on the Attachments itself, you will see a table which has all the details about attachments;
Let me show you where this is coming from, this is one of the emails above in my mailbox; (you can see that there is an attachment for it as well, with exactly the same size, name, and extension)
We are only interested in the content of attachments, so let’s expand to this folder, however, first it is better to remove all other columns as we don’t need them anymore;
Then you can expand the Attachments using the expand icon;
You will have now a table of all attachments coming from all emails that we have filtered before;
There are columns such as Name of the attachment file, Extension of it, the size and some other properties. Most importantly, we have the AttachmentContent column, which includes the file itself in the binary format. And that is the column, which we are going to focus on. So let’s remove all other columns.
Now that we have all files in a table column, we can combine them using the Combine Files option.
Power Query, will scan the files, and consider the first one as a template structure for others (remember for this method your files should have the same structure, otherwise you need to create multiple copies of this process into multiple tables using different sets of filters). My files are CSV files, Power Query understand the structure and comes up with a preview of the default structure;
This process will create a custom function behind the scene from one of the files and will loop through it for all files, as the result, I will have a table including the content of all files together;
The dataset in my example needs some cleanup, and it would look like this afterward;
Here you go! You have expanded all attachments directly into Power BI without needing any manual download and save in a shared folder. All through Power BI and Power Query! Before you start implementing it in a real-world scenario, here is an important tip to remember!
Make it Failproof
Building a solution is one thing, and making it failproof is another. There are tons of reasons why a solution might fail somewhere in the future. for example, consider that the next email comes like this:
The email above has some CSV files which are fine, However, it has some other attachments too. For example, the image in the signature! when I refresh the model that I built so far in Power Query, I get this:
In this case, I know the reason is because of the image attachments, and I can go to the Expanded Attachments step in Power Query to find that out;
I can filter the Extension filed to .csv only to make sure I won’t get other types of attachments;
This way, the solution would work fine with no errors.
Power Query can get data from Exchange, which helps you to automate the process of importing attachment’s data directly into Power BI or Excel. You can then use the transformative power of Power Query for filter based on all necessary fields, and expand attachments into a structured table. This method will be a fully automated solution, and the next time that you receive an email with attachments, it will bring them into the output too. However, make sure to build a failproof solution always. This method would also work with shared mailboxes, as long as there is an actual mailbox associated with it.
Resource Credit | RADACAD