Sign in

Transforming a pre-formatted excel report into a tabular format without macros — Reporting system part 2

This article follows on from my previous article https://alexanderamlani24.medium.com/extracting-and-writing-email-attachments-to-xlsx-using-the-ms-graph-api-reporting-system-part-1-82819256b630

In the past when dealing with pre-formatted reports I’ve either had to ask for direct access to the reporting database and when thats not been possible i’ve had to rely on vba/macros or manual intervention to cleanse the report into a format that i could then work with. So I thought this would be a good challenge for my new python skills.

This is the structure of the data that I extracted in the last blog post. I’ve only put the structure of the data and no column names to avoid identification of the supplier.

The first thing I noticed is that there was some hidden files in the folder, now this might have been because I had a file open and a temporary file was created, however I thought I’d write a function to make sure we don’t get anything we dont want in.

The next step is to create a function that will clean each individual file and then concatenate them into one dataframe

The above code cleans the data up so that each table of data for each category in the single report appends into one table and empty cells are filled along with data types are assigned.

We now however need to somehow associate each row in the data frame with the associated category which was above each table in the original report

There is one column which is a sub category in the table which we will use to create a lookup table and then join it on to the main table in the next step

The above code assigns a column that dertmines whether the column string contains a numeric value or not. The categories above the tables all contained numbers and I used this to help me create the category column. As the category column itself fell directly on top of the subcategory column.

The end result is two files one with all reports cleaned into one large data frame and a look up table to match sub-category to major category

We had to derive the sub-category and category this way, as we were not able to manually create a lookup table due to the fact that we do not control any new sub-categories/categories being added

The below code then takes these staging tables and creates a star schema with some additional meta data for the image

The following article will go over the generation of the reports

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store