Visualising Patterns in Time-Series Data: 2D Overlay Plots

How to create beautiful plots for time-series datasets with periodicity

Csaba Zagoni
Towards Data Science


Photo by Isaac Smith on Unsplash

What are 2D Overlay Plots?

Many different kinds of time-series data exist, and choosing the right way to visualise them is not always simple. Data with periodicity can be particularly challenging to visualise because the periodicity often gets lost in the sheer amount of data on a single plot.

This is not that surprising considering that most humans would find it rather difficult to recognise anything on a 1-dimensional photograph, with the pixels being shown in a single row. However, once the pixels are rearranged in the correct 2-dimensional format that problem magically goes away!

Similarly, applying simple transformations on data with periodicity to make them 2-dimensional can work wonders in visualising complex patterns inherent in the data by presenting them in a way that is conducive to human understanding.

Intentionally, I will not give you a definition of what 2D overlay plots are, nor will I show you one at this point. What I will do though is to walk you through an example below to develop that understanding step-by-step.

The data in the example might have diurnal, weekly and annual patterns. What I’d like you to do is to stop at every plot and ask yourself the following questions:

  • What is the single most important thing that this plot communicates?
  • How clearly are the diurnal, weekly and annual patterns presented by this plot?

An Example Dataset

We are given a dataset with 10,080 records between 1/10/2018 and April 2019. Based on the metadata we can deduce that these are half-hourly internal temperature readings of a building for the duration of a whole heating season. At 48 readings per day, that is equivalent to 210 full days of data.

An Initial Look at the Data

Our first intuition would most likely be the traditional way of plotting the values over time, which yields the following plot:
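As a rough sketch of this first step, the snippet below plots a half-hourly series against its full timestamp using Pandas. The real readings are not reproduced here, so the data is a synthetic stand-in with a simple daily cycle, and the names `temps` and `internal_temp` are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for the real readings (assumption): 10,080 half-hourly
# values with a simple daily cycle plus some noise.
index = pd.date_range("2018-10-01", periods=10_080, freq="30min")
hours = (index.hour + index.minute / 60).to_numpy()
rng = np.random.default_rng(0)
values = 18 + 2 * np.sin(2 * np.pi * (hours - 9) / 24) + rng.normal(0, 0.5, len(index))
temps = pd.Series(values, index=index, name="internal_temp")

# The traditional view: value on the y-axis, full timestamp on the x-axis.
ax = temps.plot(figsize=(12, 4))
ax.set_xlabel("Timestamp")
ax.set_ylabel("Internal temperature (°C)")
```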

First intuition: Internal temperature time-series plot (Image by author)

By looking at the plot we can see some periodicity: there are diurnal cycles with relatively low delta T and weekly cycles with larger delta T on some days. The annual pattern (e.g. the minima of the weekly lows) is not that clear.

Note that at this point this is a rather vague and not very useful description of the patterns inherent in the data. We think there are diurnal and weekly patterns but we don’t know what they are. However, this gives us enough to carry on and guides our second intuition: as the dataset’s values are internal temperature measurements of a building and diurnal / weekly patterns seem to be present, there may be a heating schedule in place for the building.

Hypothesis: if there is a heating schedule behind this dataset that drives the building's thermal behaviour, we should aim to present that heating schedule, as it is a very simple concept to understand.

Visualising Diurnal Patterns

We decide to focus on visualising the diurnal patterns next. One way to do this is to plot the data with the x-axis being the time of day instead of the full timestamp including the date (which is what we did above during our first intuition). Our plot will still be 2-dimensional (x: time of day; y: value) with the individual days overlaid on each other. To be able to do this, we will need two simple transformations to pre-process the data.

First, we need to pivot the data, so that the date part of the timestamp is preserved in the index (i.e. one record holds information for one day) and the time part of the timestamp is preserved in the header (i.e. column names).

Second, we need to transpose the data as by default the individual fields would be overlaid on the plot and we want individual records (i.e. days) to be overlaid.

(You may be wondering why pivot the “wrong way” first if we need to transpose the data anyway in the following step. The reason for this is to adhere to dataset structure best-practices: this way, if we added an extra day’s worth of measurements to our pivoted dataframe we would be adding a new record and not a new field.)
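The two transformations can be sketched as follows. This is a minimal illustration on synthetic data, not the original notebook: the column names `date`, `time` and `internal_temp` are assumptions, and the time of day is stored as an "HH:MM" string purely to keep the column labels simple.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real readings (assumption).
index = pd.date_range("2018-10-01", periods=10_080, freq="30min")
hours = (index.hour + index.minute / 60).to_numpy()
rng = np.random.default_rng(0)
temps = 18 + 2 * np.sin(2 * np.pi * (hours - 9) / 24) + rng.normal(0, 0.5, len(index))

# Split each timestamp into its date part and its time-of-day part.
df = pd.DataFrame({
    "date": index.date,
    "time": index.strftime("%H:%M"),
    "internal_temp": temps,
})

# Step 1: pivot, so that one record (row) holds one day and
# each field (column) holds one time of day.
daily = df.pivot(index="date", columns="time", values="internal_temp")

# Step 2: transpose, so that each day becomes a field and
# will be drawn as its own line when plotted.
overlay = daily.T

print(daily.shape)    # one row per day, one column per half-hour slot
print(overlay.shape)  # one row per half-hour slot, one column per day
```

Keeping `daily` around as the "tidy" version means that a new day of measurements is appended as a new row, while `overlay` is only a view prepared for plotting.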

Pandas with Matplotlib support is pretty good at producing simple plots of dataframes so we can rely on it to produce our first attempt 2D overlay plot.
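A first attempt along these lines might look like the sketch below. It rebuilds the pivoted and transposed frame from synthetic data (an assumption, as are the column names) and then relies on `DataFrame.plot`, which draws one line per column, i.e. one line per day. The legend is switched off because a legend with one entry per day would swamp the plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for the real readings (assumption).
index = pd.date_range("2018-10-01", periods=10_080, freq="30min")
hours = (index.hour + index.minute / 60).to_numpy()
rng = np.random.default_rng(0)
temps = 18 + 2 * np.sin(2 * np.pi * (hours - 9) / 24) + rng.normal(0, 0.5, len(index))

df = pd.DataFrame({
    "date": index.date,
    "time": index.strftime("%H:%M"),
    "internal_temp": temps,
})
overlay = df.pivot(index="date", columns="time", values="internal_temp").T

# One line per day, all sharing the same time-of-day x-axis.
ax = overlay.plot(legend=False, figsize=(12, 6))
ax.set_xlabel("Time of day")
ax.set_ylabel("Internal temperature (°C)")
```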

First attempt at 2D overlay plot showing diurnal patterns (Image by author)

Getting the Alpha Right

That doesn’t look too good as there are solid blocks on the plot where the individual lines cannot be followed. The key here is to pass a custom transparency value to the plot that allows individual lines to remain visible and keeps the plot from saturating where it becomes very busy. Matplotlib uses alpha transparency with 0.0 being fully transparent and 1.0 being fully opaque. In this case an alpha of 0.1 (10% opacity) works pretty well.
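In code, this amounts to one extra keyword argument. The sketch below again uses synthetic data and assumed column names; `alpha=0.1` is passed through Pandas to Matplotlib and applied to every line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for the real readings (assumption).
index = pd.date_range("2018-10-01", periods=10_080, freq="30min")
hours = (index.hour + index.minute / 60).to_numpy()
rng = np.random.default_rng(0)
temps = 18 + 2 * np.sin(2 * np.pi * (hours - 9) / 24) + rng.normal(0, 0.5, len(index))

df = pd.DataFrame({
    "date": index.date,
    "time": index.strftime("%H:%M"),
    "internal_temp": temps,
})
overlay = df.pivot(index="date", columns="time", values="internal_temp").T

# Each line is drawn at 10% opacity, so busy regions build up gradually
# instead of turning into solid blocks.
ax = overlay.plot(legend=False, alpha=0.1, figsize=(12, 6))
```

The right value depends on how many lines overlap: the more days are overlaid, the lower the alpha usually needs to be.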

2D overlay plot showing diurnal patterns with custom alpha (Image by author)

Incorporating Domain Knowledge

We can see that 18:00 corresponds with the start of the “heating day” from the heating system’s perspective for most of the days. At any point after this time the heating system can decide to start pre-heating the building to reach the required state at the following point of the heating schedule.

Relying on our domain knowledge we are incorporating the perspective of the system that generated the data into the perspective we want the viewer to have.
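One way to move the start of the day to 18:00 is to shift every timestamp forward by six hours before splitting it into a day and a slot, so that 18:00 lands on midnight of the following date. The sketch below does this on synthetic data. The names `heating_day` and `slot` are hypothetical, and labelling each heating day by the date on which it ends is an assumption that keeps the evening pre-heating together with the day it prepares for.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for the real readings (assumption).
index = pd.date_range("2018-10-01", periods=10_080, freq="30min")
hours = (index.hour + index.minute / 60).to_numpy()
rng = np.random.default_rng(0)
temps = 18 + 2 * np.sin(2 * np.pi * (hours - 9) / 24) + rng.normal(0, 0.5, len(index))

# Shift by 6 hours: 18:00 becomes 00:00 of the next date, which we treat
# as the start of that date's "heating day".
shifted = index + pd.Timedelta(hours=6)
df = pd.DataFrame({
    "heating_day": shifted.date,
    "slot": shifted.hour * 2 + shifted.minute // 30,  # 0 = 18:00, 47 = 17:30
    "internal_temp": temps,
})
daily = df.pivot(index="heating_day", columns="slot", values="internal_temp")

# Relabel the slots with the real clock time, running from 18:00 to 17:30.
daily.columns = pd.date_range("2000-01-01 18:00", periods=48, freq="30min").strftime("%H:%M")

ax = daily.T.plot(legend=False, alpha=0.1, figsize=(12, 6))
ax.set_xlabel("Time of day (heating day starts at 18:00)")
```

The first and last heating days are incomplete after the shift, so they contain missing values; Matplotlib simply leaves gaps there.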

2D overlay plot showing diurnal patterns with 18:00 as the start of the “heating day” (Image by author)

Adding More Patterns vs. Maintaining Clarity

Changing the start time of the day made quite an improvement to the clarity of the plot. However, there is still a lot going on in this plot and we should strive to make it clearer.

Here we have an important decision to make: do we want to emphasise the weekly and / or the annual patterns in the data? We should choose weekly patterns only, for the following reasons:

  1. Heating schedules are defined weekly
  2. There was no clear annual pattern in the data at the first step
  3. We need to maintain clarity: trying to visualise annual patterns as well as weekly patterns would compromise ease of understanding

Discarding this dimension comes with responsibility though: we need to be conscious of this decision and we have to be able to investigate it later if necessary.

Visualising Diurnal and Weekly Patterns

After a few trials of colour-coding days of the week we come up with the following colour scheme. Note that while we have retained information about weekly patterns, we have lost information about annual patterns: e.g. we do know that blue lines represent Mondays but we do NOT know which blue line corresponds to which Monday of the year.

Diurnal and weekly patterns clearly describe a heating schedule (Image by author)

If you are thinking that this is no longer a 2D plot as we are using colour as the 3rd dimension you are quite right. This style of plotting is so different to what we usually consider 3D plots though, that I’m fine with calling this a 2D overlay plot (with a bit of cheating).

I think we have managed to achieve our goal: this image clearly communicates the following heating schedule:

  • Sunday: no heating schedule set
  • Monday: same as Tuesday to Friday but much longer pre-heating time
  • Tuesday to Friday: 09:00–18:00 at 21˚C
  • Saturday: 14:00–16:00 at 21˚C

Also, the code to produce the plot is very simple:
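What follows is a minimal end-to-end sketch of how such a plot could be produced, not the original notebook. It runs on synthetic data, so it will not reproduce the schedule described above. The column names and the colour palette are assumptions; only "Mondays are blue" is taken from the text.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import numpy as np
import pandas as pd

# Synthetic stand-in for the real readings (assumption).
index = pd.date_range("2018-10-01", periods=10_080, freq="30min")
hours = (index.hour + index.minute / 60).to_numpy()
rng = np.random.default_rng(0)
temps = 18 + 2 * np.sin(2 * np.pi * (hours - 9) / 24) + rng.normal(0, 0.5, len(index))

# Pivot with 18:00 as the start of the heating day (shift by 6 hours).
shifted = index + pd.Timedelta(hours=6)
df = pd.DataFrame({
    "heating_day": shifted.date,
    "slot": shifted.hour * 2 + shifted.minute // 30,
    "internal_temp": temps,
})
daily = df.pivot(index="heating_day", columns="slot", values="internal_temp")
daily.columns = pd.date_range("2000-01-01 18:00", periods=48, freq="30min").strftime("%H:%M")

# Hypothetical palette keyed by weekday (0 = Monday ... 6 = Sunday).
palette = {0: "tab:blue", 1: "tab:green", 2: "tab:green", 3: "tab:green",
           4: "tab:green", 5: "tab:orange", 6: "tab:red"}
colours = [palette[day.weekday()] for day in daily.index]

# Transpose so each heating day is one line, coloured by its weekday.
ax = daily.T.plot(legend=False, color=colours, alpha=0.1, figsize=(12, 6))
ax.set_xlabel("Time of day (heating day starts at 18:00)")
ax.set_ylabel("Internal temperature (°C)")
```

Passing a list with one colour per column lets Pandas handle the colour-coding, so no explicit loop over the days is needed.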

Summary

  • We were given a dataset which at first sight showed some diurnal & weekly periodicity.
  • By applying two simple transformations in the pre-processing step we were able to create a 2D overlay plot that presented the diurnal patterns.
  • Tweaking the alpha value of the plot and incorporating our domain knowledge to adjust the start time made the plot much clearer.
  • Using colour we were able to add the weekly patterns as well to the visualisation.
  • We intentionally discarded the information about the annual patterns but this was a conscious decision to maintain clarity.

Example on GitHub

You can get the example data and the example notebook on GitHub here:


Data Scientist at Upside Energy and Environmental Advisor at Greenpeace UK. Strategic guidance & data-driven insight to drive innovation on future of green tech