Site icon Martin's Blog

The value of durable keys in type-2 dimensions

An underutilized technique (in my opinion of course) in the design of type-2 dimensions, and one that I have only started using consistently myself in the past few years, is the concept of a durable key.

What is a “durable key”?

Also called an immutable or persisted key (I like durable better), a durable key is nothing more than a surrogate key (i.e. integer value or nonsensical number) used to identify a dimension member (company, employee, etc.) uniquely in a type-2 dimension. Confusing enough? It’s easier to explain with an example…

There are three different concepts worth elaborating on from the image above:

In the example above, we use the surrogate key of the initial record (573) as durable key for all subsequent versions of the company. You could of course use something else if you’d like, but using that first surrogate key is the easiest to implement in an ETL process while ensuring that it isn’t shared with another entity in your dimension table.

Why use durable keys?

Let’s assume that your model does not have a durable key in the Company dimension, and that you have a fact table (EmployeeTime) that tracks the time that employees spend against company activities. The fact table looks something like this…

The CompanyKey attribute in our fact table (above) will allow us to retrieve the version of the company record at the time of the activity, but what if we wanted to be able to associate our fact records (activities in this case) with the most recent version of the company? How would we do that? There’s a few different ways I can think of:

All of these options are a little clunky or require quite a bit of work down the line. Let’s see what happens when I add the durable key to the fact table and how it simplifies our lives:

With both of the keys in the fact table, we can use the CompanyKey as join criteria if we need the version of the company at the time of the activity, or use the DurableCompanyKey in combination with the CurrentRecord attribute from the Company dimension to get the most recent version of the company. That query will look like this:

I’ve had many requests in the past from end users who would like to see the most recent version of an entity (company in this case) in a specific report, while seeing the tracked version in others. Think about an invoice report where you would only like to see the most recent address of a company, as opposed to a regional revenue analysis where the address of the company at the time of the transaction is important.

As you can see, the addition of the durable key and a few minor changes to our ETL process will make that possible while keeping our star-schema intact, and give us the best of both worlds while preserving the benefits of surrogate keys in the dimensional model. A useful technique to consider when you are designing type-2 dimensions, especially if you are not sure how the end users will want to view the data in different reports or analytical platforms.

Exit mobile version