An underutilized technique (in my opinion of course) in the design of type-2 dimensions, and one that I have only started using consistently myself in the past few years, is the concept of a durable key.
What is a “durable key”?
Also called an immutable or persisted key (I like durable better), a durable key is nothing more than a surrogate key (i.e. integer value or nonsensical number) used to identify a dimension member (company, employee, etc.) uniquely in a type-2 dimension. Confusing enough? It’s easier to explain with an example…
There are three different concepts worth elaborating on from the image above:
- Surrogate Key: Usually a system-generated number we assign to identify records uniquely and a common practice when designing a dimensional model. The surrogate key will be the foreign key used in fact tables, and using this method as opposed to potentially storing multiple composite keys helps with the performance of join operations.
- Business Key: The “natural” key used to identify an object/entity in our business application. If we’re lucky this will only be a single attribute, but it’s not impossible to have multiple attributes that uniquely define an entity in a line-of-business system. In our example here, the CompanyId guid is used to identify a company in our source system.
- Durable Key: Similar to a surrogate key, but this key doesn’t change for different versions of the same entity in a type-2 dimension.
In the example above, we use the surrogate key of the initial record (573) as durable key for all subsequent versions of the company. You could of course use something else if you’d like, but using that first surrogate key is the easiest to implement in an ETL process while ensuring that it isn’t shared with another entity in your dimension table.
Why use durable keys?
Let’s assume that your model does not have a durable key in the Company dimension, and that you have a fact table (EmployeeTime) that tracks the time that employees spend against company activities. The fact table looks something like this…
The CompanyKey attribute in our fact table (above) will allow us to retrieve the version of the company record at the time of the activity, but what if we wanted to be able to associate our fact records (activities in this case) with the most recent version of the company? How would we do that? There’s a few different ways I can think of:
- Store the business key of the company (CompanyId) in our fact table.
- This may seem like a viable option at first and seeing that we only have one attribute as business key, but what if the business key was a composite key with multiple attributes? We would have to store all of those attributes in the fact table, which completely negates the reason for using surrogate keys in the first place. Not a very good dimensional design…
- Create a query with multiple joins.
- We would first have to join to the Company dimension in order to get the business key, and then join to it a second time to get the most recent version. Not very efficient in terms of performance, especially if you plan on doing this repeatedly for multiple reports.
- Create and maintain a separate type-1 dimension.
- You would create a CompanyHistory dimension which will track history and a Company dimension that will be a type-1 dimension with only the most recent version of a company. In the fact table you will have the surrogate keys of both dimension tables, giving you the option to use either one depending on the need.
- This option works pretty well and I’ve used it in the past, but it does create some overhead in terms of the ETL process, maintenance and storage. The amount of overhead will increase as you add more type-2 dimensions (and their type-1 counterparts).
All of these options are a little clunky or require quite a bit of work down the line. Let’s see what happens when I add the durable key to the fact table and how it simplifies our lives:
With both of the keys in the fact table, we can use the CompanyKey as join criteria if we need the version of the company at the time of the activity, or use the DurableCompanyKey in combination with the CurrentRecord attribute from the Company dimension to get the most recent version of the company. That query will look like this:
I’ve had many requests in the past from end users who would like to see the most recent version of an entity (company in this case) in a specific report, while seeing the tracked version in others. Think about an invoice report where you would only like to see the most recent address of a company, as opposed to a regional revenue analysis where the address of the company at the time of the transaction is important.
As you can see, the addition of the durable key and a few minor changes to our ETL process will make that possible while keeping our star-schema intact, and give us the best of both worlds while preserving the benefits of surrogate keys in the dimensional model. A useful technique to consider when you are designing type-2 dimensions, especially if you are not sure how the end users will want to view the data in different reports or analytical platforms.