If you are a dimensional modeling purist, you may want to get your preferred drink of choice (make it a double) before you continue reading this post. It will definitely challenge your way of thinking, but not in a bad way…I hope.
The word “temporal” simply means “of or relating to time“, and labeling a fact table in such a way may seem redundant or strange…or both. This term however describes a very specific implementation of a fact table with characteristics of a type-2 dimension, as you will see below.
First some Credits
I attended a session at the PASS Summit, presented by Davide Mauri (blog; twitter). He aptly coined the term “Temporal Snapshot Fact Table” to describe this particular type of design approach, and he deserves a lot of credit for having the courage to present such a difficult topic. It is not just a difficult topic to conceptualize and grasp, but also to present.
I left that session wondering if this topic is really well understood by BI professionals, mainly because I found very little (if any) information or examples out there. Hence the reason for this blog post…
I have been involved in a similar type of design implementation for the past 2 years, and hopefully this post will provide some clear information to help you identify when and how to use this kind of fact table. Note that I have omitted the word “Snapshot” from the title…which is probably the only difference of opinion I have to Davide’s given title. In my opinion this is not a snapshot table, because the fact record is not only written once but updated as many times as necessary.
Setting the Scene
Facts are essentially transactions which happen at a specific point in time. They usually do not change after being recorded…I know there are exceptions, but in most cases it will be a simple update if any.
But what if you do not have traditional facts? What if you have to track certain properties of a “dimension” over time which happens to have factual attributes? What if the changes to those properties are too volatile to incorporate them in the dimension table…or even a snapshot fact table?
Let me explain in more detail…
I am involved in a price modeling project for a large retailer. The “pricing tool” allows the business to create rules for certain categories of products, or any level of the product hierarchy to be more precise. These complex rules are then used to recommend what they should be asking for their products. Sales data is not relevant here…we are simply applying different levels of rules to determine what we should be asking for the products.
In order to apply all the relevant rules, we need to track certain attributes of the products over time. Some of the attributes we need to track are:
- Retail price
- Coupon/Discount details (i.e. buy 2, get one free)
- Movement (how much of the product are they selling)
The typical question we need to answer is: “What are all the attributes of a product on any given day?”.
Up to this point you should still be saying: “Type-2 dimension!!!“…so let’s assume we’ve decided to track all of the above in a type-2 product dimension.
The following will make us think twice about the choice we’ve just made:
- Products are priced (differently) according to regions or stores. This means that we will have to include region or store in the dimension as well, and that will change the grain (level of detail) of the dimension. Our 800k product dimension just grew exponentially by the amount of stores/regions, and we are not even tracking any changes yet.
- Coupons or discounts usually last for a week only, and in any given week there are many products on sale. Again we are adding an exponential factor of growth to our dimension, which will make it impossible to manage.
A single product dimension would be a really bad choice:
- (800k products * 200 stores * coupons) * changes over time = BILLIONS of records!!!
The next argument is that we should have separate dimensions for products, stores/regions and coupons. That should really have been our first course of action, but for the sake of clarification I started with a simplified (and most probably incorrect) approach.
Creating our separate dimensions, we end up with something similar to the following…
Our initial design for the dimensions look good and uncle Ralph will be proud, but we still have to tie it all together somehow. And this is where everything becomes interesting really fast. The first “spanner in the works” is that we don’t have traditional facts.
If we did (in the case of product sales for instance), the typical star schema would have been relatively straight forward:
But unfortunately we do not have a transaction that happens at a point in time. Instead, we are dealing with volatile attributes which are changing rapidly over time.
A solution that comes to mind is a periodic snapshot fact table, but again we will have to deal with too many records. For a snapshot fact table to work in this scenario we will have to take daily snapshots for current and future changes, as well as keep the history in case they want to model on historic attributes.
The snapshot fact table will have a similar outcome to the very first approach where we attempted to track all the changes in a single type-2 dimension. There will be too many records to make it a feasible option.
The solution to all of this is astoundingly simple…if you are prepared to step aside from the beaten track for a little while. If we add effective and termination dates to our fact table, we will be able to satisfy all of the requirements.
It will give us the ability to track all attribute changes over time, and we should be able to return the given attributes of any product on any given day. Record counts would also be manageable because we are only generating new records when something changes.
The star-schema design will look something like this:
You will notice that the coupon dimension has disappeared from the above schema. In our case at least, it turned out that we were only interested in the measures (i.e. coupon amount) originally contained within that dimension. Depending on your specific requirements, you may want to keep it as a separate dimension…be aware though that something like a coupon dimension could be a monster dimension with potentially hundreds of millions of records.
The temporal fact table almost seems too simple to be a viable solution. It is nothing more than a fact table with the characteristics of a type-2 dimension, more specifically the inclusion of effective and termination dates. It works like a dream when used appropriately, and provides a sustainable solution to this type of business requirement.
As with many other things in life though, there is of course a price to pay when choosing a temporal fact table as part of your design:
- The design in itself is pretty simple, but your ETL transformations become very complex when tracking multiple changing attributes like this. Records have to be updated multiple times, and you have to be really careful that you don’t create overlapping date ranges. In some instances you may even have to recreate the entire history of an entity (product in this example). In our project we were also dealing with future records, and that created another layer of complexity – more records to possibly update in other words.
- A type-2 change in any associated dimension will also drive a change in the fact table. You will be forced to do this in order to keep everything in sync. Stay away from using type-2 dimensions if at all possible…if the requirements allow for that of course.
- Building a cube will require some extra work if you want to include the day-level of your date dimension. Writing a query and using the “between” operator is relatively easy, but building the cube will require that you have a reference for every day in the date range of your fact record. This challenge can be overcome by creating a bridge table (or helper table if you prefer) between the temporal fact and date dimension which resolves the many-to-many relationship.
Concatenating the effective and termination date keys could be a good choice for the “Bridge_Table_Key” field above (i.e. 20110125 and 20110315 becomes 2011012520110315). This will eliminate the need for multiple records in the bridge table to represent the same unique date range.
Do not be alarmed by the amount of records in the bridge table. It is a very narrow table, and with proper indexing the performance should be adequate. And of course you could join the fact table directly to the date dimension for anything that does not require day-level information.
As BI environments evolve, we may just end up seeing more of this “hybrid-type” design when traditional facts are absent and attributes are too volatile for snapshots.
Hopefully the scenario and explanation above will provide some food for thought…even for the purists at heart, which I am actually one of. Use this approach with caution…you will burn your fingers (and maybe more) if implemented in the wrong type of situation.