SCD - Type 2
Using the retail store example with the "Product" dimension, let's see how a Type 2 Slowly Changing Dimension approach would handle the change in the product category.
Recall that the Product dimension table contains the following columns:
Product_ID
Product_Name
Category
Price
For a Type 2 SCD, we will need to add two additional columns to the table:
Start_Date
End_Date (or a flag indicating the current record, such as "Is_Current")
Now, imagine that the store decides to re-categorize the "Smartphone" product from the "Electronics" category to the "Mobile Devices" category, as in the previous examples.
With a Type 2 SCD approach, the store will add a new row with the updated category information while retaining the old row in the table. Each row will have a start and end date, indicating the period during which the attribute values were valid.
Before the change, the Product dimension table looks like this:
1
Smartphone
Electronics
800
2021-01-01
NULL
After the change, the Product dimension table will look like this:
1
Smartphone
Electronics
800
2021-01-01
2023-04-03
2
Smartphone
Mobile Devices
800
2023-04-04
NULL
With a Type 2 SCD approach, the table now has a new row reflecting the updated category for the "Smartphone" product.
Additionally, the previous row for the "Smartphone" product with the "Electronics" category is still in the table, with an updated End_Date.
This approach allows for the preservation of historical data and enables accurate reporting and analysis based on the product's category at different points in time.
Use Cases:
When historical data is critical for analysis, and it's essential to track changes over time for reporting and decision-making.
For dimensions where understanding the impact of changes on business performance, such as customer demographics, product attributes, or pricing, is vital.
In cases where the data warehouse is required to support reporting and analysis that includes historical trends, comparisons, and the effect of changes over time.
Advantages:
Historical Data Preservation: SCD Type 2 maintains a complete history of changes in the dimension attributes, enabling more accurate and detailed reporting and analysis.
Accurate Analysis: By preserving historical data, SCD Type 2 allows for accurate analysis and reporting that accounts for changes over time, leading to better insights and decision-making.
Change Tracking: SCD Type 2 enables tracking of when and why changes occurred, making it easier to identify patterns, trends, or potential issues in the dimension data.
Disadvantages:
Complexity: SCD Type 2 is more complex to implement and manage compared to SCD Type 0 and Type 1, as it requires additional mechanisms to handle changes in dimension attributes and maintain historical data.
Space Requirements: Since multiple versions of records are stored to maintain historical data, the dimension tables will be larger and require more storage space.
Performance: As there are additional rows for historical data, the querying and processing of the data warehouse may be slower compared to SCD Type 0 or Type 1, especially when dealing with large amounts of historical data. This may require more robust indexing, partitioning, or query optimization strategies to maintain acceptable performance.
Last updated