SCD - Type 4

SCD Type 4, the "history table" approach, involves creating a separate table to store historical data, while the main dimension table only retains current information. Using the "Product" dimension example with the mobile product category change:

Recall that the Product dimension table contains the following columns:

  1. Product_ID

  2. Product_Name

  3. Category

  4. Price

For a Type 4 SCD, we will create an additional table called "Product_History":

Product_History Table:

  1. History_ID

  2. Product_ID

  3. Category

  4. Price

  5. Valid_From

  6. Valid_To

Now, imagine that the store decides to re-categorize the "Smartphone" product from the "Electronics" category to the "Mobile Devices" category.

With a Type 4 SCD approach, the store will insert a new row in the "Product_History" table, representing the previous state of the "Smartphone" product, and update the existing row in the main "Product" dimension table with the new category.

Before the change, the tables look like this:

Product Dimension Table:

Product_ID
Product_Name
Category
Price

1

Smartphone

Electronics

800

Product_History Table:

(empty)

After the change, the tables will look like this:

Product Dimension Table:

Product_ID
Product_Name
Category
Price

1

Smartphone

Mobile Devices

800

Product_History Table:

History_ID
Product_ID
Category
Price
Valid_From
Valid_To

1

1

Electronics

800

2020-01-01

2023-01-01

With a Type 4 SCD approach, the main "Product" dimension table contains only the current information, and the "Product_History" table maintains the history of changes. This approach helps maintain a clean dimension table and allows for efficient querying of current data while still preserving historical data for analysis when required.

Use Cases:

  1. When it is important to maintain a complete history of changes, but the main dimension table should only contain the current state of the data.

  2. For situations where querying the current data efficiently is a priority, but historical data is still required for more in-depth analysis.

  3. When the dimension table is frequently accessed or queried for current data, and performance is a concern.

Advantages:

  1. Performance: By keeping the current data in the main dimension table and historical data in a separate table, SCD Type 4 allows for efficient querying of the current data without the overhead of historical data.

  2. Historical Data: SCD Type 4 maintains a complete history of changes, enabling in-depth analysis and reporting that requires historical data.

  3. Separation of Concerns: By separating the current and historical data, SCD Type 4 provides a cleaner and more organized data structure, making it easier to manage and maintain.

  4. Scalability: Since historical data is stored separately, SCD Type 4 can scale well with large dimensions and frequent changes.

Disadvantages:

  1. Complexity: SCD Type 4 adds complexity to the data warehouse design and maintenance, as it requires managing two separate tables for the same dimension.

  2. Increased Storage: Storing historical data in a separate table requires additional storage space, which can be a concern for large dimensions with extensive change history.

  3. Maintenance: Implementing and maintaining SCD Type 4 can be more challenging than other SCD types, as it requires managing the relationships between the dimension and history tables and ensuring data integrity between them.

  4. Query Complexity: Analyzing historical data and comparing it with current data can involve more complex queries, as it may require joining the dimension and history tables.

Overall, SCD Type 4 is suitable for scenarios where maintaining a complete history of changes is necessary, but the main dimension table should only contain the current state of the data to optimize query performance.

Last updated