SCD - Type 3

SCD Type 3 involves adding a new column to the dimension table to store the previous value of the changed attribute along with the current value. It allows tracking the current and previous values but needs to maintain a complete history of changes. Using the same "Product" dimension example with the mobile product category change:

Recall that the Product dimension table contains the following columns:

  1. Product_ID

  2. Product_Name

  3. Category

  4. Price

For a Type 3 SCD, we will need to add a column to the table:

  1. Previous_Category

Now, imagine that the store decides to re-categorize the "Smartphone" product from the "Electronics" category to the "Mobile Devices" category, as in the previous examples.

With a Type 3 SCD approach, the store will update the existing row by setting the "Previous_Category" column to the old category value and overwriting the "Category" column with the new value.

Before the change, the Product dimension table looks like this:

Product_IDProduct_NameCategoryPricePrevious_Category

1

Smartphone

Electronics

800

NULL

After the change, the Product dimension table will look like this:

Product_IDProduct_NameCategoryPricePrevious_Category

1

Smartphone

Mobile Devices

800

Electronics

With a Type 3 SCD approach, the table now reflects the updated category for the "Smartphone" product and also retains the previous category in the "Previous_Category" column. However, it can only track one previous value and does not maintain a complete history of all changes that occurred over time.

Use Cases:

  1. When it is essential to track the immediate previous value of an attribute, but a complete history of changes is not required.

  2. For dimensions where the primary focus is on comparing the current value with the previous value, rather than analyzing historical trends.

  3. In cases where the data warehouse is required to support reporting and analysis that involves comparisons between current and previous states, but not a full history of changes.

Advantages:

  1. Limited Historical Data: SCD Type 3 preserves the immediate previous value of an attribute, allowing for some level of historical analysis and comparison with the current value.

  2. Space Efficiency: Since only one previous value is stored, the dimension tables will require less storage space compared to SCD Type 2.

  3. Simplicity: SCD Type 3 is relatively simple to implement, as it only requires adding a new column to the dimension table and updating it when changes occur.

Disadvantages:

  1. Incomplete Historical Data: SCD Type 3 does not maintain a full history of changes, which may limit the depth of historical analysis and reporting that can be performed.

  2. Limited Change Tracking: With SCD Type 3, it is only possible to track the immediate previous value of an attribute, making it difficult to understand trends or patterns in the data over time.

  3. Additional Columns: SCD Type 3 requires adding a new column for each attribute that needs to track the previous value, which can increase the complexity of the dimension table schema.

  4. Scalability: If there are multiple attributes that require tracking of previous values or if the number of changes becomes more frequent, SCD Type 3 may become less practical and harder to manage. In such cases, SCD Type 2 may be a more suitable approach to maintain a complete history of changes.

Suppose the university decides to change the credit hours of a specific course, "Introduction to Data Science," from 3 to 4 credit hours. The university wants to track both the current and the immediately previous credit hours for reporting purposes.

Using SCD Type 3, the university would update the course's record in the Course dimension table by overwriting the "Credit_Hours" column and updating the "Previous_Credit_Hours" column with the old value.

Before the change, the Course dimension table looks like this:

Course_IDCourse_NameCredit_HoursPrevious_Credit_Hours

1

Introduction to Data Science

3

NULL

After the change, the Course dimension table will look like this:

Course_IDCourse_NameCredit_HoursPrevious_Credit_Hours

1

Introduction to Data Science

4

3

With a Type 3 SCD approach, the table now reflects the updated credit hours for the "Introduction to Data Science" course and retains the previous credit hours in the "Previous_Credit_Hours" column. This allows the university to compare the current and previous credit hours, which might be useful for understanding recent changes in course workload. However, it does not maintain a complete history of all credit hour changes over time.

Last updated