Data Warehousing
  • Data Warehousing
  • Readme
  • Fundamentals
    • Terms to Know
    • Jobs
    • Skills needed for DW developer
    • Application Tiers
    • Operational Database
    • What is a Data Warehouse
      • Typical Data Architecture
      • Problem Statement
      • Features of Data Warehouse
      • Need for Data Warehouse
      • Current State of the Art
    • Activities of Data Science
    • Types of Data
    • Data Storage Systems
    • Data Warehouse 1980 - Current
    • Data Warehouse vs Data Mart
    • Data Warehouse Architecture
      • Top-Down Approach
      • Bottom-Up Approach
    • Data Warehouse Characteristic
      • Subject Oriented
      • Integrated
      • Time Variant
      • Non Volatile
    • Tools
    • Cloud vs On-Premise
    • Steps to design a Data Warehouse
      • Gather Requirements
      • Environment
      • Data Modeling
      • Choosing ETL / ELT Solution
      • Online Analytic Processing
      • Front End
      • Query Optimization
    • Dataset Examples
    • Thoughts on some data
  • RDBMS
    • Data Model
      • Entity Relationship Model
      • Attributes
      • Keys
      • Transaction
      • ACID
    • Online vs Batch
    • DSL vs GPL
    • Connect to Elvis
    • SQL Concepts
      • Basic Select - 1
      • Basic Select - 2
      • UNION Operators
      • Wild Cards & Distinct
      • Group By & Having
      • Sub Queries
      • Derived Tables
      • Views
    • Practice using SQLBolt
  • Cloud
    • Overview
    • Types of Cloud Services
    • Challenges of Cloud Computing
    • AWS
      • AWS Global Infrastructure
      • EC2
      • S3
      • IAM
    • Terraform
  • Spark - Databricks
    • Storage Formats
    • File Formats
    • Medallion Architecture
    • Delta
  • Data Warehousing Concepts
    • Dimensional Modelling
      • Star Schema
      • Galaxy Schema
      • Snowflake Schema
      • Starflake Schema
      • Star vs Snowflake
      • GRAIN
      • Multi-Fact Star Schema
      • Vertabelo Tool
    • Dimension - Fact
    • Sample Excercise
    • Keys
      • Why Surrogate Keys are Important
    • More Examples
    • Master Data Management
    • Steps of Dimensional Modeling
    • Types of Dimensions
      • Date Dimension Table
      • Degenerate Dimension
      • Junk Dimension
      • Static Dimension
      • Conformed Dimensions
      • Slowly Changing Dimensions
        • SCD - Type 0
        • SCD - Type 1
        • SCD - Type 2
        • SCD - Type 3
        • SCD - Type 4
        • SCD - Type 6
        • SCD - Type 5 - Fun Fact
      • Role Playing Dimension
      • Conformed vs Role Playing
      • Shrunken Dimension
      • Swappable Dimension
      • Step Dimension
    • Types of Facts
      • Factless Fact Table
      • Transaction Fact
      • Periodic Fact
      • Accumulating Snapshot Fact Table
      • Transaction vs Periodic vs Accumulating
      • Additive, Semi-Additive, Non-Additive
      • Periodic Snapshot vs Additive
      • Conformed Fact
    • Sample Data Architecture Diagram
    • Data Pipeline Models
    • New DW Concepts
Powered by GitBook
On this page
  • What is the issue now ?
  • How to Overcome this issue?
  • Properties of Surrogate Keys
  • Adv of Surrogate Keys
  1. Data Warehousing Concepts
  2. Keys

Why Surrogate Keys are Important

PreviousKeysNextMore Examples

Last updated 2 years ago

Lets have a sample data

StoreID *
Street
City
State
Country

S1001

24th Blvd

Phoenix

AZ

USA

S1002

21 Bell Road

Miami

FL

USA

S1003

Main Street

New Port

CA

USA

StoreID is Natural Key (PK) It has a meaning S stands for Store 1001 means its the first store.

What is the issue now ?

When data changes (Slowly Changing Dimension we will cover in next chapter) how to handle the change.

Situation 1: Store 1 moves to new location or store is closed for sometime and opened under new franchise.

StoreID *
Street
City
State
Country

S1001

24th Blvd

Phoenix

AZ

USA

S1002

21 Bell Road

Miami

FL

USA

S1003

Main Street

New Port

CA

USA

S1001

1st Street

Phoenix

AZ

USA

Situation 2: When acquiring competitive business (say Target buys KMart), the Natural Keys dont make sense now.

StoreID *
Street
City
State
Country

S1001

24th Blvd

Phoenix

AZ

USA

S1002

21 Bell Road

Miami

FL

USA

S1003

Main Street

New Port

CA

USA

233

South Street

New Brunswick

NJ

USA

1233

JFK Blvd

Charlotte

NC

USA

These business decisions / changes have nothing to do with the Technology.

How to Overcome this issue?

Add Surrogate Keys (running sequence number)

Surr_Store
StoreID *
Street
City
State
Country

1

S1001

24th Blvd

Phoenix

AZ

USA

2

S1002

21 Bell Road

Miami

FL

USA

3

S1003

Main Street

New Port

CA

USA

4

233

South Street

New Brunswick

NJ

USA

5

1233

JFK Blvd

Charlotte

NC

USA

Properties of Surrogate Keys

- Numerical
- Sequential
- Meaningless Simple Number

Adv of Surrogate Keys

- Constant Behavior (will not change based on Business need)
- Integration is easier.
- Faster Query Performance. (because of Integer values)
- Future records (every other column can be NULL still ID is available)

Its a good practice to have Surrogate Key in DataWarehouse Dimension & Fact tables.

Example of Primary / Foreign Keys