Data Warehousing
  • Data Warehousing
  • Readme
  • Fundamentals
    • Terms to Know
    • Jobs
    • Skills needed for DW developer
    • Application Tiers
    • Operational Database
    • What is a Data Warehouse
      • Typical Data Architecture
      • Problem Statement
      • Features of Data Warehouse
      • Need for Data Warehouse
      • Current State of the Art
    • Activities of Data Science
    • Types of Data
    • Data Storage Systems
    • Data Warehouse 1980 - Current
    • Data Warehouse vs Data Mart
    • Data Warehouse Architecture
      • Top-Down Approach
      • Bottom-Up Approach
    • Data Warehouse Characteristic
      • Subject Oriented
      • Integrated
      • Time Variant
      • Non Volatile
    • Tools
    • Cloud vs On-Premise
    • Steps to design a Data Warehouse
      • Gather Requirements
      • Environment
      • Data Modeling
      • Choosing ETL / ELT Solution
      • Online Analytic Processing
      • Front End
      • Query Optimization
    • Dataset Examples
    • Thoughts on some data
  • RDBMS
    • Data Model
      • Entity Relationship Model
      • Attributes
      • Keys
      • Transaction
      • ACID
    • Online vs Batch
    • DSL vs GPL
    • Connect to Elvis
    • SQL Concepts
      • Basic Select - 1
      • Basic Select - 2
      • UNION Operators
      • Wild Cards & Distinct
      • Group By & Having
      • Sub Queries
      • Derived Tables
      • Views
    • Practice using SQLBolt
  • Cloud
    • Overview
    • Types of Cloud Services
    • Challenges of Cloud Computing
    • AWS
      • AWS Global Infrastructure
      • EC2
      • S3
      • IAM
    • Terraform
  • Spark - Databricks
    • Storage Formats
    • File Formats
    • Medallion Architecture
    • Delta
  • Data Warehousing Concepts
    • Dimensional Modelling
      • Star Schema
      • Galaxy Schema
      • Snowflake Schema
      • Starflake Schema
      • Star vs Snowflake
      • GRAIN
      • Multi-Fact Star Schema
      • Vertabelo Tool
    • Dimension - Fact
    • Sample Excercise
    • Keys
      • Why Surrogate Keys are Important
    • More Examples
    • Master Data Management
    • Steps of Dimensional Modeling
    • Types of Dimensions
      • Date Dimension Table
      • Degenerate Dimension
      • Junk Dimension
      • Static Dimension
      • Conformed Dimensions
      • Slowly Changing Dimensions
        • SCD - Type 0
        • SCD - Type 1
        • SCD - Type 2
        • SCD - Type 3
        • SCD - Type 4
        • SCD - Type 6
        • SCD - Type 5 - Fun Fact
      • Role Playing Dimension
      • Conformed vs Role Playing
      • Shrunken Dimension
      • Swappable Dimension
      • Step Dimension
    • Types of Facts
      • Factless Fact Table
      • Transaction Fact
      • Periodic Fact
      • Accumulating Snapshot Fact Table
      • Transaction vs Periodic vs Accumulating
      • Additive, Semi-Additive, Non-Additive
      • Periodic Snapshot vs Additive
      • Conformed Fact
    • Sample Data Architecture Diagram
    • Data Pipeline Models
    • New DW Concepts
Powered by GitBook
On this page
  • Super Key
  • Candidate Key
  • Primary Key
  • Alternate Key
  • Surrogate Key
  • Unique Key
  • Natural Key
  • Foreign Key
  1. Data Warehousing Concepts

Keys

PreviousSample ExcerciseNextWhy Surrogate Keys are Important

Last updated 2 years ago

Again what is Primary Key? Unique - Not Null - One PK per table.

Super Key - A super key is a set or one or more columns (attributes) to identify rows in a table uniquely.

Candidate Key - Minimal super keys with no redundant attributes to uniquely identify a row.

Primary Key - Primary key is selected from the sets of candidate keys.

Alternate Key – A candidate key that was not selected as Primary Key.

Natural Key – A key that happens naturally in the table. Example: SSN, TaxID.

Surrogate Key – A system-generated identifier (Meaningless Simple Number).

Unique Key – Unique values one NULL.

Foreign Key – used to relate other tables.

Super Key

Super Key is a set of one or more columns (attributes) to identify rows in a table uniquely.

For the above table, a bag of keys will help us uniquely identify that table's rows.

{ID}
{EmpNum}
{SSN}
{Email}
{ID,EmpNum}
{ID,SSN}
{ID,Email}
{ID,DOB}
{EmpNum,SSN}
{EmpNum,SSN,Email}
{ID, EmpNum, SSN}
{ID, EmpNum, SSN, DOB}
{ID, EmpNum, SSN, DOB, Name}
{ID, EmpNum, SSN, Name, Email,DOB}
...

Now you get the idea.. we can come up with more & more combinations.

Candidate Key

"Minimal super keys" with no redundant attributes to uniquely identify a row.

Here the essential condition is Minimal. Let's see what minimal keys help us identify a row uniquely.

With just ID, we can uniquely identify a row. Similarly, EmpNum, SSN, and Email can uniquely pull a row.

{ID}

{EmpNum}

{SSN}

{Email}

Primary Key

The primary key is selected from the sets of candidate keys. Let's choose the best column as Primary Key through the process of elimination.

SSN: Sensitive
Email: Varchar
EmpNum: Good to use when needed.
ID: Auto Increment. Widely used in Data Warehousing.

Alternate Key

A candidate key should have been selected as Primary Key.

Email is best suited for Alternate Key. Email can be used to search particular employees if the system doesn’t have access to EmpNum. It is used to link to External accounts.

Surrogate Key

Surrogate Key: A system-generated identifier (Meaningless Simple Number)

ID column matches the definition.

Surrogate Keys are very handy and useful in Data Warehousing.

Unique Key

Unique values may have one NULL value.

Under certain circumstances, Email can be a unique key. Assuming email is generated by a different system and it can be null for a brief period.

Natural Key

A natural key is a column or set of columns that already exist in the table (e.g., they are attributes of the entity within the data model) and uniquely identifies a record in the table. Since these columns are entity attributes, they have business meaning.

Example: SSN

Foreign Key

It is used to refer to other tables.