Data Warehousing
  • Data Warehousing
  • Readme
  • Fundamentals
    • Terms to Know
    • Jobs
    • Skills needed for DW developer
    • Application Tiers
    • Operational Database
    • What is a Data Warehouse
      • Typical Data Architecture
      • Problem Statement
      • Features of Data Warehouse
      • Need for Data Warehouse
      • Current State of the Art
    • Activities of Data Science
    • Types of Data
    • Data Storage Systems
    • Data Warehouse 1980 - Current
    • Data Warehouse vs Data Mart
    • Data Warehouse Architecture
      • Top-Down Approach
      • Bottom-Up Approach
    • Data Warehouse Characteristic
      • Subject Oriented
      • Integrated
      • Time Variant
      • Non Volatile
    • Tools
    • Cloud vs On-Premise
    • Steps to design a Data Warehouse
      • Gather Requirements
      • Environment
      • Data Modeling
      • Choosing ETL / ELT Solution
      • Online Analytic Processing
      • Front End
      • Query Optimization
    • Dataset Examples
    • Thoughts on some data
  • RDBMS
    • Data Model
      • Entity Relationship Model
      • Attributes
      • Keys
      • Transaction
      • ACID
    • Online vs Batch
    • DSL vs GPL
    • Connect to Elvis
    • SQL Concepts
      • Basic Select - 1
      • Basic Select - 2
      • UNION Operators
      • Wild Cards & Distinct
      • Group By & Having
      • Sub Queries
      • Derived Tables
      • Views
    • Practice using SQLBolt
  • Cloud
    • Overview
    • Types of Cloud Services
    • Challenges of Cloud Computing
    • AWS
      • AWS Global Infrastructure
      • EC2
      • S3
      • IAM
    • Terraform
  • Spark - Databricks
    • Storage Formats
    • File Formats
    • Medallion Architecture
    • Delta
  • Data Warehousing Concepts
    • Dimensional Modelling
      • Star Schema
      • Galaxy Schema
      • Snowflake Schema
      • Starflake Schema
      • Star vs Snowflake
      • GRAIN
      • Multi-Fact Star Schema
      • Vertabelo Tool
    • Dimension - Fact
    • Sample Excercise
    • Keys
      • Why Surrogate Keys are Important
    • More Examples
    • Master Data Management
    • Steps of Dimensional Modeling
    • Types of Dimensions
      • Date Dimension Table
      • Degenerate Dimension
      • Junk Dimension
      • Static Dimension
      • Conformed Dimensions
      • Slowly Changing Dimensions
        • SCD - Type 0
        • SCD - Type 1
        • SCD - Type 2
        • SCD - Type 3
        • SCD - Type 4
        • SCD - Type 6
        • SCD - Type 5 - Fun Fact
      • Role Playing Dimension
      • Conformed vs Role Playing
      • Shrunken Dimension
      • Swappable Dimension
      • Step Dimension
    • Types of Facts
      • Factless Fact Table
      • Transaction Fact
      • Periodic Fact
      • Accumulating Snapshot Fact Table
      • Transaction vs Periodic vs Accumulating
      • Additive, Semi-Additive, Non-Additive
      • Periodic Snapshot vs Additive
      • Conformed Fact
    • Sample Data Architecture Diagram
    • Data Pipeline Models
    • New DW Concepts
Powered by GitBook
On this page
  1. Fundamentals
  2. What is a Data Warehouse

Current State of the Art

The business world decided as follows.

  • A Database should exist just for doing BI & Strategic reports.

  • It should be separated from the operational / transaction database for the day-to-day running of the business.

  • It should encompass all aspects of the business (sales, inventory, hr, customer service…)

  • An enterprise-wide standard definition for every field name in every table.

    • Example: employee number should be identical across DB. empNo, eNo,EmployeeNum.. empID not acceptable.

  • Metadata database (data about data) defining assumptions about each field, describing transformations performed and cleansing operations, etc.

    • Example: If US telephone, it should be nnn-nnn-nnnn or (nnn) nnn-nnnn

  • Data Warehouse is read-only to its end users so that everyone will use the same data, and there will be no mismatch between teams.

  • Fast access, even if it's big data.

----------------------------------------------------------------------------------------

  • Operational databases for tracking sales, inventory, support calls, chat, and email. (Relational and NoSQL)

  • The Back Office team (ETL team) gathers data from multiple sources, cleans it, transforms it, massages the missing, and stores it in the Staging database.

    • If the phone number is not in the format, then format it.

    • If the email address is not linked to the chat/phone record, read it from the Customer and update it.

  • Staging database: Working database where all the work is done to the data. It then dumps to the data warehouse, which is visible as “read-only” to end users.

  • Data Analysts then build reports using Data Warehouse.

We are now going back to the original question.

If all of these things are done right, Amazon's CEO can get the report in less than 30 minutes without interfering with business operations. 👍

PreviousNeed for Data WarehouseNextActivities of Data Science

Last updated 2 years ago