
Types of Data

  • Structured Data (rows and columns: CSV, Excel)

  • Semi-Structured Data (JSON / XML)

  • Unstructured Data (Video, Audio, Document, Email)

Structured Data

| ID  | Name           | Join Date  |
|-----|----------------|------------|
| 101 | Rachel Green   | 2020-05-01 |
| 201 | Joey Tribbiani | 1998-07-05 |
| 301 | Monica Geller  | 1999-12-14 |
| 401 | Cosmo Kramer   | 2001-06-05 |
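
Because every row has the same columns, structured data can be loaded with a fixed schema and typed values. The following is a minimal sketch, assuming the table above is stored as CSV text; the `CSV_TEXT` constant and the `load_members` helper are hypothetical names used only for illustration.

```python
import csv
import io
from datetime import date

# The table above written out as CSV text (stands in for a real file).
CSV_TEXT = """ID,Name,Join Date
101,Rachel Green,2020-05-01
201,Joey Tribbiani,1998-07-05
301,Monica Geller,1999-12-14
401,Cosmo Kramer,2001-06-05
"""


def load_members(text):
    """Parse CSV text into a list of dicts with typed values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "id": int(record["ID"]),                               # integer key
            "name": record["Name"],                                # free text
            "join_date": date.fromisoformat(record["Join Date"]),  # ISO date
        })
    return rows


members = load_members(CSV_TEXT)

# A fixed schema makes column-wise questions easy to answer.
earliest = min(members, key=lambda m: m["join_date"])
print(earliest["id"], earliest["join_date"])
```

Every row can be converted with the same three rules, which is what makes this kind of data straightforward to load into a relational table.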

Semi-Structured Data

JSON

[
   {
      "id":1,
      "name":"Rachel Green",
      "gender":"F",
      "series":"Friends"
   },
   {
      "id":2,
      "name":"Sheldon Cooper",
      "gender":"M",
      "series":"BBT"
   }
]
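
Semi-structured records describe themselves through their keys, but nothing enforces value types or guarantees that every key is present. The sketch below uses Python's standard `json` module to read records like the ones above; `JSON_TEXT` and `load_actors` are hypothetical names, and the normalization rules are an assumption about what a loader might want.

```python
import json

# The same records as above, embedded as text (stands in for a file or API response).
JSON_TEXT = """
[
  {"id": 1, "name": "Rachel Green", "gender": "F", "series": "Friends"},
  {"id": 2, "name": "Sheldon Cooper", "gender": "M", "series": "BBT"}
]
"""


def load_actors(text):
    """Parse a JSON array of actor objects into normalized dicts."""
    actors = []
    for record in json.loads(text):
        actors.append({
            # IDs may arrive as numbers or numeric strings; normalize to int.
            "id": int(record["id"]),
            "name": record["name"],
            # Optional keys: .get() returns None when a key is missing.
            "gender": record.get("gender"),
            "series": record.get("series"),
        })
    return actors


actors = load_actors(JSON_TEXT)
names_by_series = {a["series"]: a["name"] for a in actors}
print(names_by_series)
```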

XML

<?xml version="1.0" encoding="UTF-8"?>
<actors>
   <actor>
      <id>1</id>
      <name>Rachel Green</name>
      <gender>F</gender>
      <series>Friends</series>
   </actor>

   <actor>
      <id>2</id>
      <name>Sheldon Cooper</name>
      <gender>M</gender>
      <series>BBT</series>
   </actor>
</actors>
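
The XML version carries the same information in nested elements. As a rough sketch, the standard library's `xml.etree.ElementTree` can walk the `<actor>` elements and flatten them into rows; `XML_TEXT` and `load_actors_xml` are hypothetical names, and the XML declaration is omitted from the embedded string for simplicity.

```python
import xml.etree.ElementTree as ET

# The document above, embedded as text (XML declaration omitted for simplicity).
XML_TEXT = """<actors>
   <actor>
      <id>1</id>
      <name>Rachel Green</name>
      <gender>F</gender>
      <series>Friends</series>
   </actor>
   <actor>
      <id>2</id>
      <name>Sheldon Cooper</name>
      <gender>M</gender>
      <series>BBT</series>
   </actor>
</actors>"""


def load_actors_xml(text):
    """Flatten each <actor> element into a dict (one row per actor)."""
    root = ET.fromstring(text)
    rows = []
    for node in root.findall("actor"):
        rows.append({
            "id": int(node.findtext("id")),       # element text is always a string
            "name": node.findtext("name"),
            "gender": node.findtext("gender"),    # None if the element is absent
            "series": node.findtext("series"),
        })
    return rows


xml_actors = load_actors_xml(XML_TEXT)
print([row["name"] for row in xml_actors])
```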

Unstructured Data

  1. Text Logs: Server logs, application logs.

  2. Social Media Posts: Tweets, Facebook comments.

  3. Emails: Customer support interactions.

  4. Audio/Video: Customer call recordings and marketing videos.

  5. Customer Reviews: Free-form text reviews.

  6. Images: Product images, user profile pictures.

  7. Documents: PDFs, Word files.

  8. Sensor Data: IoT data streams.

Unstructured data of these kinds can be ingested into modern data warehouses for analytics, usually after some preprocessing. For instance, text can be analyzed with NLP before it is stored, and images can be converted into feature vectors.
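
To make the preprocessing idea concrete, here is a small sketch that turns free-form review text into fixed-length numeric rows. It uses a plain bag-of-words count as a stand-in for real NLP. The review texts, the vocabulary, and the `tokenize` and `bag_of_words` helpers are all made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical unstructured input: free-form customer reviews.
reviews = [
    "Great phone, great battery life.",
    "Shipping was slow and the battery died.",
]

# Hypothetical vocabulary chosen by the analyst.
vocabulary = ["great", "slow", "battery"]

# Each review becomes a structured row that fits a warehouse table.
rows = [
    {"review_id": i, "features": bag_of_words(text, vocabulary)}
    for i, text in enumerate(reviews, start=1)
]
print(rows)
```

Once each review is a fixed-length vector, it can be stored and queried like any other structured column, alongside a reference to the original text.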
