Data Integration

There is a significant difference between how data is integrated or consumed in the Big Data world and the Traditional database world.

ETL: Extract Transform Load

They are used with traditional RDBMS such as MySQL, SQLServer, and Oracle.

Src: Leanmsbitutorial.com

ETL Tools - Examples

  • SSIS / SSDT: SQL Server Integration Services / SQL Server Data Tools

  • Pentaho Kettle

  • Talend Open Source Data Integrator

  • Benetl for MySQL and PostgreSQL

Img Src: https://towardsdatascience.com/how-i-redesigned-over-100-etl-into-elt-data-pipelines-c58d3a3cb3c

ELT: Extract Load Transform

Newer methodology compared to ETL and is used in the Big data world.

Benefits

  • Ingest anything and everything as soon you get the data.

  • Transform based on your needs.

  • High-Speed loading

  • Low Maintenance to load.

Challenges

  • Security gaps are challenging to monitor as the objective is to load first.

  • Compliance -- Increased pressure on Organizations to follow all regulations such as HIPAA, Do Not Call, Privacy policy, etc.

  • Resource usage -- High storage costs money (On-Prem or Cloud)

Data Flow

Raw data > Cleansed Data > Data Process* > Data Warehousing > ML/BI/Analysis

*Data goes through various pipelines and stages.

Last updated