Current State of the Art

The business world settled on the following principles.

  • A dedicated database should exist solely for BI and strategic reporting.

  • It should be kept separate from the operational (transactional) databases that run the day-to-day business.

  • It should cover every area of the business (sales, inventory, HR, customer service, etc.).

  • Every field name in every table should have one enterprise-wide standard definition.

    • Example: the employee number field must have the same name in every database; variants such as empNo, eNo, EmployeeNum, and empID are not acceptable.

  • A metadata database (data about data) should record the assumptions about each field and describe the transformations and cleansing operations performed on it.

    • Example: a US telephone number must be in the format nnn-nnn-nnnn or (nnn) nnn-nnnn.

  • The data warehouse is read-only to its end users, so everyone works from the same data and there are no mismatches between teams.

  • Access should be fast, even at big-data volumes.
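The naming-standard and metadata principles above can be sketched in a few lines. This is a minimal illustration, not a real metadata product: the canonical field names (`employee_number`, `us_phone`), the `CATALOG` and `SYNONYMS` tables, and the helper functions are all hypothetical. It shows legacy field names being mapped to one standard name and values being checked against the format recorded in the metadata.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical metadata entry: one enterprise-wide definition per field."""
    name: str         # standard field name
    description: str  # assumptions about the field
    pattern: str      # accepted format, as a regular expression


# Hypothetical metadata catalog; the canonical names are illustrative only.
CATALOG = {
    "employee_number": FieldDefinition(
        "employee_number", "Unique employee identifier", r"\d+"),
    "us_phone": FieldDefinition(
        "us_phone", "US telephone number",
        r"\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}"),
}

# Legacy names found in source systems, mapped to the standard name.
SYNONYMS = {
    "empNo": "employee_number",
    "eNo": "employee_number",
    "EmployeeNum": "employee_number",
    "empID": "employee_number",
}


def standard_name(field: str) -> str:
    """Return the enterprise-wide name for a (possibly legacy) field name."""
    return SYNONYMS.get(field, field)


def conforms(field: str, value: str) -> bool:
    """Check a value against the format recorded in the metadata catalog."""
    definition = CATALOG[standard_name(field)]
    return re.fullmatch(definition.pattern, value) is not None
```

For example, `standard_name("eNo")` returns `"employee_number"`, and `conforms("us_phone", "(555) 123-4567")` is `True` while `conforms("us_phone", "5551234567")` is `False`. Keeping the format rules in data rather than scattered through code is what lets the metadata database double as documentation.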

----------------------------------------------------------------------------------------

  • Operational databases (relational and NoSQL) track sales, inventory, support calls, chat, and email.

  • The back-office team (the ETL team) gathers data from these sources, cleans and transforms it, fills in missing values, and stores the result in the staging database.

    • Example: if a phone number is not in the standard format, reformat it.

    • Example: if a chat or phone record has no linked email address, look it up in the customer records and update the record.

  • Staging database: the working database where all transformations are applied. Its contents are then loaded into the data warehouse, which end users see as read-only.

  • Data analysts then build reports from the data warehouse.
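The ETL flow above can be sketched as a small in-memory pipeline. This is a minimal illustration under stated assumptions, not a production ETL tool: plain lists and dicts stand in for the operational databases, and the function names (`format_us_phone`, `transform`, `load`) and record fields (`customer_id`, `phone`, `email`) are hypothetical. The staging step works on copies, fixes phone formats, and backfills missing emails from customer records; the load step publishes read-only rows.

```python
import re
from types import MappingProxyType


def format_us_phone(raw: str) -> str:
    """Normalize a US phone number to nnn-nnn-nnnn.

    Values that do not contain exactly 10 digits (after dropping a leading
    country code 1) are returned unchanged for manual review.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the US country code
    if len(digits) != 10:
        return raw
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def transform(contacts, customers):
    """Staging step: clean and enrich copies of the operational records."""
    staged = []
    for record in contacts:
        row = dict(record)  # copy, so the source system is never modified
        row["phone"] = format_us_phone(row["phone"])
        if not row.get("email"):  # backfill the email from customer records
            customer = customers.get(row["customer_id"], {})
            row["email"] = customer.get("email")
        staged.append(row)
    return staged


def load(staged):
    """Publish staged rows to the warehouse as read-only mappings."""
    return tuple(MappingProxyType(dict(row)) for row in staged)
```

With `contacts = [{"customer_id": 7, "phone": "(555) 123 4567", "email": None}]` and `customers = {7: {"email": "ana@example.com"}}`, `load(transform(contacts, customers))` yields one row whose phone is `"555-123-4567"` and whose email is `"ana@example.com"`. Assigning to a field of that row raises `TypeError`, which mirrors the warehouse being read-only to end users.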

Now, back to the original question.

If all of these things are done right, Amazon's CEO can get the report in less than 30 minutes without interfering with business operations. 👍
