DUCK DB
DuckDB ships as a single executable with no external dependencies.
The full feature list is documented at https://duckdb.org/
Automatic Parallelism: DuckDB has improved its automatic parallelism capabilities, meaning it can more effectively utilize multiple CPU cores without requiring manual tuning. This results in faster query execution for large datasets.
Parquet File Improvements: DuckDB has improved its handling of Parquet files, both in terms of reading speed and support for more complex data types and compression codecs. This makes DuckDB an even better choice for working with large datasets stored in Parquet format.
Caching: DuckDB can speed up repeated queries over the same files by caching data and metadata it has already read, such as Parquet metadata. This helps analytics workloads in which similar queries run many times.
How to use DuckDB?
Download the CLI Client (Windows / Mac / Linux). For other programming languages, visit https://duckdb.org/docs/installation/
Unzip the file.
Open a Command Prompt or terminal and run the executable.
How does DuckDB help in Data Engineering?
Download orders.parquet from
https://github.com/duckdb/duckdb-data/releases/download/v1.0/orders.parquet
More files are available here https://github.com/cwida/duckdb-data/releases/
Terminal
./duckdb
DuckDB lets you read the contents of orders.parquet as is, without first creating a table. The quotes around the file name orders.parquet are essential; double quotes are used here, and single quotes also work.
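A minimal sketch of what this looks like at the DuckDB prompt, assuming orders.parquet sits in the directory where the CLI was started:

```sql
-- Read the Parquet file directly; no CREATE TABLE is needed.
-- The file name must be quoted.
SELECT * FROM "orders.parquet" LIMIT 5;

-- Show the column names and types DuckDB reads from the file.
DESCRIBE SELECT * FROM "orders.parquet";
```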
You can also run queries against the file in place, with filters, sorting, and aggregates. (Databricks supports a similar feature.)
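For instance, a filter and sort can run straight against the file. The column names below (o_orderkey, o_orderdate, o_totalprice) and the price threshold are assumptions based on the TPC-H orders layout; adjust them if your copy of the file differs.

```sql
-- Query the file in place: filter, sort, and limit without loading it first.
-- Column names are assumed (TPC-H style); verify them against your file.
SELECT o_orderkey, o_orderdate, o_totalprice
FROM "orders.parquet"
WHERE o_totalprice > 400000
ORDER BY o_totalprice DESC
LIMIT 10;
```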
DuckDB supports CREATE TABLE ... AS SELECT (CTAS) syntax, so you can create a table directly from the file.
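A short sketch of CTAS against the downloaded file. The table name orders is chosen here for illustration.

```sql
-- Create a table from the Parquet file in one statement.
-- The table name "orders" is an illustrative choice.
CREATE TABLE orders AS
SELECT * FROM "orders.parquet";

-- Confirm the load by counting the rows.
SELECT count(*) AS row_count FROM orders;
```

If the CLI was started without a database file, the table lives in memory and disappears when the session ends; starting it as ./duckdb mydb.duckdb keeps the table on disk.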
DuckDB supports parallel query processing, and queries run fast.
This table has 1.5 million rows, and aggregation happens in less than a second.
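One way to check the timing yourself is sketched below. It assumes a table named orders was created from orders.parquet and that the file uses TPC-H style column names (o_orderstatus, o_totalprice); both are assumptions, not details confirmed by the source.

```sql
-- CLI dot command: print the elapsed time after each query.
.timer on

-- Show how many threads DuckDB will use for parallel execution.
SELECT current_setting('threads') AS threads;

-- Assumes: CREATE TABLE orders AS SELECT * FROM "orders.parquet";
-- Aggregate all rows, grouped by order status (assumed column names).
SELECT o_orderstatus,
       count(*)          AS order_count,
       sum(o_totalprice) AS total_value
FROM orders
GROUP BY o_orderstatus
ORDER BY o_orderstatus;
```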
DuckDB also converts Parquet files to CSV, and CSV files to Parquet, with a single statement each way.
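Both conversions can be sketched with the COPY statement. The output file names orders.csv and orders_from_csv.parquet are illustrative.

```sql
-- Parquet to CSV: write the file's rows out with a header line.
COPY (SELECT * FROM "orders.parquet")
TO 'orders.csv' (FORMAT csv, HEADER);

-- CSV to Parquet: read the CSV back and write it as Parquet.
COPY (SELECT * FROM read_csv('orders.csv'))
TO 'orders_from_csv.parquet' (FORMAT parquet);
```

Column types are inferred again when the CSV is read, so they may not match the original Parquet schema exactly.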
It can also export existing tables to Parquet files.
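As a sketch, exporting a table is also a COPY statement. The table name orders and the output paths are assumptions for illustration.

```sql
-- Assumes a table named "orders" already exists in the database.
-- Write the whole table to a single Parquet file.
COPY orders TO 'orders_export.parquet' (FORMAT parquet);

-- Alternatively, export every table in the database to a directory
-- as Parquet files, along with the SQL needed to reload them.
EXPORT DATABASE 'export_dir' (FORMAT parquet);
```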
DuckDB provides client APIs for programming languages such as Python, R, Java, Node.js, and C/C++.
DuckDB also supports higher-level SQL features such as macros, sequences, and window functions.
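The three features can be tried with a few self-contained statements. The names add_tax and order_seq and the inline sample rows are made up for illustration.

```sql
-- Macro: a reusable SQL expression (hypothetical name add_tax).
CREATE MACRO add_tax(price, rate) AS price * (1 + rate);
SELECT add_tax(100, 0.08) AS price_with_tax;

-- Sequence: generates increasing numbers (hypothetical name order_seq).
CREATE SEQUENCE order_seq START 1;
SELECT nextval('order_seq') AS next_id;

-- Window function: running total per customer over inline sample rows.
SELECT customer,
       amount,
       sum(amount) OVER (PARTITION BY customer ORDER BY amount) AS running_total
FROM (VALUES ('a', 10), ('a', 20), ('b', 5)) AS t(customer, amount);
```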
Download sample Yellow Cab trip data from
https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
Copy the Yellow Cab data files into a folder named yellowcabs.
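Once the files are in place, the whole folder can be queried with a glob pattern. This sketch assumes the downloads are Parquet files and that they contain a trip_distance column; check the names against the files you downloaded.

```sql
-- Count the trips across every Parquet file in the folder.
SELECT count(*) AS trips
FROM read_parquet('yellowcabs/*.parquet');

-- Per-file summary; filename = true adds a column with the source file path.
-- trip_distance is an assumed column name.
SELECT filename,
       count(*)                     AS trips,
       round(avg(trip_distance), 2) AS avg_distance
FROM read_parquet('yellowcabs/*.parquet', filename = true)
GROUP BY filename
ORDER BY filename;
```

If the monthly files do not share an identical schema, adding union_by_name = true to read_parquet may help combine them.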
Extensions
https://duckdb.org/docs/extensions/overview
Load data directly from an HTTP location
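Remote reads rely on the httpfs extension. The sketch below installs and loads it explicitly, then queries the orders.parquet release URL without downloading the file first. It needs network access, and it assumes the download link can be read directly over HTTPS.

```sql
-- Install once, then load in each session that needs remote access.
INSTALL httpfs;
LOAD httpfs;

-- Query the remote Parquet file in place.
SELECT count(*) AS row_count
FROM read_parquet('https://github.com/duckdb/duckdb-data/releases/download/v1.0/orders.parquet');

-- List extensions and whether each is installed and loaded.
SELECT extension_name, installed, loaded
FROM duckdb_extensions();
```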