Big Data & Tools with NoSQL
Serialization-Deserialization

Serialization converts a data structure or object state into a format that can be stored, transmitted, and reconstructed later.

Deserialization is the reverse process, where the stored or transmitted data is used to recreate the original data structure or object state.

For example, objects in Python, Scala, or Rust can be serialized to JSON and later deserialized back into objects in the same or a different language.

An analogy: translating from Spanish to English (a common intermediate language), and then from English to German.

JSON is a lightweight format for storing and transporting data, easy for humans to read and write and for machines to parse and generate.
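A minimal round trip using Python's standard library shows both directions:

```python
import json

# Serialize: Python object -> JSON text
person = {"name": "Monica", "age": 32, "languages": ["English", "French"]}
encoded = json.dumps(person)

# Deserialize: JSON text -> Python object
decoded = json.loads(encoded)
print(decoded == person)  # True: the structure survives the round trip
```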

Avro is a binary serialization format designed for serializing complex data structures efficiently and compactly and is often used within big data applications.

  1. Compact and Efficient: Avro uses binary serialization, making it more compact and efficient than text-based formats like JSON. This results in faster data processing and reduced storage needs.

  2. Schema Evolution: Avro supports schema evolution: fields can be added, removed, or changed while maintaining backward and forward compatibility. This makes it easier to evolve your data model over time without breaking existing systems.

  3. Rich Data Structures: It supports various primitive and complex data types, including nested and recursive types. This makes it suitable for representing complex data.

  4. Fast Serialization and Deserialization: Avro's binary format allows for faster data serialization and deserialization, which is crucial for high-performance computing tasks.

  5. Integration with Big Data Tools: Avro is well-integrated with several big data technologies like Apache Hadoop, Apache Kafka, and Apache Spark, making it a popular choice for data serialization in big data ecosystems.

  6. Language Independent: Avro can be used in various programming languages, making it a versatile choice for systems that involve multiple languages.

  7. Self-Describing Format: Avro data is always accompanied by its schema, allowing any program that receives it to read it without knowing the schema in advance. This self-describing nature facilitates easier data processing and exchange between systems.
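To make point 1 concrete, here is a stdlib-only sketch (not Avro itself) comparing a JSON encoding of a record with a fixed-layout binary packing that assumes the schema is agreed upon in advance:

```python
import json
import struct

record = {"firstName": "Rachel", "lastName": "Green", "age": 30}

# Text encoding: field names are repeated inside every record
json_bytes = json.dumps(record).encode("utf-8")

# Binary encoding: the reader knows the schema, so only values are written
# (length-prefixed strings followed by a 4-byte int)
def pack(rec):
    out = b""
    for text in (rec["firstName"], rec["lastName"]):
        data = text.encode("utf-8")
        out += struct.pack("B", len(data)) + data
    return out + struct.pack("i", rec["age"])

binary = pack(record)
print(len(json_bytes), len(binary))  # the binary form is roughly 3x smaller
```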

Schemas

An Avro schema defines the structure of the Avro data format. It's a JSON document that describes your data types and protocols, ensuring that even complex data structures are adequately represented. The schema is crucial for data serialization and deserialization, allowing systems to interpret the data correctly.

Example of Avro Schema

{
  "type": "record",
  "name": "Person",
  "namespace": "com.example",
  "fields": [
    {"name": "firstName", "type": "string"},
    {"name": "lastName", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
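As an illustration of how a schema constrains data, here is a hypothetical, pure-Python type check for the Person schema above. It is only a sketch of the idea; real Avro libraries (e.g. fastavro or the official avro package) also handle the binary encoding:

```python
# The Person schema from above, as a Python dict ("null" default becomes None)
schema = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "firstName", "type": "string"},
        {"name": "lastName", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

# Subset of Avro primitives mapped to Python types (illustrative only)
PRIMITIVES = {"string": str, "int": int, "null": type(None)}

def validate(record, schema):
    """Return True if every field matches one of its declared types."""
    for field in schema["fields"]:
        declared = field["type"]
        types = declared if isinstance(declared, list) else [declared]
        value = record.get(field["name"], field.get("default"))
        if not any(isinstance(value, PRIMITIVES[t]) for t in types):
            return False
    return True

print(validate({"firstName": "Ross", "lastName": "Geller", "age": 31}, schema))
```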

Avro supports the following primitive types:

  • null: no value

  • boolean: a binary value

  • int: 32-bit signed integer

  • long: 64-bit signed integer

  • float: single precision (32-bit) IEEE 754 floating-point number

  • double: double precision (64-bit) IEEE 754 floating-point number

  • bytes: a sequence of 8-bit unsigned bytes

  • string: Unicode character sequence
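For intuition, the fixed-width sizes of the numeric primitives above can be checked with Python's struct module (note that Avro itself writes int and long with variable-length zig-zag encoding, so on-disk sizes are often smaller):

```python
import struct

# Fixed-width sizes matching the primitive type definitions above
sizes = {
    "int": struct.calcsize("i"),     # 32-bit signed integer -> 4 bytes
    "long": struct.calcsize("q"),    # 64-bit signed integer -> 8 bytes
    "float": struct.calcsize("f"),   # 32-bit IEEE 754       -> 4 bytes
    "double": struct.calcsize("d"),  # 64-bit IEEE 754       -> 8 bytes
}
print(sizes)
```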

Avro supports six kinds of complex data types:

  • Records

  • Enums

  • Arrays

  • Maps

  • Unions

  • Fixed
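A single schema can combine several of these complex types. The record below is a hypothetical example (field names are illustrative) using an enum, an array, a map, a union, and a fixed type:

```json
{
  "type": "record",
  "name": "Order",
  "namespace": "com.example",
  "fields": [
    {"name": "status",
     "type": {"type": "enum", "name": "Status",
              "symbols": ["NEW", "SHIPPED", "DELIVERED"]}},
    {"name": "items", "type": {"type": "array", "items": "string"}},
    {"name": "attributes", "type": {"type": "map", "values": "string"}},
    {"name": "discount", "type": ["null", "double"], "default": null},
    {"name": "checksum", "type": {"type": "fixed", "name": "MD5", "size": 16}}
  ]
}
```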

To try these examples, clone the sample repository:

git clone https://github.com/gchandra10/serialization_deserialization.git

