Optimistic concurrency

Optimistic concurrency is a strategy used in databases and distributed systems to handle concurrent access to shared resources, like a dataset, without requiring locks. Instead of locking resources, optimistic concurrency relies on detecting conflicting changes made by multiple processes or users and resolving them when necessary.
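As a rough illustration of the idea, the sketch below tracks a version number alongside a value and validates it at commit time. The names `VersionedValue` and `ConflictError` are hypothetical and exist only for this example. It is a minimal in-memory model, not how any particular database implements the check.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a commit is based on a stale version (hypothetical name)."""


@dataclass
class VersionedValue:
    value: int
    version: int = 0

    def read(self):
        # Return a snapshot: the value plus the version it was read at.
        return self.value, self.version

    def commit(self, new_value, read_version):
        # Validate at commit time: accept the write only if nobody has
        # committed since this writer took its snapshot.
        if read_version != self.version:
            raise ConflictError(
                f"read at v{read_version}, but current version is v{self.version}"
            )
        self.value = new_value
        self.version += 1


cell = VersionedValue(100)

# Two writers read the same snapshot without taking any lock.
first_value, first_version = cell.read()
second_value, second_version = cell.read()

cell.commit(first_value + 1, first_version)  # first committer succeeds

try:
    cell.commit(second_value + 2, second_version)  # stale snapshot
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

Neither writer blocks the other while reading or preparing its change. The cost of a collision is paid only at commit time, by the writer whose snapshot has become stale.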

In Spark and Databricks, optimistic concurrency applies to Delta Lake tables. Delta Lake is a storage layer that works with Apache Spark and provides ACID transactions and other data management capabilities.

Here's a simple example to illustrate optimistic concurrency on Databricks using Delta Lake:

Let's say you have a Delta Lake table called "inventory" with the following schema and data:

+---------+---------+-------+
| item_id | item_nm | stock |
+---------+---------+-------+
| 1       | Apple   | 10    |
| 2       | Orange  | 20    |
| 3       | Banana  | 30    |
+---------+---------+-------+

Imagine two users, User A and User B, trying to update the apple stock simultaneously.

User A's update:

UPDATE inventory SET stock = stock + 5 WHERE item_id = 1;

User B's update:

UPDATE inventory SET stock = stock - 3 WHERE item_id = 1;

Using optimistic concurrency, both User A and User B can start their updates without waiting for the other to complete. Each transaction reads a snapshot of the table, prepares its changes, and checks for conflicts when it tries to commit.

In this case both updates modify the same row, so Delta Lake would typically treat them as conflicting: whichever transaction commits first succeeds, and the other fails with a concurrent modification exception and has to be retried. Once the retry runs against the latest version of the table, the final stock of Apples is 12 (10 + 5 - 3). Updates that touch unrelated data, for example different partitions, can usually both commit without a retry.
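To make the detect-and-retry cycle concrete, the sketch below simulates the two inventory updates against a toy in-memory table. `Table`, `ConflictError`, and `update_stock` are hypothetical names invented for this example, not Delta Lake APIs. The sketch also assumes a deliberately coarse rule: any commit made since the snapshot was read counts as a conflict, which is stricter than Delta Lake's actual conflict detection.

```python
class ConflictError(Exception):
    """Raised when the table changed after the snapshot was read."""


class Table:
    """Toy in-memory table mapping item_id to stock, with a version counter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.version = 0

    def snapshot(self):
        # Hand back a private copy of the rows plus the version they belong to.
        return dict(self.rows), self.version

    def commit(self, new_rows, read_version):
        # Assumption: any intervening commit is treated as a conflict.
        if read_version != self.version:
            raise ConflictError("table changed since snapshot was read")
        self.rows = new_rows
        self.version += 1


def update_stock(table, item_id, delta, max_retries=3):
    """Apply a stock change, re-reading and retrying when a conflict occurs."""
    for attempt in range(1, max_retries + 1):
        rows, version = table.snapshot()
        rows[item_id] += delta
        try:
            table.commit(rows, version)
            return attempt
        except ConflictError:
            continue  # retry against the latest snapshot
    raise ConflictError(f"gave up after {max_retries} attempts")


inventory = Table({1: 10, 2: 20, 3: 30})

# Both users read the same snapshot (version 0) before either commits.
rows_a, version_a = inventory.snapshot()
rows_b, version_b = inventory.snapshot()
rows_a[1] += 5  # User A: stock + 5
rows_b[1] -= 3  # User B: stock - 3

inventory.commit(rows_a, version_a)  # User A commits first and succeeds

try:
    inventory.commit(rows_b, version_b)  # User B's snapshot is now stale
except ConflictError:
    # User B retries from the latest table state instead of overwriting A.
    update_stock(inventory, 1, -3)

print(inventory.rows[1])  # 12
```

Without the version check, User B's stale write would have overwritten User A's change and left the stock at 7. The retry is what preserves both updates.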

Optimistic concurrency is beneficial in scenarios where conflicts are rare and lock-based approaches might lead to performance degradation. Allowing concurrent updates without locking can improve throughput and responsiveness in many multi-user and distributed applications.
