In-place updates and serializable transactions

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql(at)j-davis(dot)com, joe(dot)conway(at)crunchydata(dot)com
Subject: In-place updates and serializable transactions
Date: 2018-11-14 04:45:43
Message-ID: CAGz5QCJzreUqJqHeXrbEs6xb0zCNKBHhOj6D9Tjd3btJTzydxg@mail.gmail.com
Lists: pgsql-hackers

Hello hackers,

Currently, we're working on the serializable isolation implementation
for zheap. As mentioned in the README-SSI documentation[1], there is one
aspect of PostgreSQL's SSI implementation that makes its conflict
detection behave differently from that of storage engines that
support in-place updates.

The heap storage in PostgreSQL does not use "update in place" with a
rollback log for its MVCC implementation. Where possible it uses
"HOT" updates on the same page (if there is room and no indexed value
is changed). For non-HOT updates the old tuple is expired in place and
a new tuple is inserted at a new location. Because of this
difference, a tuple lock in PostgreSQL doesn't automatically lock any
other versions of a row. We can take the following example from the
doc to understand the situation in more detail:

T1 ---rw---> T2 ---ww---> T3

If transaction T1 reads a row version (thus acquiring a predicate lock
on it) and a second transaction T2 updates that row version (thus
creating a rw-conflict graph edge from T1 to T2), must a third
transaction T3 which re-updates the new version of the row also have a
rw-conflict in from T1 to prevent anomalies? In other words, does it
matter whether we recognize the edge T1 --rw--> T3? The document also
includes a nice proof of why heap doesn't need to copy or extend a tuple
lock to other versions of the row, i.e., why it doesn't have to
explicitly recognize the edge T1 --rw--> T3.
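
To make the heap behaviour concrete, here is a minimal toy model in
Python. It is only a sketch, not PostgreSQL code: the class HeapTable
and all of its fields are hypothetical names, and it ignores
snapshots, commits and everything else except "predicate locks are
attached to a TID, and an update moves the row to a new TID".

```python
# Toy model only: not PostgreSQL code. All names here are hypothetical.
from collections import defaultdict


class HeapTable:
    """Each update expires the old version and stores the new one at a new TID."""

    def __init__(self):
        self.next_tid = 0
        self.current_tid = {}                    # row key -> TID of latest version
        self.predicate_locks = defaultdict(set)  # TID -> readers holding a lock
        self.rw_edges = set()                    # (reader, writer) pairs

    def insert(self, key):
        self.current_tid[key] = self.next_tid
        self.next_tid += 1

    def read(self, txn, key):
        # The reader locks only the version it saw, identified by TID.
        self.predicate_locks[self.current_tid[key]].add(txn)

    def update(self, txn, key):
        old_tid = self.current_tid[key]
        # Every other reader holding a lock on the old TID gets an
        # rw edge pointing to the writer.
        for reader in self.predicate_locks[old_tid]:
            if reader != txn:
                self.rw_edges.add((reader, txn))
        # The new version lives at a fresh TID, which carries no locks yet.
        self.current_tid[key] = self.next_tid
        self.next_tid += 1


heap = HeapTable()
heap.insert("row")
heap.read("T1", "row")     # T1 locks the original version
heap.update("T2", "row")   # records T1 --rw--> T2
heap.update("T3", "row")   # touches the new TID, so no edge from T1
print(sorted(heap.rw_edges))  # [('T1', 'T2')]
```

In this model T3 never sees T1's lock, because the lock stays on the
TID of the version that T2 replaced.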

In PostgreSQL, tuple-level predicate locks are keyed by the tuple
id. In zheap, since we perform updates in-place, the tuple id doesn't
change. So, in the above example, we naturally recognize the edge
T1 --rw--> T3. This may increase the number of false positives in
certain cases. In the above example, if we introduce another
transaction T4 such that T3 --rw--> T4 and T4 commits first, then
for zheap, T3 will be rolled back because of the dangerous structure
T1 --rw--> T3 --rw--> T4. For heap, however, T3 can commit (isolation
test case [2]). IMHO, this seems to be acceptable behavior.
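
The difference can be sketched with another small Python toy model.
Again, this is an illustration under simplifying assumptions, not
zheap or PostgreSQL code: the functions rw_edges and pivots are
hypothetical, commit order is not modelled (the real check also
depends on T4 committing first), and the event list is a simplified
rendering of the example above rather than a copy of the isolation
spec in [2].

```python
# Toy model only: not zheap or PostgreSQL code. All names are hypothetical.

def rw_edges(events, in_place):
    """Replay (txn, op, key) events and return the recorded rw edges.

    in_place=False: every update moves the row to a fresh TID (heap-like).
    in_place=True:  the row keeps its TID across updates (zheap-like).
    """
    next_tid = 0
    tid_of = {}   # row key -> TID of the current version
    locks = {}    # TID -> set of transactions that read it
    edges = set()
    for txn, op, key in events:
        if key not in tid_of:
            tid_of[key] = next_tid
            next_tid += 1
        tid = tid_of[key]
        if op == "read":
            locks.setdefault(tid, set()).add(txn)
        elif op == "update":
            # Readers holding a lock on this TID conflict with the writer.
            edges |= {(r, txn) for r in locks.get(tid, ()) if r != txn}
            if not in_place:
                tid_of[key] = next_tid
                next_tid += 1
    return edges


def pivots(edges):
    """Transactions with both an incoming and an outgoing rw edge."""
    return {w for (_, w) in edges} & {r for (r, _) in edges}


events = [
    ("T1", "read", "a"),
    ("T2", "update", "a"),
    ("T3", "update", "a"),
    ("T3", "read", "b"),
    ("T4", "update", "b"),
]

print(pivots(rw_edges(events, in_place=False)))  # set()
print(pivots(rw_edges(events, in_place=True)))   # {'T3'}
```

With fresh TIDs per update, T3 has only an outgoing edge and is not a
pivot. With a stable TID, T1's lock is still found when T3 updates the
row, so T3 gains an incoming edge and becomes the pivot of
T1 --rw--> T3 --rw--> T4.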

In brief, due to in-place updates, the number of false positives may
increase for serializable transactions in some cases. Any thoughts?

[1] src/backend/storage/lmgr/README-SSI
[2] src/test/isolation/specs/multiple-row-versions.spec
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
