Undo worker and transaction rollback

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Undo worker and transaction rollback
Date: 2018-10-11 06:00:24
Message-ID: CAFiTN-sYQ8r8ANjWFYkXVfNxgXyLRfvbX9Ee4SxO9ns-OBBgVA@mail.gmail.com
Lists: pgsql-hackers

Hello, hackers,

In previous threads[1], we proposed patches for generating and storing
undo records. This patch adds the undo worker and the transaction
rollback machinery. The idea is that undo remains relevant as long as
the transaction is in progress and needs to be removed once it becomes
irrelevant. Specifically, for a committed transaction, undo remains
relevant until the transaction becomes all-visible; zheap will use
this for MVCC purposes. For an aborted transaction, however, it
remains relevant until the "undo actions" described by the undo
records have been performed. This patch introduces code to discard
undo when it is no longer needed and to reuse the associated storage.
Additionally, this patch adds code to execute undo actions. Let me
explain the finer details of each of the cases covered.

Rollback mechanism for undo-based storage

When a transaction is aborted or rolled back, we need to apply the
corresponding undo actions to complete the rollback. The undo actions
are applied either by the backend or by a dedicated undo worker. This
decision is based on the size of the transaction and a GUC,
rollback_overflow_size. Aborted transactions whose size exceeds
rollback_overflow_size are pushed, along with their database id, into
a hash table, which is scanned by an undo launcher. The undo launcher
then spawns one undo worker per database. It is the job of the undo
worker(s) to connect to that database and perform the undo actions.
Note that an undo worker keeps applying undo actions as long as there
are undo requests for its database and exits otherwise.
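
To make the routing rule above concrete, here is a minimal sketch in
C. It is not code from the patch: the type and function names are
hypothetical, the byte unit and default value of
rollback_overflow_size are assumptions, and the worker loop is reduced
to a counter of pending requests for one database.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the rollback_overflow_size GUC. The unit (bytes) and the
 * default value are assumptions made for this sketch only. */
uint64_t rollback_overflow_size = UINT64_C(64) * 1024 * 1024;

/* Hypothetical names: who applies the undo actions of an aborted xact. */
typedef enum
{
    UNDO_APPLY_IN_BACKEND,      /* small rollback: done by the backend */
    UNDO_APPLY_BY_WORKER        /* large rollback: queued for an undo worker */
} UndoApplyMode;

/* Only transactions strictly exceeding the threshold go to a worker. */
UndoApplyMode
choose_undo_apply_mode(uint64_t undo_size)
{
    return (undo_size > rollback_overflow_size)
        ? UNDO_APPLY_BY_WORKER
        : UNDO_APPLY_IN_BACKEND;
}

/* Toy model of one undo worker: keep applying while requests remain for
 * its database, then exit. Returns the number of requests applied. */
size_t
undo_worker_drain(size_t *pending_for_db)
{
    size_t applied = 0;

    assert(pending_for_db != NULL);
    while (*pending_for_db > 0)
    {
        (*pending_for_db)--;    /* "apply" one rollback request */
        applied++;
    }
    return applied;
}
```

In this sketch a transaction whose undo size equals the threshold is
still rolled back by the backend; whether the patch treats the
boundary the same way is not stated above.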

Discarding irrelevant undo

As mentioned above, undo is considered irrelevant once a committed
transaction becomes all-visible, or once the undo actions of an
aborted transaction have been applied. To reuse the space occupied by
this stale undo, we remove it; this process is called discarding undo.
It is performed by a dedicated background worker, the discard worker.
Additionally, we maintain a variable named OldestXidHavingUndo, which
is the transaction id of the oldest transaction whose undo has not yet
been discarded. zheap will need this for visibility checks and for
freezing the slots of older transactions, but I will not get into
those details for now, as that part will be covered with the zheap
storage engine.
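
The discard rule and OldestXidHavingUndo can be illustrated with a
small C sketch. This is a simplified model, not the patch's code: all
names other than OldestXidHavingUndo are hypothetical, transaction ids
are compared as plain integers (ignoring wraparound), "all-visible" is
modeled as "xid precedes a given horizon", and the entries are assumed
to be ordered oldest first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED } XactState;

/* Hypothetical per-transaction summary of an undo log. */
typedef struct
{
    Xid         xid;
    XactState   state;
    bool        undo_applied;   /* meaningful only for aborted xacts */
} UndoXactEntry;

/* Committed: discardable once all-visible (xid older than the horizon).
 * Aborted: discardable once its undo actions have been applied. */
bool
undo_is_discardable(const UndoXactEntry *e, Xid all_visible_horizon)
{
    switch (e->state)
    {
        case XACT_COMMITTED:
            return e->xid < all_visible_horizon;
        case XACT_ABORTED:
            return e->undo_applied;
        default:
            return false;       /* still in progress: undo is needed */
    }
}

/* Walk from the oldest entry, stopping at the first one that must be
 * kept. Returns how many entries can be discarded and reports the xid
 * of the oldest transaction that still has undo. */
size_t
discard_undo(const UndoXactEntry *entries, size_t n,
             Xid all_visible_horizon, Xid *oldest_xid_having_undo)
{
    size_t i = 0;

    assert(oldest_xid_having_undo != NULL);
    while (i < n && undo_is_discardable(&entries[i], all_visible_horizon))
        i++;
    *oldest_xid_having_undo = (i < n) ? entries[i].xid : INVALID_XID;
    return i;
}
```

Stopping at the first non-discardable entry keeps the discarded region
contiguous, which is what lets a single value such as
OldestXidHavingUndo describe the boundary in this model.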

We would also like to highlight some details of the modified
transaction machinery:

- Normally, when a rollback or rollback-to-savepoint request is issued,
undo actions are applied under the current transaction. But if there
is an error during the transaction and the user then tries to roll
back, the undo actions cannot be applied in the same transaction,
because that erroneous transaction is already aborted. In such a case,
a new transaction is started and the undo actions are applied under
that fresh transaction.

- To apply undo actions efficiently, we maintain start and end undo
record pointers for each transaction. When adding a rollback request
to the rollback hash table, we only need to push these undo record
pointers. The undo worker can then get directly to the undo records
using these pointers and apply the respective undo actions.
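
As an illustration of the second point, a rollback request could look
roughly like the following C sketch. It is an assumption-laden toy,
not the patch's data structure: the type and function names are
hypothetical, undo record pointers are modeled as 64-bit integers, and
a small fixed array with linear probing stands in for the shared
rollback hash table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t UndoRecPtr;    /* assumed representation */
typedef uint32_t DbId;

#define ROLLBACK_HT_SIZE 16     /* arbitrary size for the sketch */

/* A request carries only the database id and the two pointers that
 * delimit the transaction's undo records. */
typedef struct
{
    bool        in_use;
    DbId        dbid;
    UndoRecPtr  start_urec_ptr;
    UndoRecPtr  end_urec_ptr;
} RollbackRequest;

RollbackRequest rollback_ht[ROLLBACK_HT_SIZE];

/* Backend side: register a rollback too large to apply in place.
 * Returns false if the table is full; the caller must then cope. */
bool
rollback_request_add(DbId dbid, UndoRecPtr start, UndoRecPtr end)
{
    size_t slot = (size_t) (start % ROLLBACK_HT_SIZE);

    for (size_t probe = 0; probe < ROLLBACK_HT_SIZE; probe++)
    {
        RollbackRequest *r = &rollback_ht[(slot + probe) % ROLLBACK_HT_SIZE];

        if (!r->in_use)
        {
            *r = (RollbackRequest) {true, dbid, start, end};
            return true;
        }
    }
    return false;
}

/* Worker side: take one pending request for the worker's database.
 * Returns false when nothing is left, i.e. the worker may exit. */
bool
rollback_request_take(DbId dbid, RollbackRequest *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < ROLLBACK_HT_SIZE; i++)
    {
        if (rollback_ht[i].in_use && rollback_ht[i].dbid == dbid)
        {
            *out = rollback_ht[i];
            rollback_ht[i].in_use = false;
            return true;
        }
    }
    return false;
}
```

Because the request holds just two pointers, the worker needs no scan
of the undo log to find where the transaction's records begin and end.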

This is still a WIP patch, shared to get some early feedback, so
please go ahead and try it out; we are eagerly waiting for your
response. Just an FYI, this work has been extracted from the zheap
branch [2].

This work would not be in its current shape without valuable input
from Amit Kapila, be it the design of the feature, the code related to
undo actions, or his support throughout the project. Additionally, I'd
like to thank Rafia Sabih for working on the rollback and discard
mechanism for prepared transactions and on the rollback hash table.
Certainly, we could not have come this far without the basic framework
of the undo worker, which was developed by Mithun. Last but in no way
least, I would like to thank Robert Haas, Thomas Munro and Andres
Freund for their design input.

[1] https://www.postgresql.org/message-id/CAFiTN-syRxU3jTpkOxHQsEqKC95LGd86JTdZ2stozyXWDUSffg%40mail.gmail.com

[2] https://github.com/EnterpriseDB/zheap

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Undoworker-and-transaction-rollback.patch application/octet-stream 121.9 KB
