Re: Undo logs

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Undo logs
Date: 2018-08-31 09:38:37
Message-ID: CAFiTN-uVxxopn0UZ64=F-sydbETBbGjWapnBikNo1=Xv78UeFw@mail.gmail.com
Lists: pgsql-hackers

Hello hackers,

As Thomas has already mentioned upthread, we are working on
undo-log-based storage, and he has posted the patch sets for the lowest
layer, called undo-log-storage.

This is the next layer, which sits on top of the undo log storage and
provides an interface to prepare, insert, or fetch undo records. This
layer uses undo-log-storage to reserve space for the undo records and
the buffer management routines to write and read them.

To prepare an undo record, the layer first allocates the required space
using the undo-log-storage module. Next, it pins and locks the required
buffers and returns the undo record pointer at which the record will be
inserted. Finally, the insert routine is called to perform the final
insertion of the prepared record. Additionally, there is a multi-insert
mechanism, wherein multiple records are prepared and then inserted
together.
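
To make the prepare/insert split concrete, here is a minimal standalone
sketch of the flow described above. It is only a toy model: every name
in it (ToyUndoLog, toy_prepare_undo, toy_insert_prepared, and so on) is
hypothetical and is not the API of the attached patch, and a byte array
stands in for the undo log and its buffers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All names are hypothetical; this is a toy model, not the patch's API. */
typedef uint64_t ToyUndoRecPtr;
#define TOY_INVALID_UNDO_PTR UINT64_MAX
#define TOY_LOG_SIZE 256
#define TOY_MAX_PREPARED 4

typedef struct ToyUndoLog
{
    unsigned char data[TOY_LOG_SIZE];
    ToyUndoRecPtr insert;       /* next unreserved byte */
} ToyUndoLog;

typedef struct ToyPrepared
{
    ToyUndoRecPtr urp;          /* where the record will be written */
    const void *payload;
    uint16_t    len;
} ToyPrepared;

typedef struct ToyPrepareState
{
    ToyPrepared items[TOY_MAX_PREPARED];
    int         nitems;
} ToyPrepareState;

/*
 * Prepare: reserve space and remember where the record will go.
 * A real implementation would also pin and lock the buffers here.
 */
ToyUndoRecPtr
toy_prepare_undo(ToyUndoLog *log, ToyPrepareState *st,
                 const void *payload, uint16_t len)
{
    size_t      need = sizeof(uint16_t) + len;  /* length header + payload */
    ToyPrepared *p;

    assert(log != NULL && st != NULL && payload != NULL);
    if (st->nitems >= TOY_MAX_PREPARED || log->insert + need > TOY_LOG_SIZE)
        return TOY_INVALID_UNDO_PTR;

    p = &st->items[st->nitems++];
    p->urp = log->insert;
    p->payload = payload;
    p->len = len;
    log->insert += need;
    return p->urp;
}

/* Insert: copy every prepared record into its reserved slot. */
int
toy_insert_prepared(ToyUndoLog *log, ToyPrepareState *st)
{
    int         n = st->nitems;

    assert(log != NULL);
    for (int i = 0; i < n; i++)
    {
        const ToyPrepared *p = &st->items[i];

        memcpy(log->data + p->urp, &p->len, sizeof(p->len));
        memcpy(log->data + p->urp + sizeof(p->len), p->payload, p->len);
    }
    st->nitems = 0;
    return n;
}
```

Calling toy_prepare_undo() several times before a single
toy_insert_prepared() call models the multi-insert case: everything that
can fail (running out of space) happens during prepare, so the insert
step is a plain copy.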

To fetch an undo record, a caller must provide a valid undo record
pointer. Optionally, the caller can also provide a callback function
along with the block and offset information, which helps retrieve the
undo record faster; otherwise, the undo chain has to be traversed.
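
One possible reading of the fetch interface, as a toy sketch: the fetch
routine starts at the given pointer and, if a callback is supplied,
follows the chain until the callback accepts a record for the requested
block and offset; without a callback it simply returns the record at the
pointer and leaves any chain traversal to the caller. All names here
(ToyUndoRecord, toy_fetch_undo, toy_same_item) are hypothetical, the
record layout is invented for illustration, and the attached patch may
behave differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; not the patch's API. */
typedef int32_t ToyUndoRecPtr;      /* index into a toy record array */
#define TOY_INVALID_UNDO_PTR (-1)

typedef struct ToyUndoRecord
{
    uint32_t      block;            /* block the record applies to */
    uint16_t      offset;           /* item offset within that block */
    ToyUndoRecPtr prev;             /* previous record in the chain */
    int           payload;
} ToyUndoRecord;

/* Callback deciding whether a record is the one the caller wants. */
typedef bool (*ToyUndoMatch) (const ToyUndoRecord *rec,
                              uint32_t block, uint16_t offset);

/*
 * Start at urp. With a callback, follow prev links until it accepts a
 * record; with match == NULL, return the record at urp unchanged.
 * Returns NULL if the chain ends without a match.
 */
const ToyUndoRecord *
toy_fetch_undo(const ToyUndoRecord *log, size_t nrecords, ToyUndoRecPtr urp,
               uint32_t block, uint16_t offset, ToyUndoMatch match)
{
    while (urp != TOY_INVALID_UNDO_PTR)
    {
        const ToyUndoRecord *rec;

        assert(urp >= 0 && (size_t) urp < nrecords);
        rec = &log[urp];
        if (match == NULL || match(rec, block, offset))
            return rec;
        urp = rec->prev;
    }
    return NULL;
}

/* Example callback: accept the record for exactly this block and offset. */
bool
toy_same_item(const ToyUndoRecord *rec, uint32_t block, uint16_t offset)
{
    return rec->block == block && rec->offset == offset;
}
```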

These patch sets will apply on top of the undo-log-storage branch [1],
commit id fa3803a048955c4961581e8757fe7263a98fe6e6.

[1] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/

undo_interface_v1.patch is the main patch for providing the undo interface.
undo_interface_test_v1.patch is a simple test module to test the undo
interface layer.

On Thu, May 31, 2018 at 4:27 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Hi Simon,
>
> On Mon, May 28, 2018 at 11:40 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 24 May 2018 at 23:22, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> The lowest level piece of this work is a physical undo log manager,
>>
>>> 1. Efficient appending of new undo data from many concurrent
>>> backends. Like logs.
>>> 2. Efficient discarding of old undo data that isn't needed anymore.
>>> Like queues.
>>> 3. Efficient buffered random reading of undo data. Like relations.
>>
>> Like an SLRU?
>
> Yes, but with some difference:
>
> 1. There is a variable number of undo logs. Each one corresponds to
> a range of the 64 bit address space, and has its own head and tail
> pointers, so that concurrent writers don't contend for buffers when
> appending data. (Unlike SLRUs which are statically defined, one for
> clog.c, one for commit_ts.c, ...).
> 2. Undo logs use regular buffers instead of having their own mini
> buffer pool, ad hoc search and reclamation algorithm etc.
> 3. Undo logs support temporary, unlogged and permanent storage (=
> local buffers and reset-on-crash-restart, for undo data relating to
> relations of those persistence levels).
> 4. Undo logs storage files are preallocated (rather than being
> extended block by block), and the oldest file is renamed to become the
> newest file in common cases, like WAL.
>
>>> [4] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/access/undo
>>> [5] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/storage/smgr
>>
>> I think there are quite a few design decisions there that need to be
>> discussed, so lets crack on and discuss them please.
>
> What do you think about using the main buffer pool?
>
> Best case: pgbench type workload, discard pointer following closely
> behind insert pointer, we never write anything out to disk (except for
> checkpoints when we write a few pages), never advance the buffer pool
> clock hand, and we use and constantly recycle 1-2 pages per connection
> via the free list (as can be seen by monitoring insert - discard in
> the pg_stat_undo_logs view).
>
> Worst case: someone opens a snapshot and goes out to lunch so we can't
> discard old undo data, and then we start to compete with other stuff
> for buffers, and we hope the buffer reclamation algorithm is good at
> its job (or can be improved).
>
> I just talked about this proposal at a pgcon unconference session.
> Here's some of the feedback I got:
>
> 1. Jeff Davis pointed out that I'm probably wrong about not needing
> FPI, and there must at least be checksum problems with torn pages. He
> also gave me an idea on how to fix that very cheaply, and I'm still
> processing that feedback.
> 2. Andres Freund thought it seemed OK if we have smgr.c routing to
> md.c for relations and undofile.c for undo, but if we're going to
> generalise this technique to put other things into shared buffers
> eventually too (like the SLRUs, as proposed by Shawn Debnath in
> another unconf session) then it might be worth investigating how to
> get md.c to handle all of their needs. They'd all just use fd.c
> files, after all, so it'd be weird if we had to maintain several
> different similar things.
> 3. Andres also suggested that high frequency free page list access
> might be quite contended in the "best case" described above. I'll look
> into that.
> 4. Someone said that segment sizes probably shouldn't be hard coded
> (cf WAL experience).
>
> I also learned in other sessions that there are other access managers
> in development that need undo logs. I'm hoping to find out more about
> that.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
undo_interface_v1.patch application/octet-stream 64.8 KB
undo_interface_test_v1.patch application/octet-stream 5.4 KB
