|From:||Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>|
|To:||Simon Riggs <simon(at)2ndquadrant(dot)com>|
|Cc:||Pg Hackers <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: Undo logs|
On Mon, May 28, 2018 at 11:40 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 24 May 2018 at 23:22, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> The lowest level piece of this work is a physical undo log manager,
>> 1. Efficient appending of new undo data from many concurrent
>> backends. Like logs.
>> 2. Efficient discarding of old undo data that isn't needed anymore.
>> Like queues.
>> 3. Efficient buffered random reading of undo data. Like relations.
> Like an SLRU?
Yes, but with some differences:
1. There is a variable number of undo logs. Each one corresponds to
a range of the 64 bit address space, and has its own head and tail
pointers, so that concurrent writers don't contend for buffers when
appending data. (Unlike SLRUs which are statically defined, one for
clog.c, one for commit_ts.c, ...).
2. Undo logs use regular buffers instead of having their own mini
buffer pool, ad hoc search and reclamation algorithm etc.
3. Undo logs support temporary, unlogged and permanent storage (that
is, local buffers for temporary undo data and reset-on-crash-restart
for unlogged undo data, matching the persistence level of the
relations the undo data relates to).
4. Undo log storage files are preallocated (rather than being
extended block by block), and the oldest file is renamed to become the
newest file in common cases, like WAL.
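To make point 1 concrete, here is a minimal sketch of how a 64 bit undo address could be split into a log number and an offset within that log, with each log carrying its own pair of pointers. All names here, the 24/40 bit split, and the choice to call the head "insert" and the tail "discard" are assumptions for illustration; the message above only says that each log covers a range of the address space and has its own head and tail pointers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout: high 24 bits select the undo log, low 40 bits are the
 * byte offset inside it. The split is illustrative, not taken from the
 * message.
 */
#define UNDO_LOG_NUMBER_BITS 24
#define UNDO_LOG_OFFSET_BITS 40
#define UNDO_LOG_MAX_OFFSET ((UINT64_C(1) << UNDO_LOG_OFFSET_BITS) - 1)

typedef uint64_t UndoPtr;		/* hypothetical name */

/* Per-log state: every log has its own pointers, so writers using
 * different logs never touch the same insert position. */
typedef struct UndoLogState
{
	uint32_t	log_number;		/* which range of the address space */
	uint64_t	discard;		/* tail: oldest offset still needed */
	uint64_t	insert;			/* head: next offset to append at */
} UndoLogState;

static UndoPtr
make_undo_ptr(uint32_t log_number, uint64_t offset)
{
	assert(log_number < (UINT32_C(1) << UNDO_LOG_NUMBER_BITS));
	assert(offset <= UNDO_LOG_MAX_OFFSET);
	return ((uint64_t) log_number << UNDO_LOG_OFFSET_BITS) | offset;
}

static uint32_t
undo_ptr_log_number(UndoPtr ptr)
{
	return (uint32_t) (ptr >> UNDO_LOG_OFFSET_BITS);
}

static uint64_t
undo_ptr_offset(UndoPtr ptr)
{
	return ptr & UNDO_LOG_MAX_OFFSET;
}

/* Append: reserve 'size' bytes at the head and return where they start. */
static UndoPtr
undo_log_append(UndoLogState *log, uint64_t size)
{
	UndoPtr		start = make_undo_ptr(log->log_number, log->insert);

	log->insert += size;
	return start;
}

/* Discard: advance the tail once older undo data is no longer needed. */
static void
undo_log_discard(UndoLogState *log, uint64_t up_to_offset)
{
	assert(up_to_offset >= log->discard);
	assert(up_to_offset <= log->insert);
	log->discard = up_to_offset;
}
```

A backend that owns log 3 would call undo_log_append() on its own UndoLogState, and any reader can recover the log number and offset from the returned pointer alone. Locking and buffer access are left out of this sketch.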
>>  https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/access/undo
>>  https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/storage/smgr
> I think there are quite a few design decisions there that need to be
> discussed, so lets crack on and discuss them please.
What do you think about using the main buffer pool?
Best case: pgbench type workload, discard pointer following closely
behind insert pointer, we never write anything out to disk (except for
checkpoints when we write a few pages), never advance the buffer pool
clock hand, and we use and constantly recycle 1-2 pages per connection
via the free list (as can be seen by monitoring insert - discard in
the pg_stat_undo_logs view).
Worst case: someone opens a snapshot and goes out to lunch so we can't
discard old undo data, and then we start to compete with other stuff
for buffers, and we hope the buffer reclamation algorithm is good at
its job (or can be improved).
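As a rough illustration of the two cases, the number of buffer pages an undo log occupies can be estimated from its insert and discard positions. This is only a sketch: the function name is made up, it assumes 8kB blocks (the PostgreSQL default), and it ignores page headers.

```c
#include <assert.h>
#include <stdint.h>

#define UNDO_BLCKSZ 8192		/* assumed block size, PostgreSQL's default */

/*
 * Number of blocks touched by the live undo data in [discard, insert).
 * Hypothetical helper; offsets are byte positions within one undo log.
 */
static uint64_t
undo_pages_in_use(uint64_t discard, uint64_t insert)
{
	assert(discard <= insert);

	if (discard == insert)
		return 0;				/* everything has been discarded */

	/* First block holding live data through the last one, inclusive. */
	return (insert - 1) / UNDO_BLCKSZ - discard / UNDO_BLCKSZ + 1;
}
```

When discard follows insert closely, the result stays at one or two pages per log, which matches the best case. When a long-lived snapshot pins discard, the result grows with every append, which is the worst case.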
I just talked about this proposal at a PGCon unconference session.
Here's some of the feedback I got:
1. Jeff Davis pointed out that I'm probably wrong about not needing
FPI, and there must at least be checksum problems with torn pages. He
also gave me an idea on how to fix that very cheaply, and I'm still
processing that feedback.
2. Andres Freund thought it seemed OK if we have smgr.c routing to
md.c for relations and undofile.c for undo, but if we're going to
generalise this technique to put other things into shared buffers
eventually too (like the SLRUs, as proposed by Shawn Debnath in
another unconf session) then it might be worth investigating how to
get md.c to handle all of their needs. They'd all just use fd.c
files, after all, so it'd be weird if we had to maintain several
different similar things.
3. Andres also suggested that high frequency free page list access
might be quite contended in the "best case" described above. I'll look
4. Someone said that segment sizes probably shouldn't be hard coded
(cf. WAL experience).
I also learned in other sessions that there are other access managers
in development that need undo logs. I'm hoping to find out more about