Quick Links

Re: WAL format and API changes (9.5)

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: WAL format and API changes (9.5)
Date:	2014-08-12 23:36:59
Message-ID:	53EAA51B.2020105@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 08/13/2014 01:04 AM, Andres Freund wrote:
> On 2014-08-12 12:44:22 +0300, Heikki Linnakangas wrote:
>> Here's a new patch with those little things fixed.
>
> I've so far ignored this thread... I'm now taking a look. Unfortunately
> I have to admit I'm not unequivocally happy.
>
> * XLogRegisterData/XLogRegisterRecData imo don't really form a
> consistent API namewise. What does 'Rec' mean here?

The latter function is actually called XLogRegisterBufData. The README
was wrong, will fix.

> * I'm distinctly not a fan of passing around both the LSN and the struct
> XLogRecord to functions. We should bit the bullet and use something
> encapsulating both.

You mean, the redo functions? Seems fine to me, and always it's been
like that...

> * XLogReplayBuffer imo isn't a very good name - the buffer isn't
> replayed. If anything it's sometimes replaced by the backup block, but
> the real replay still happens outside of that function.

I agree, but haven't come up with a better name.

> * Why do we need XLogBeginInsert(). Right now this leads to uglyness
> like duplicated if (RelationNeedsWAL()) wal checks and
> XLogCancelInsert() of inserts that haven't been started.

I like clearly marking the beginning of the insert, with
XLogBeginInsert(). We certainly could design it so that it's not needed,
and you could just start registering stuff without calling
XLogBeginInsert() first. But I believe it's more robust this way. For
example, you will get an error at the next XLogBeginInsert(), if a
previous operation had already registered some data without calling
XLogInsert().

> And if we have Begin, why do we also need Cancel?

We could just automatically "cancel" any previous insertion when
XLogBeginInsert() is called, but again it seems like bad form to leave
references to buffers and data just lying there, if an operation bails
out after registering some stuff and doesn't finish the insertion.

> * "XXX: do we need to do this for every page?" - yes, and it's every
> tuple, not every page... And It needs to be done *after* the tuple has
> been put on the page, not before. Otherwise the location will be wrong.

That comment is in heap_multi_insert(). All the inserted tuples have the
same command id, seems redundant to log multiple NEW_CID records for them.

> * The patch mixes the API changes around WAL records with changes of how
> individual actions are logged. That makes it rather hard to review -
> and it's a 500kb patch already.
>
> I realize it's hard to avoid because the new API changes which
> information about blocks is available, but it does make it really hard
> to see whether the old/new code is doing something
> equivalent. It's hard to find more critical code than this :/

Yeah, I hear you. I considered doing this piecemeal, just adding the new
functions first so that you could still use the old XLogRecData API,
until all the functions have been converted. But in the end, I figured
it's not worth it, as sooner or later we'd want to convert all the
functions anyway.

> Have you done any performance evaluation?

No, not yet.

- Heikki

In response to

Re: WAL format and API changes (9.5) at 2014-08-12 22:04:22 from Andres Freund

Responses

Re: WAL format and API changes (9.5) at 2014-08-13 11:07:46 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fujii Masao	2014-08-13 02:04:38	Re: Inconsistent use of --slot/-S in pg_receivexlog and pg_recvlogical
Previous Message	Stephen Frost	2014-08-12 23:26:51	Re: Proposal: Incremental Backup