Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader
Date: 2012-06-18 13:23:07
Message-ID: 201206181523.07985.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Monday, June 18, 2012 01:51:45 PM Heikki Linnakangas wrote:
> On 15.06.2012 00:38, Andres Freund wrote:
> > On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
> >> On 13.06.2012 14:28, Andres Freund wrote:
> >>> Features:
> >>> - streaming reading/writing
> >>> - filtering
> >>> - reassembly of records
> >>>
> >>> Reusing the ReadRecord infrastructure in situations where the code that
> >>> wants to do so is not tightly integrated into xlog.c is rather hard and
> >>> would require changes to rather integral parts of the recovery code
> >>> which doesn't seem to be a good idea.
> >>
> >> It would be nice refactor ReadRecord and its subroutines out of xlog.c.
> >> That file has grown over the years to be really huge, and separating the
> >> code to read WAL sounds like it should be a pretty natural split. I
> >> don't want to duplicate all the WAL reading code, so we really should
> >> find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
> >> wrapper that just calls the new xlogreader code.
> >
> > I aggree that it is not very nice to duplicate it. But I also don't want
> > to go the route of replacing ReadRecord with it for a while, we can
> > replace ReadRecord later if we want. As long as it is in flux like it is
> > right now I don't really see the point in investing energy in it.
> > Also I am not that sure how a callback oriented API fits into the xlog.c
> > workflow?
>
> If the API doesn't fit the xlog.c workflow, I think that's a pretty good
> indication that the API is not good. The xlog.c workflow is basically:
>
> while(!end-of-wal)
> {
> record = ReadRecord();
> redo(recode);
> }
>
> There's some pretty complicated logic within the bowels of ReadRecord
> (see XLogPageRead(), for example), but it seems to me those parts
> actually fit the your callback API pretty well. The biggest change is
> that the xlog-reader thingie should return to the caller after each
> record, instead of calling a callback for each read record and only
> returning at end-of-wal.
I did it that way at start but it seemed to move too much knowledge to the
callsites.
Its also something of an efficiency/complexity thing: Either you alway reread
the current page on entering XLogReader (because it could have changed if it
was the last one at some point) which is noticeable performancewise or you
make the contract more complex by requiring to notify the XLogReader about the
fact that you changed the end-of-valid data which doesn't seem to be a good
idea because I think we will rather quickly will grow more callsites.

> Or we could put the code to call the redo-function into the callback,
> but there's some also some logic there to exit early if you reach the
> desired recovery point, for example, so the callback API would need to
> be extended to allow such early exit. I think a return-after-each-record
> API would be better, though.
Adding an early exit would be easy.

> >> Can this be used for writing WAL, as well as reading? If so, what do you
> >> need the write support for?
> >
> > It currently can replace records which are not interesting (e.g. index
> > changes in the case of logical rep). Filtered records are replaced with
> > XLOG_NOOP records with correct length currently. In future the actual
> > amount of data should really be reduced. I don't know yet know how to
> > map LSNs of uncompressed/compressed stream onto each other...
> > The filtered data is then passed to a writeout callback (in a streaming
> > fashion).
> >
> > The whole writing out part is pretty ugly at the moment and I just bolted
> > it ontop because it was convenient for the moment. I am not yet sure how
> > the api for that should look....
>
> Can we just leave that out for now?
Not easily I think. I should probably get busy writing up a PoC for separating
that.

Thanks,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Farina 2012-06-18 13:54:14 Re: Standbys, txid_current_snapshot, wraparound
Previous Message Tom Lane 2012-06-18 12:30:46 Re: REVIEW: Optimize referential integrity checks (todo item)