Re: logical changeset generation v4 - Heikki's thoughts about the patch state

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Steve Singer <steve(at)ssinger(dot)info>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Phil Sorber <phil(at)omniti(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v4 - Heikki's thoughts about the patch state
Date: 2013-01-28 11:17:26
Message-ID: 20130128111726.GC4268@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2013-01-27 23:07:51 -0500, Steve Singer wrote:
> A few more comments;
>
> In decode.c DecodeDelete
>
> + if (r->xl_len <= (SizeOfHeapDelete + SizeOfHeapHeader))
> + {
> + elog(DEBUG2, "huh, no primary key for a delete on wal_level =
> logical?");
> + return;
> + }
> +

> I think we should be passing delete's with candidate key data logged to the
> plugin. If the table isn't a replicated table then ignoring the delete is
> fine. If the table is a replicated table but someone has deleted the unique
> index from the table then the plugin will receive INSERT changes on the
> table but not DELETE changes. If this happens the plugin would have any way
> of knowing that it is missing delete changes. If my plugin gets passed a
> DELETE change record but with no key data then my plugin could do any of

I basically didn't do that because I thought people would forget to
check whether oldtuple is empty I have no problem with addind support
for that though.

> 1. Start screaming for help (ie log errors)

Yes.

> 2. Drop the table from replication

No, you can't write from an output plugin, and I don't immediately see
support for that comming. There's no fundamental blockers, just makes
things more complicated.

> 3. Pass the delete (with no key values) onto the replication client and let
> it deal with it (see 1 and 2)

Hm.

While I agree that nicer behaviour would be good I think the real
enforcement should happen on a higher level, e.g. with event triggers or
somesuch. It seems way too late to do anything about it when we're
already decoding. The transaction will already have committed...

> Also, 'huh' isn't one of our standard log message phrases :)

You're right there ;). I bascially wanted to remove the log message
almost instantly but it was occasionally useful so I kept it arround...

> How do you plan on dealing with sequences?
> I don't see my plugin being called on sequence changes and I don't see
> XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason why
> this can't be easily added?

I basically was hoping for Simon's sequence-am to get in before doing
anything real here. That didn't really happen yet.
I am not sure whether there's a real usecase in decoding normal
XLOG_SEQ_LOG records, their content isn't all that easy to interpet
unless youre rather familiar with pg's innards.

So, adding support wouldn't hard from a technical pov but it seems the
semantics are a bit hard to nail down.

> Also what do we want to do about TRUNCATE support. I could always leave a
> TRUNCATE trigger in place that logged the truncate to a sl_truncates and
> have my replication daemon respond to the insert on a sl_truncates table
> by actually truncating the data on the replica.

I have planned to add some generic "table_rewrite" handling, but I have
to admit I haven't thought too much about it yet. Currently if somebody
rewrites a table, e.g. with an ALTER ... ADD COLUMN .. DEFAULT .. or
ALTER COLUMN ... USING ..., you will see INSERTs into a temporary
table. That basically seems to be a good thing, but the user needs to be
told about that ;)

> I've spent some time this weekend updating my prototype plugin that
> generates slony 2.2 style COPY output. I have attached my progress here
> (also https://github.com/ssinger/slony1-engine/tree/logical_repl). I have
> not gotten as far as modifying slon to act as a logical log receiver, or
> made a version of the slony apply trigger that would process these
> changes.

I only gave it a quick look and have a couple of questions and
remarks. The way you used the options it looks like youre thinking of
specifying all the tables as options? I would have thought those would
get stored & queried locally and only something like the 'replication
set' name or such would be set as an option.

Iterating over a list with
for(i = 0; i < options->length; i= i + 2 )
{
DefElem * def_schema = (DefElem*) list_nth(options,i);
is not a good idea btw, thats quadratic in complexity ;)

In the REORDER_BUFFER_CHANGE_UPDATE I suggest using
relation->rd_primary, just as in the DELETE case, that should always
give you a consistent candidate key in an efficient manner.

> I haven't looked into the details of what is involved in setting up a
> subscription with the snapshot exporting.

That hopefully shouldn't be too hard... At least thats the idea :P

> I couldn't get the options on the START REPLICATION command to parse so I
> just hard coded some list building code in the init method. I do plan on
> pasing the list of tables to replicate from the replica to the plugin
> (because this list comes from the replica). Passing what could be a few
> thousand table names as a list of arguments is a bit ugly and I admit my
> list processing code is rough. Does this make us want to reconsider the
> format of the option_list ?

Yea, something's screwed up there, sorry. Will push a fix later today.

> I guess should provide an opinion on if I think that the patch in this CF,
> if committed could be used to act as a source for slony instead of the log
> trigger.

> The biggest missing piece I mentioned in my email yesterday, that we aren't
> logging the old primary key on row UPDATEs. I don't see building a credible
> replication system where you don't allow users to update any column of a
> row.

Ok, I really thought this wouldn't be that much of an issue in a first
version, but if you think its important, I'll add support for
it. Shouldn't be too hard.

> The other issues I've raised (DecodeDelete hiding bad deletes, replication
> options not parsing for me) look like easy fixes
>
> no wal decoding support for sequences or truncate are things that I could
> work around by doing things much like slony does today. The SYNC can still
> capture the sequence changes in a table (where the INSERT's would be
> logged) and I can have a trigger capture truncates.

Could you explan a bit what's being done there in slony?

> If this patch is going to get bumped to 9.4 I really hope that someone with
> good knowledge of the internals (ie a committer) can give this patch a good
> review sooner rather than later. If there are issues Andres has overlooked
> that are more serious or complicated to fix I would like to see them raised
> before the next CF in June.

Absolutely seconded. I *really* would love to see a more technical
review, its hard to see issues after spending that much time in a
certain worldview...

Thanks!

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-01-28 11:23:02 Re: logical changeset generation v4
Previous Message Jeevan Chalke 2013-01-28 11:14:32 Re: pg_dump --pretty-print-views