Re: logical changeset generation v4 - Heikki's thoughts about the patch state

From: Steve Singer <steve(at)ssinger(dot)info>
To: Steve Singer <steve(at)ssinger(dot)info>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Phil Sorber <phil(at)omniti(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v4 - Heikki's thoughts about the patch state
Date: 2013-01-28 04:07:51
Message-ID: BLU0-SMTP200858157B53BBC0F5F658DC180@phx.gbl
Lists: pgsql-hackers

On 13-01-24 11:15 AM, Steve Singer wrote:
> On 13-01-24 06:40 AM, Andres Freund wrote:
>> Fair enough. I am also working on a user of this infrastructure but that
>> doesn't help you very much. Steve Singer seemed to make some stabs at
>> writing an output plugin as well. Steve, how far did you get there?
> I was able to get something that generated output for INSERT
> statements in a format similar to what a modified slony apply trigger
> would want. This was with the list of tables to replicate hard-coded
> in the plugin. This was with the patchset from the last commitfest. I
> had gotten a bit hung up on the UPDATE and DELETE support because
> slony allows you to use an arbitrary user specified unique index as
> your key. It looks like better support for tables with a unique
> non-primary key is in the most recent patch set. I am hoping to have
> time this weekend to update my plugin to use parameters passed in on
> the init and other updates in the most recent version. If I make some
> progress I will post a link to my progress at the end of the weekend.
> My big issue is that I have limited time to spend on this.

A few more comments:

In decode.c DecodeDelete

+ if (r->xl_len <= (SizeOfHeapDelete + SizeOfHeapHeader))
+ {
+ elog(DEBUG2, "huh, no primary key for a delete on wal_level =
+ return;
+ }

I think we should be passing deletes with no candidate key data logged to
the plugin. If the table isn't a replicated table then ignoring the
delete is fine. If the table is a replicated table but someone has
dropped the unique index from the table then the plugin will receive
INSERT changes on the table but not DELETE changes. If this happens the
plugin would not have any way of knowing that it is missing delete changes.
If my plugin gets passed a DELETE change record but with no key data
then my plugin could do any of:
1. Start screaming for help (ie log errors)
2. Drop the table from replication
3. Pass the delete (with no key values) onto the replication client and
let it deal with it (see 1 and 2)
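
A minimal sketch of those three options, assuming the keyless DELETE is
actually handed to the plugin. Every name here (KeylessDeletePolicy,
DeleteChange, TableState, handle_delete) is hypothetical and only
illustrates the decision; none of it is the patch's real callback API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical policy knob: what the plugin does with a keyless DELETE. */
typedef enum
{
    KEYLESS_LOG_ERROR,      /* option 1: complain loudly */
    KEYLESS_DROP_TABLE,     /* option 2: stop replicating the table */
    KEYLESS_FORWARD         /* option 3: pass the delete downstream as-is */
} KeylessDeletePolicy;

/* Hypothetical, simplified stand-in for a decoded DELETE change. */
typedef struct
{
    const char *table_name;
    const char *key_data;   /* NULL when no candidate key was logged */
} DeleteChange;

/* Hypothetical per-table replication state kept by the plugin. */
typedef struct
{
    const char *table_name;
    bool        replicated;
} TableState;

/*
 * Returns true if the change should be written to the output stream.
 * Deletes on non-replicated tables are ignored silently.
 */
static bool
handle_delete(TableState *table, const DeleteChange *change,
              KeylessDeletePolicy policy)
{
    assert(table != NULL && change != NULL);

    if (!table->replicated)
        return false;
    if (change->key_data != NULL)
        return true;            /* normal case: key data is present */

    switch (policy)
    {
        case KEYLESS_LOG_ERROR:
            fprintf(stderr, "DELETE on \"%s\" has no key data\n",
                    change->table_name);
            return false;
        case KEYLESS_DROP_TABLE:
            table->replicated = false;
            return false;
        case KEYLESS_FORWARD:
            return true;
    }
    return false;
}
```

The point of the sketch is only that the plugin can make this choice
itself once it sees the record; if decode.c drops the record first, none
of the three branches is reachable.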

Also, 'huh' isn't one of our standard log message phrases :)

How do you plan on dealing with sequences?
I don't see my plugin being called on sequence changes and I don't see
XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason
why this can't be easily added?

Also, what do we want to do about TRUNCATE support? I could always leave
a TRUNCATE trigger in place that logs the truncate to an sl_truncates
table, and have my replication daemon respond to the insert on that
table by actually truncating the data on the replica.

I've spent some time this weekend updating my prototype plugin that
generates slony 2.2 style COPY output. I have attached my progress here.
I have not gotten as far as modifying slon to act as a logical log
receiver, or made a version of the slony apply trigger that would
process these changes. I also haven't looked into the details of what is
involved in setting up a subscription with the snapshot exporting.

I couldn't get the options on the START REPLICATION command to parse so
I just hard-coded some list building code in the init method. I do plan
on passing the list of tables to replicate from the replica to the plugin
(because this list comes from the replica). Passing what could be a
few thousand table names as a list of arguments is a bit ugly and I
admit my list processing code is rough. Does this make us want to
reconsider the format of the option_list?
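
One way the list building could look if the table names arrived as a
single comma-separated option value rather than as thousands of separate
arguments. This is only a sketch under that assumption: TableList,
parse_table_list and free_table_list are hypothetical names, the format
is not what the patch defines, and quoted identifiers containing commas
are not handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the tables the replica asked for. */
typedef struct
{
    char  **names;
    size_t  count;
} TableList;

/* Split "a,b,c" into individually allocated names; empty items are skipped. */
static TableList
parse_table_list(const char *value)
{
    TableList   list = {NULL, 0};
    const char *p = value;

    assert(value != NULL);

    while (*p != '\0')
    {
        size_t len = strcspn(p, ",");   /* length up to next comma or end */

        if (len > 0)
        {
            char **grown = realloc(list.names,
                                   (list.count + 1) * sizeof(char *));
            char  *name = malloc(len + 1);

            if (grown == NULL || name == NULL)
                abort();                /* out of memory: give up */
            memcpy(name, p, len);
            name[len] = '\0';
            grown[list.count] = name;
            list.names = grown;
            list.count++;
        }
        p += len;
        if (*p == ',')
            p++;                        /* step over the separator */
    }
    return list;
}

/* Release everything parse_table_list allocated. */
static void
free_table_list(TableList *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->names[i]);
    free(list->names);
    list->names = NULL;
    list->count = 0;
}
```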

I guess I should provide an opinion on whether the patch in this
CF, if committed, could be used to act as a source for slony instead of
the log trigger.

The biggest missing piece is the one I mentioned in my email yesterday:
we aren't logging the old primary key on row UPDATEs. I don't see how to
build a credible replication system where users are not allowed to update
an arbitrary column of a row.

The other issues I've raised (DecodeDelete hiding bad deletes,
replication options not parsing for me) look like easy fixes.

The lack of WAL decoding support for sequences or truncate is something
I could work around by doing things much like slony does today. The SYNC
can still capture the sequence changes in a table (where the INSERTs
would be logged) and I can have a trigger capture truncates.

I mostly did this review from the point of view of someone trying to use
the feature; I haven't done a line-by-line review of the code.

I suspect Andres can address these issues and get an updated patch out
during this CF. I think a more detailed code review by someone more
familiar with postgres internals will reveal a handful of other issues
that hopefully can be fixed without a lot of effort. If this were the
only patch in the commitfest I would encourage Andres to push to get
these changes done. If the standard for CF4 is that a patch needs to be
basically in a committable state at the start of the CF, other than minor
issues, then I don't think this patch meets that bar. A few weeks
from now, with a handful more updates and re-reviews, it might.
If we give everyone in the CF that much time to get their patches into a
committable state then I think the CF will drag on until April or even
May and we might not see 9.3 released until close to Christmas (4
patches so far have been rejected or returned with feedback, 51 need
reviewer or committer attention). I'm not sure I have a huge problem
with that but I don't think it is what was agreed to in the developer
meeting last May.

If this patch is going to get bumped to 9.4 I really hope that someone
with good knowledge of the internals (ie a committer) can give this
patch a good review sooner rather than later. If there are issues
Andres has overlooked that are more serious or complicated to fix I
would like to see them raised before the next CF in June.


>>> BTW, why does all the transaction reordering stuff have to be in core?
>> It didn't use to, but people argued pretty damned hard that no undecoded
>> data should ever be allowed to leave the postgres cluster. And to be fair
>> it makes writing an output plugin *way* much easier. Check
>> If you skip over tuple_to_stringinfo(), which is just pretty generic
>> scaffolding for converting a whole tuple to a string, writing out the
>> changes in some format by now is pretty damn simple.
> I think we will find that the replication systems won't be the only
> users of this feature. I have often seen systems that have a logging
> requirement for auditing purposes or to log then reconstruct the
> sequence of changes made to a set of tables in order to feed a
> downstream application. Triggers and a journaling table are the
> traditional way of doing this but it should be pretty easy to write a
> plugin to accomplish the same thing that should give better
> performance. If the reordering stuff wasn't in core this would be
> much harder.
>>> How much of this infrastructure is to support replicating DDL
>>> changes? IOW,
>>> if we drop that requirement, how much code can we slash?
>> Unfortunately I don't think too much unless we add in other code that
>> allows us to check whether the current definition of a table is still
>> the same as it was back when the tuple was logged.
>>> Any other features or requirements that could be dropped? I think
>>> it's clear at this stage that
>>> this patch is not going to be committed as it is. If you can reduce
>>> it to a
>>> fraction of what it is now, that fraction might have a chance.
>>> Otherwise,
>>> it's just going to be pushed to the next commitfest as whole, and we're
>>> going to be having the same doubts and discussions then.
>> One thing that reduces complexity is to declare the following as
>> unsupported:
>> - CREATE TABLE foo(data text);
>> - INSERT INTO foo(data)
>> VALUES(very-long-to-be-externally-toasted-tuple);
>> - DROP TABLE foo;
>> but that's just a minor thing.
>> I think what we can do more realistically than to chop off required parts
>> of changeset extraction is to start applying some of the preliminary
>> patches independently:
>> - the relmapper/relfilenode changes + pg_relation_by_filenode(spc,
>> relnode) should be independently committable if a bit boring
>> - allowing walsenders to connect to a database possibly needs an
>> interface change
>> but otherwise it should be fine to go in independently. It also has
>> other potential use-cases, so I think that's fair.
>> - logging xl_running_xact's more frequently could also be committed
>> independently and makes sense independently as it allows a standby to
>> enter HS faster if the master is busy
>> - Introducing InvalidCommandId should be relatively uncontroversial. The
>> fact that no invalid value for command ids exists is imo an oversight
>> - the *Satisfies change could be applied and they are imo ready but
>> there's no use-case for it without the rest, so I am not sure whether
>> there's a point
>> - currently not separately available, but we could add wal_level=logical
>> independently. There would be no user of it, but it would be partial
>> work. That includes the relcache support for keeping track of the
>> primary key which already is available separately.
>> Greetings,
>> Andres Freund

Attachment: slony_logical.c (text/x-csrc, 11.5 KB)
