Re: logical changeset generation v4 - Heikki's thoughts about the patch state

From: Steve Singer <steve(at)ssinger(dot)info>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Phil Sorber <phil(at)omniti(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v4 - Heikki's thoughts about the patch state
Date: 2013-01-28 21:55:52
Message-ID: BLU0-SMTP33C9D847962E547879A4F8DC180@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 13-01-28 06:17 AM, Andres Freund wrote:
> Hi,
>
> 3. Pass the delete (with no key values) onto the replication client and let
> it deal with it (see 1 and 2)
> Hm.
>
>
> While I agree that nicer behaviour would be good I think the real
> enforcement should happen on a higher level, e.g. with event triggers or
> somesuch. It seems way too late to do anything about it when we're
> already decoding. The transaction will already have committed...

Ideally the first line of enforcement would be with event triggers. The
thing with user-level mechanisms for enforcing things is that they
sometimes can be disabled or by-passed. I don't have a lot of sympathy
for people who do this but I like the idea of at least having the option
coding defensively to detect the situation and whine to the user.

>> How do you plan on dealing with sequences?
>> I don't see my plugin being called on sequence changes and I don't see
>> XLOG_SEQ_LOG listed in DecodeRecordIntoReorderBuffer. Is there a reason why
>> this can't be easily added?
> I basically was hoping for Simon's sequence-am to get in before doing
> anything real here. That didn't really happen yet.
> I am not sure whether there's a real usecase in decoding normal
> XLOG_SEQ_LOG records, their content isn't all that easy to interpet
> unless youre rather familiar with pg's innards.
>
> So, adding support wouldn't hard from a technical pov but it seems the
> semantics are a bit hard to nail down.
>
>> Also what do we want to do about TRUNCATE support. I could always leave a
>> TRUNCATE trigger in place that logged the truncate to a sl_truncates and
>> have my replication daemon respond to the insert on a sl_truncates table
>> by actually truncating the data on the replica.
> I have planned to add some generic "table_rewrite" handling, but I have
> to admit I haven't thought too much about it yet. Currently if somebody
> rewrites a table, e.g. with an ALTER ... ADD COLUMN .. DEFAULT .. or
> ALTER COLUMN ... USING ..., you will see INSERTs into a temporary
> table. That basically seems to be a good thing, but the user needs to be
> told about that ;)
>
>> I've spent some time this weekend updating my prototype plugin that
>> generates slony 2.2 style COPY output. I have attached my progress here
>> (also https://github.com/ssinger/slony1-engine/tree/logical_repl). I have
>> not gotten as far as modifying slon to act as a logical log receiver, or
>> made a version of the slony apply trigger that would process these
>> changes.
> I only gave it a quick look and have a couple of questions and
> remarks. The way you used the options it looks like youre thinking of
> specifying all the tables as options? I would have thought those would
> get stored & queried locally and only something like the 'replication
> set' name or such would be set as an option.

The way slony works today is that the list of tables to pull for a SYNC
comes from the subscriber because the subscriber might be behind the
provider, where a table has been removed from the set in the meantime.
The subscriber still needs to receive data from that table until it is
caught up to the point where that removal happens.

Having a time-travelled version of a user table (sl_table) might fix
that problem but I haven't yet figured out how that needs to work with
cascading (since that is a feature of slony today I can't ignore the
problem). I'm also not sure how that will work with table renames.
Today if the user renames a table inside of an EXECUTE SCRIPT slony will
update the name of the table in sl_table. This type of change wouldn't
be visible (yet) in the time-travelled catalog. There might be a
solution to this yet but I haven't figured out it. Sticking with what
slony does today seemed easier as a first step.

> Iterating over a list with
> for(i = 0; i < options->length; i= i + 2 )
> {
> DefElem * def_schema = (DefElem*) list_nth(options,i);
> is not a good idea btw, thats quadratic in complexity ;)

Thanks I'll rewrite this to walk a list of ListCell objects with next.

> In the REORDER_BUFFER_CHANGE_UPDATE I suggest using
> relation->rd_primary, just as in the DELETE case, that should always
> give you a consistent candidate key in an efficient manner.
>
>> I haven't looked into the details of what is involved in setting up a
>> subscription with the snapshot exporting.
> That hopefully shouldn't be too hard... At least thats the idea :P
>
>> I couldn't get the options on the START REPLICATION command to parse so I
>> just hard coded some list building code in the init method. I do plan on
>> pasing the list of tables to replicate from the replica to the plugin
>> (because this list comes from the replica). Passing what could be a few
>> thousand table names as a list of arguments is a bit ugly and I admit my
>> list processing code is rough. Does this make us want to reconsider the
>> format of the option_list ?
> Yea, something's screwed up there, sorry. Will push a fix later today.
>
>> I guess should provide an opinion on if I think that the patch in this CF,
>> if committed could be used to act as a source for slony instead of the log
>> trigger.
>> The biggest missing piece I mentioned in my email yesterday, that we aren't
>> logging the old primary key on row UPDATEs. I don't see building a credible
>> replication system where you don't allow users to update any column of a
>> row.
> Ok, I really thought this wouldn't be that much of an issue in a first
> version, but if you think its important, I'll add support for
> it. Shouldn't be too hard.

If your using non-surragate /natural primary keys this tends to come up
occasionally due to data-entry errors or renames. I'm looking at this
from the point of view of what do I need to use this as a source for a
production replication system with fewer sharp-edges compared to trigger
source slony. My standard is a bit higher than 'first' version because
I intent to use it in the version 3.0 of slony not 1.0. If others feel
I'm asking for too much they should speak up, maybe I am. Also the way
things will fail if someone were to try and update a primary key value
is pretty nasty (it will leave them with inconsistent databases). We
could install UPDATE triggers to try and detect this type of thing but
I'd rather see us just log the old values so we can use them during replay.

>> The other issues I've raised (DecodeDelete hiding bad deletes, replication
>> options not parsing for me) look like easy fixes
>>
>> no wal decoding support for sequences or truncate are things that I could
>> work around by doing things much like slony does today. The SYNC can still
>> capture the sequence changes in a table (where the INSERT's would be
>> logged) and I can have a trigger capture truncates.
> Could you explan a bit what's being done there in slony?

Each time the slon connects to the local database to create a SYNC
event, which is when slony captures snapshot visiblity information, it
also gets also looks at all of the replicated sequences and finds any
that have changed since the last sync The values sequence values as of
the last SYNC are stored in memory. Any sequences that have changed get
there new values written to the table sl_seqlog. When slon applies row
updates for a SYNC it also updates (setval) on any sequences that have
changed.

For truncates the truncate trigger just logs a single row into sl_log
indicating that the table has been truncated. When slon encounters a
row of operation 'TRUNCATE' it executes a TRUNCATE ONLY on the table.

>> If this patch is going to get bumped to 9.4 I really hope that someone with
>> good knowledge of the internals (ie a committer) can give this patch a good
>> review sooner rather than later. If there are issues Andres has overlooked
>> that are more serious or complicated to fix I would like to see them raised
>> before the next CF in June.
> Absolutely seconded. I *really* would love to see a more technical
> review, its hard to see issues after spending that much time in a
> certain worldview...
>
> Thanks!
>
> Andres
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2013-01-28 21:56:16 Re: enhanced error fields
Previous Message Peter Eisentraut 2013-01-28 21:33:50 Re: enhanced error fields