Re: Logical decoding & exported base snapshot

From: Steve Singer <steve(at)ssinger(dot)info>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Logical decoding & exported base snapshot
Date: 2012-12-12 03:39:14
Message-ID: BLU0-SMTP8417A16727A6855996B47DC4F0@phx.gbl
Lists: pgsql-hackers

On 12-12-11 06:52 PM, Andres Freund wrote:
> Hi,

>
> Problem 1:
>
> One problem I see is that while exporting a snapshot solves the
> visibility issues of the table's contents it does not protect against
> schema changes. I am not sure whether that's a problem.
>
> If somebody runs a CLUSTER or something like that, the table's contents
> will be preserved including MVCC semantics. That's fine.
> The more problematic cases I see are TRUNCATE, DROP and ALTER
> TABLE. Imagine the following:
>
> S1: INIT_LOGICAL_REPLICATION
> S1: get snapshot: 00000333-1
> S2: ALTER TABLE foo ALTER COLUMN blub TYPE text USING (blub::text);
> S3: pg_dump --snapshot 00000333-1
> S1: START_LOGICAL_REPLICATION
>
> In that case the pg_dump would dump foo using the schema *after* the
> ALTER TABLE but showing only rows visible to our snapshot. After
> START_LOGICAL_REPLICATION all changes after the xlog position from
> INIT_LOGICAL_REPLICATION will be returned though - including all the
> tuples from the ALTER TABLE and potentially - if some form of schema
> capturing was in place - the ALTER TABLE itself. The copied schema would
> have the new format already though.
>
> Does anybody see that as a problem or is it just a case of PEBKAC? One
> argument for the latter is that that's already a problematic case for
> normal pg_dumps. It's just that the window is a bit larger here.

Is there any way to detect this situation as part of the pg_dump? If I
could detect it, abort my pg_dump, and then go get a new snapshot, I
don't see this as a big deal. I can live with telling users, "don't do
DDL-like things while subscribing a new node; if you do, the
subscription will restart". I am less keen on telling users "don't do
DDL-like things while subscribing a new node or the results will be
unpredictable".

I'm trying to convince myself whether I will be able to take a pg_dump
from an exported snapshot, plus the changes made between that snapshot
and some later point, and turn the results into a consistent database.
What if something like this comes along:

INIT REPLICATION
insert into foo (id,bar) values (1,2);
alter table foo drop column bar;
pg_dump --snapshot

The schema I get as part of the pg_dump won't have bar, because it has
been dropped, even though the dump will contain the rows that existed
with bar. I then go to process the INSERT statement. Its WAL record has
column data for bar, and logical replication replay will look up the
catalog rows from before the ALTER TABLE, so it will generate a logical
INSERT record with bar. That will fail on the replica.
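To make that concrete, this is roughly what I expect to end up with on
the replica (the error text is what PostgreSQL would normally report; I
haven't run this against the patch):

-- schema restored from the dump, bar is already gone:
CREATE TABLE foo (id integer);
-- replaying the change that was decoded with the pre-ALTER catalog:
INSERT INTO foo (id, bar) VALUES (1, 2);
-- ERROR:  column "bar" of relation "foo" does not exist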

I'm worried that it will be difficult to programmatically stitch together
the inconsistent snapshot from the pg_dump plus the logical records
generated between the snapshot and the dump (along with any record of
the DDL, if it exists).

> Problem 2:
>
> Control Flow.
>
> To be able to do a "SET TRANSACTION SNAPSHOT" the source transaction
> needs to be alive. That's currently solved by exporting the snapshot in
> the walsender connection that did the INIT_LOGICAL_REPLICATION. The
> question is how long should we preserve that snapshot?
>
> There is no requirement - and there *cannot* be one in the general case,
> the slot needs to be usable after a restart - that
> START_LOGICAL_REPLICATION has to be run in the same replication
> connection as INIT.
>
> Possible solutions:
> 1) INIT_LOGICAL_REPLICATION waits for an answer from the client that
> confirms that logical replication initialization is finished. Before
> that the walsender connection cannot be used for anything else.
>
> 2) we remove the snapshot as soon as any other command is received, this
> way the replication connection stays usable, e.g. to issue a
> START_LOGICAL_REPLICATION in parallel to the initial data dump. In that
> case the snapshot would have to be imported *before* the next command
> was received as SET TRANSACTION SNAPSHOT requires the source transaction
> to be still open.
>
> Opinions?

Option 2 sounds more flexible. Is it more difficult to implement?
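For what it's worth, the flow I have in mind with option 2 looks roughly
like this (the walsender command names are from your proposal; the
arguments and exact interleaving are my guesses):

-- connection 1 (walsender):
INIT_LOGICAL_REPLICATION ...            -- exports a snapshot, say 00000333-1
-- connection 2 (normal backend) imports it *before* connection 1 sends
-- anything else, since that next command discards the source transaction:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000333-1';
-- connection 1 is now free again, e.g. to start streaming in parallel:
START_LOGICAL_REPLICATION ...
-- connection 2 runs the initial data copy under the imported snapshot, then:
COMMIT;
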
> Andres
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
