Re: Logical decoding & exported base snapshot

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Steve Singer <steve(at)ssinger(dot)info>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Logical decoding & exported base snapshot
Date: 2012-12-12 11:20:11
Message-ID: 20121212112011.GC8027@awork2.anarazel.de
Lists: pgsql-hackers

On 2012-12-11 22:39:14 -0500, Steve Singer wrote:
> On 12-12-11 06:52 PM, Andres Freund wrote:
> >Hi,
>
> >
> >Problem 1:
> >
> >One problem I see is that while exporting a snapshot solves the
> >visibility issues of the table's contents, it does not protect against
> >schema changes. I am not sure whether that's a problem.
> >
> >If somebody runs a CLUSTER or something like that, the table's contents
> >will be preserved including MVCC semantics. That's fine.
> >The more problematic cases I see are TRUNCATE, DROP and ALTER
> >TABLE. Imagine the following:
> >
> >S1: INIT_LOGICAL_REPLICATION
> >S1: get snapshot: 00000333-1
> >S2: ALTER TABLE foo ALTER COLUMN blub TYPE text USING (blub::text);
> >S3: pg_dump --snapshot 00000333-1
> >S1: START_LOGICAL_REPLICATION
> >
> >In that case the pg_dump would dump foo using the schema *after* the
> >ALTER TABLE but showing only rows visible to our snapshot. After
> >START_LOGICAL_REPLICATION all changes after the xlog position from
> >INIT_LOGICAL_REPLICATION will be returned though - including all the
> >tuples from the ALTER TABLE and potentially - if some form of schema
> >capturing was in place - the ALTER TABLE itself. The copied schema would
> >have the new format already though.
> >
> >Does anybody see that as a problem or is it just a case of PEBKAC? One
> >argument for the latter is that that's already a problematic case for
> >normal pg_dumps. It's just that the window is a bit larger here.
>
> Is there any way to detect this situation as part of the pg_dump? If I
> could detect this, abort my pg_dump, and then go and get a new snapshot,
> I don't see this as a big deal. I can live with telling users, "don't do
> DDL-like things while subscribing a new node; if you do, the subscription
> will restart". I am less keen on telling users "don't do DDL-like things
> while subscribing a new node or the results will be unpredictable".

I am trying to think of an unintrusive way to detect this...
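
One rough idea - just a sketch, not anything the patch does - relies on
the fact that most (though not all) forms of DDL update the table's
pg_class row and thus bump its xmin. The dumping client could remember
that value and re-check it in a fresh transaction afterwards:

    -- inside the dump transaction, using the exported snapshot
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000333-1';
    SELECT xmin FROM pg_class WHERE oid = 'foo'::regclass;
    -- ... dump the table's contents ...
    COMMIT;

    -- afterwards, in a new transaction; if the value differs, DDL ran
    -- concurrently and the dump should be thrown away and restarted
    SELECT xmin FROM pg_class WHERE oid = 'foo'::regclass;

That wouldn't catch DDL that only touches other catalogs, so it's at
best a partial answer.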

> I'm trying to convince myself whether I will be able to take a pg_dump
> from an exported snapshot, plus the changes made between the snapshot and
> some later point in time, and turn the results into a consistent database.
> What if something like this comes along:
>
> ...
>
> I'm worried that it will be difficult to programmatically stitch together
> the inconsistent snapshot from the pg_dump plus the logical records
> generated between the snapshot and the dump (along with any record of the
> DDL, if it exists).

I think trying to solve that in the replication solution is a bad
idea. There are too many possible scenarios, and some of them are very
subtle and hard to detect.

So I think it's either:
1) find something that tests for this and abort if so
2) acquire locks earlier, preventing DDL altogether until pg_dump starts
(see the sketch below)
3) don't care. The problem exists today and not many people have
complained.
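
For 2), the transaction that exports the snapshot could itself take locks
that conflict with DDL before exporting, much as pg_dump does. A sketch
of the idea, pretending for simplicity that it happens over a normal SQL
connection rather than in the walsender:

    BEGIN ISOLATION LEVEL REPEATABLE READ;
    -- ACCESS SHARE is what pg_dump acquires; it blocks ALTER/DROP/TRUNCATE
    -- (which need ACCESS EXCLUSIVE) but not normal reads and writes
    LOCK TABLE foo IN ACCESS SHARE MODE;
    SELECT pg_export_snapshot();   -- e.g. returns '00000333-1'
    -- keep this transaction open until pg_dump has taken its own locks

The downside is that we'd have to know the set of tables up front and
hold the locks until pg_dump has acquired its own.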

> >Problem 2:
> >
> >Control Flow.
> >
> >To be able to do a "SET TRANSACTION SNAPSHOT" the source transaction
> >needs to be alive. That's currently solved by exporting the snapshot in
> >the walsender connection that did the INIT_LOGICAL_REPLICATION. The
> >question is how long should we preserve that snapshot?
> >
> >There is no requirement - and there *cannot* be one in the general case,
> >the slot needs to be usable after a restart - that
> >START_LOGICAL_REPLICATION has to be run in the same replication
> >connection as INIT.
> >
> >Possible solutions:
> >1) INIT_LOGICAL_REPLICATION waits for an answer from the client that
> >confirms that logical replication initialization is finished. Before
> >that the walsender connection cannot be used for anything else.
> >
> >2) we remove the snapshot as soon as any other command is received; this
> >way the replication connection stays usable, e.g. to issue a
> >START_LOGICAL_REPLICATION in parallel to the initial data dump. In that
> >case the snapshot would have to be imported *before* the next command
> >is received, as SET TRANSACTION SNAPSHOT requires the source transaction
> >to still be open.

> Option 2 sounds more flexible. Is it more difficult to implement?

No, I don't think so. It's a bit more intrusive in that it requires
knowledge about logical replication in more parts of walsender, but it
should be ok.

Note, btw, that my description of 1) was easy to misunderstand. The
"that" in "Before that the walsender connection cannot be used for
anything else" refers to the answer from the client, not the usage of the
exported snapshot. Once the snapshot has been imported into other
session(s), the source doesn't need to be alive anymore.
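
To make the lifetime rules concrete, here is how the import side looks
with the existing snapshot-import machinery (using pg_export_snapshot()
as a stand-in for what INIT_LOGICAL_REPLICATION does):

    -- session A: the source transaction, the walsender in our case
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT pg_export_snapshot();          -- returns e.g. '00000333-1'

    -- session B: must import while session A's transaction is still open
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000333-1';

    -- from here on session A may end; session B keeps the snapshot until
    -- its own transaction commits or aborts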
Does that explanation change anything?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
