Re: Logical replication existing data copy

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical replication existing data copy
Date: 2017-01-18 13:46:56
Message-ID: 5a90e2ff-6375-2e30-d0af-f6d1f672429a@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/01/17 17:10, Peter Eisentraut wrote:
> On 12/19/16 4:30 AM, Petr Jelinek wrote:
>> This patch implements data synchronization for the logical replication.
>> It works both for initial setup of subscription as well as for tables
>> added later to replication when the subscription is already active.
>
> First detailed read-through. General structure makes sense.

Thanks!

>
> Comments on some details:
>
> No catalogs.sgml documentation for pg_subscription_rel. When that is
> added, I would emphasize that entries are only added when relations
> are first encountered, not immediately when a table is added to a
> publication. So it is unlike pg_publication_rel in that respect.
>

It's not even first encountered, but I did explain.

> Rename max_subscription_sync_workers ->
> max_sync_workers_per_subscription (similar to
> max_parallel_workers_per_gather and others)
>

Makes sense.

> About the changes to COPY: It took me a while to get clarity on the
> direction of things. Is the read_cb reading the table, or the data?
> You are copying data produced by a function into a table, so
> produce_cb or data_source_cb could be clearer.
>

The data_source_cb sounds good to me.

> SetSubscriptionRelState(): This is doing an "upsert", so I don't think
> a RowExclusiveLock is enough, or it needs to be able to retry on
> concurrent changes. I suppose there shouldn't be more than one
> concurrent change per sub/rel pair, but that would need to be
> explained there.
>

Well technically there can be some concurrency via the ALTER
SUBSCRIPTION ... REFRESH so I increased lock level (and same for Remove).

> SetSubscriptionRelState(): The memset(values, 0, sizeof(values)) is
> kind of in an odd place. Possibly not actually needed.
>

It might not be needed but we traditionally do it anyway. I moved it a bit.

> GetSubscriptionRelState(): Prefer error messages that identify the
> objects by name. (Also subid should be %u.)
>

Well this is cache lookup failure though, who's to know that the objects
actually exist in that case.

> GetSubscriptionRelationsFilter(): The state and filter arguments are
> kind of weird. And there are only two callers, so all this complexity
> is not even used. Perhaps, if state == SUBREL_STATE_UNKNOWN, then
> return everything, else return exactly the state specified. The case
> of everything-but-the-state-specified does not appear to be needed.
>

I see this was bit confusing so I split the function into two. Actually
the 2 usecases used are everything and everything except SUBREL_STATE_READY.

> GetSubscriptionRelationsFilter(): RowExclusiveLock is probably too
> much
>

Yes, same with GetSubscriptionRelState().

> AlterSubscription_refresh(): remote_table_oids isn't really the
> remote OIDs of the tables but the local OIDs of the remote tables.
> Consider clearer variable naming in that function.
>

Done.

> interesting_relation(): very generic name, maybe
> should_apply_changes_for_rel(). Also the comment mentions a "parallel
> worker" -- maybe better a "separate worker process running in
> parallel", since a parallel worker is something different.
>

Done.

>
> process_syncing_tables_*(): Function names and associated comments are
> not very clear (process what?, handle what?).
>

Ok added some more explanation, hopefully it's better now.

> In process_syncing_tables_apply(), it says that
> logicalrep_worker_count() counts the apply worker as well, but what
> happens if the apply worker has temporarily disappeared? Since

Then the function is never going to be called for that subscription as
it's only called from the apply.

> logicalrep_worker_count() is only used in this one place, maybe have
> it count just the sync workers and rename it slightly.
>

Makes sense anyway though.

> Changed some LOG messages to DEBUG, not clear what the purpose there is.

In fact I changed some DEBUG messages to LOG in the main patch set, git
rebase just does not handle that very well. Fixed.

> check_max_subscription_sync_workers(): Same problem as discussed
> previously, checking a GUC setting against another GUC setting like
> this doesn't work. Just skip it and check at run time.
>

Yeah, we always check at run time.

> wait_for_sync_status_change(): Comment is wrong/misleading: It doesn't
> wait until it matches any specific state, it just waits for any state
> change.
>

Fixed.

> LogicalRepSyncTableStart(): The code that assembles the slot name
> needs to be aware of NAMEDATALEN. (Maybe at least throw in a static
> assert that NAMEDATALEN>=64.)
>

I switched to snprintf seems cleaner (also removes the need to know
about proper memory context of a private variable from
LogicalRepSyncTableStart()).

I also added option called SKIP CONNECT to CREATE SUBSCRIPTION (yes that
WIP name, I welcome suggestions). That skips the initial connection
attempt which is now needed always since we need to copy the table list.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Logical-replication-support-for-initial-data-copy-v2.patch text/plain 147.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-01-18 14:18:24 Re: Parallel Index Scans
Previous Message Tom Lane 2017-01-18 13:43:24 Re: Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)