Re: Single transaction in the tablesync worker?

From: Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Single transaction in the tablesync worker?
Date: 2021-02-05 15:10:36
Message-ID: e2d6c854-61ac-2720-37bd-d463a8fc1210@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

We had a bit high-level discussion about this patches with Amit
off-list, so I decided to also take a look at the actual code.

My main concern originally was the potential for left-over slots on
publisher, but I think the state now is relatively okay, with couple of
corner cases that are documented and don't seem much worse than the main
slot.

I wonder if we should mention the max_slot_wal_keep_size GUC in the
table sync docs though.

Another thing that might need documentation is that the the visibility
of changes done by table sync is not anymore isolated in that table
contents will show intermediate progress to other backends, rather than
switching from nothing to state consistent with rest of replication.

Some minor comments about code:

> + else if (res->status == WALRCV_ERROR && missing_ok)
> + {
> + /* WARNING. Error, but missing_ok = true. */
> + ereport(WARNING,

I wonder if we need to add error code to the WalRcvExecResult and check
for the appropriate ones here. Because this can for example return error
because of timeout, not because slot is missing. Not sure if it matters
for current callers though (but then maybe don't call the param
missign_ok?).

> +ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
> +{
> + if (syncslotname)
> + sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
> + else
> + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
> +
> + return syncslotname;
> +}

Given that we are now explicitly dropping slots, what happens here if we
have 2 different downstreams that happen to get same suboid and reloid,
will one of the drop the slot of the other one? Previously with the
cleanup being left to temp slot we'd at maximum got error when creating
it but with the new logic in LogicalRepSyncTableStart it feels like we
could get into situation where 2 downstreams are fighting over slot no?

--
Petr

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Rofail 2021-02-05 15:27:21 Re: [HACKERS] GSoC 2017: Foreign Key Arrays
Previous Message Stephen Frost 2021-02-05 15:05:21 Re: [HACKERS] GSoC 2017: Foreign Key Arrays