Appending data locally to a logical replication subscriber

From: andrew cooke <andrew(at)acooke(dot)org>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Appending data locally to a logical replication subscriber
Date: 2022-05-30 14:13:23
Message-ID: 20220530141323.y2saamlcv6y6h3zf@acooke.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Hi All,

I would appreciate some advice on adding data to logically replicated
tables on the subscriber. I am worried about contention between
writes from local data loading and replication.

We have 14 publisher databases (all with identical schema) that are
constantly receiving new data. The tables have a composite primary
key where one column identifies the publisher machine.

We also have a single subscriber that subscribes to all these
machines.

In addition to the realtime data described above, we would like to
back-load some archive data. For practical (ie political) reasons it
is easier to do this on the subscriber than on each publisher. The
archive data will use a composite primary key that combines the
publisher code (used to differentiate publishers) with an incremental
ID that will start at a "sufficiently large number" to avoid
conflicts.

The loading process is not a simple append; some data are normalized
in separate tables. So, for example, there may be a table with
orders, where each line is unique, but the orders reference companies
in another table, and usually these companies already exist.

My primary concern is that there is going to be contention between
replication and archive loading.

All tables on the subscriber may be queried by users (so disabling
indices is not trivial although maybe we could drop constraints and
load archive data at night).

Really, after all that, I guess I am asking the following:

- Is this just a bad idea? Is it implicit (or maybe explicit and
I've missed it) that the subscriber should be dedicated only to
reading subscriptions?

- Do subscriber transactions reflect publisher transactions? For
example, if the publisher receives many rows in a single
transaction (which can happen if we know/guess that the table is
only being written to by one process) does that mean that the
subscriber also writes those rows in a single transaction? Or is
it "simplified" to a transaction per row? Or something else (like
one transaction per "batch")?

- Is there anything else I should be concerned about?

Thanks,
Andrew

Browse pgsql-general by date

  From Date Subject
Next Message Fabien COELHO 2022-05-30 17:53:33 Re: psql 15beta1 does not print notices on the console until transaction completes
Previous Message Imre Samu 2022-05-30 10:50:14 Re: Is it possible to index "deep" into a JSONB column?