Re: Logically replicated table has no visible rows

From: Jeff Ross <jross(at)openvistas(dot)net>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Logically replicated table has no visible rows
Date: 2022-06-03 17:10:29
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 5/31/22 11:46 AM, Jeff Ross wrote:
> Hello,
> We have a logically replicated table on RDS that is 39 G in size on
> both the publisher (10.21) and the subscriber (12.8).
> The replication slots on the publisher are all marked as active and
> the lsns are current so no lag.
> Other tables on the subscriber side are also identical in size and
> have no problem with queries against them.
> We did a vacuum full on the subscriber table and the size dropped to a
> couple hundred MBs and is growing but still has no visible rows.
> Now on the publisher I see a temporary replication slot so I'm
> guessing that in reality replication had not finished and that's why
> the load_events table was full sized but had no visible rows?
> At this point I guess my questions are does my hunch that replication
> hadn't completely finished is the root cause of what we saw make sense?
> I've always used both the lsns and the lack of temporary replication
> slots to tell when logical replication had finished--is there a more
> authoritative way to do so?
> Thanks,
> Jeff

As a follow up for the archives...

It became apparent the longer we looked that logical replication in fact
had not finished and indeed never did finish.  On both an EC2 server and
an RDS server we were missing at least one table that by table size
appeared to be identical to the publisher's table and yet had no visible
rows.  On the publisher I would occasionally see temporary replication
slots but they would go away in a few minutes.

I then dropped the subscription on the RDS server and immediately
restarted it with (copy_data = False).  As expected we immediately saw
rows being inserted into that massive table but it still showed only
those rows.  I have a "backfill" process to sync up logically replicated
tables and that ran but very, very slowly.  Finally while syncing that
big table it stalled completely.  Network monitoring showed about
100Kb/s traffic to the RDS server.  That is not a typo.

Our client spun up another RDS server, imported all the settings from
the previous one and I setup the database and imported the schemas.  As
soon as we started replication we were seeing network transfers in the
45Mb/s range.  Replication finished with all tables intact in under 5
hours for 420G.

We changed nothing on our side so whatever was causing the glacial data
transfer was on AWS side.


In response to

Browse pgsql-general by date

  From Date Subject
Next Message Nathan Bossart 2022-06-03 18:17:24 Re: Extension pg_trgm, permissions and pg_dump order
Previous Message Philip Semanchuk 2022-06-03 15:11:14 Re: Max sane value for join_collapse_limit?