Some pgq table rewrite incompatibility with logical decoding?

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Some pgq table rewrite incompatibility with logical decoding?
Date: 2018-06-25 15:37:18
Message-ID: CAMa1XUjkTsrtmJxdwJBw9UBdxqYYz2pTxbwyaK0HwjQ9iLjefA@mail.gmail.com
Lists: pgsql-hackers

I am hoping someone here can shed some light on this issue. I apologize if
this isn't the right place to ask, but I'm almost sure some of you were
involved in pgq's development and might be able to answer this.

We are actually running 2 replication technologies on a few of our dbs,
skytools and pglogical. Although we are moving towards only using logical
decoding-based replication, right now we have both for different purposes.

A table rewrite seems to have happened twice on table pgq.event_58_1, and
each time it ended up in the decoding stream, resulting in the following
error:

ERROR,XX000,"could not map filenode ""base/16418/1173394526"" to relation
OID"

In retracing what happened, we discovered that this relfilenode was
rewritten. But somehow, it is ending up in the logical decoding stream as
"undecodable". This is pretty disastrous because the only real way to fix
it is to advance the replication slot and lose data.
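
For anyone retracing this, here is a minimal sketch of how the path in the
error can be picked apart, assuming the usual base/<database OID>/<relfilenode>
layout for a relation in the database's default tablespace. The helper names
are hypothetical and only for illustration; pg_filenode_relation is the
built-in PostgreSQL function, which returns NULL when no relation currently
uses that filenode (as would be expected after a rewrite).

```python
import re

# Path quoted in the error: base/<database OID>/<relfilenode>.
FILENODE_PATH = re.compile(r"base/(\d+)/(\d+)")


def parse_filenode_path(message):
    """Return (database_oid, relfilenode) parsed from the error text."""
    match = FILENODE_PATH.search(message)
    if match is None:
        raise ValueError("no base/<dboid>/<relfilenode> path found")
    return int(match.group(1)), int(match.group(2))


def lookup_query(relfilenode):
    """Build a query asking which relation, if any, owns the filenode now.

    The first argument to pg_filenode_relation is the tablespace OID;
    0 stands for the database's default tablespace (an assumption here).
    """
    return f"SELECT pg_filenode_relation(0, {relfilenode});"


# Error text as it appears in the CSV log (quotes are doubled there).
error = 'could not map filenode ""base/16418/1173394526"" to relation OID'
db_oid, relfilenode = parse_filenode_path(error)
print(db_oid, relfilenode)
print(lookup_query(relfilenode))
```

The generated query has to be run while connected to the database whose OID
is the first number in the path.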

The only obvious table rewrite I can find in the pgq codebase is a truncate
in pgq.maint_rotate_tables.sql. But there isn't anything surprising
there. If anyone has any ideas as to what might cause this so that we
could somehow mitigate the possibility of this happening again until we
move off pgq, that would be much appreciated.

Thanks,
Jeremy
