Re: row filtering for logical replication

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-07-12 02:15:04
Message-ID: CAJcOf-d70xg1O2jX1CrUeXaj+nMas3+NyJwYjbRsK6ZctH+x5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 12, 2021 at 9:31 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> cfbot seems to be unhappy with v18 on some of the hosts. Cirrus/FreeBSD failed
> in the test 010_truncate. It also failed in a Cirrus/Linux box. I failed to
> reproduce in my local FreeBSD box. Since it passes appveyor and Cirrus/macos,
> it could probably be a transient issue.
>

I don't think it's a transient issue.
I also get a test failure in subscription/010_truncate.pl when I run
"make check-world" with the v18 patches applied.
The problem can be avoided with the following change (to match what
was originally in my v17-0005 performance-improvement patch):

diff --git a/src/backend/replication/pgoutput/pgoutput.c
b/src/backend/replication/pgoutput/pgoutput.c
index 08c018a300..800bae400b 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -1256,8 +1256,8 @@ get_rel_sync_entry(PGOutputData *data, Relation relation)
}

/* create a tuple table slot for row filter */
- tupdesc = RelationGetDescr(relation);
oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ tupdesc = CreateTupleDescCopy(RelationGetDescr(relation));
entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
MemoryContextSwitchTo(oldctx);

This creates a TupleDesc copy in CacheMemoryContext that is not
refcounted, so it side-steps the problem.
At this stage I am not sure why the original v18 patch code doesn't
work correctly for the TupleDesc refcounting here.
The TupleDesc refcount is zero when it's time to dealloc the tuple
slot (thus causing that Assert to fire), yet when the slot was
created, the TupleDesc refcount was incremented.- so it seems
something else has already decremented the refcount by the time it
comes to deallocate the slot. Perhaps there's an order-of-cleanup or
MemoryContext issue here or some buggy code somewhere, not sure yet.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-07-12 02:16:10 Re: bugfix: when the blocksize is 32k, the function page_header of pageinspect returns negative numbers.
Previous Message Greg Stark 2021-07-12 02:09:53 Re: Cosmic ray hits integerset