Re: logical decoding / rewrite map vs. maxAllocatedDescs

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical decoding / rewrite map vs. maxAllocatedDescs
Date: 2018-08-11 07:06:18
Message-ID: 20180811070618.eugufqgntp4ja7zm@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-08-11 01:55:43 +0200, Tomas Vondra wrote:
> On 08/10/2018 11:59 PM, Tomas Vondra wrote:
> >
> > ...
> >
> > I suspect there's some other ingredient, e.g. some manipulation with the
> > subscription. Or maybe it's not needed at all and I'm just imagining things.
> >
>
> Indeed, the manipulation with the subscription seems to be the key here.
> I pretty reliably get the 'could not read block' error when doing this:
>
> 1) start the insert pgbench
>
> pgbench -n -c 4 -T 300 -p 5433 -f insert.sql test
>
> 2) start the vacuum full pgbench
>
> pgbench -n -f vacuum.sql -T 300 -p 5433 test
>
> 3) try to create a subscription, but with a small amount of conflicting
> data so that the sync fails like this:
>
> LOG: logical replication table synchronization worker for
> subscription "s", table "t" has started
> ERROR: duplicate key value violates unique constraint "t_pkey"
> DETAIL: Key (a)=(5997542) already exists.
> CONTEXT: COPY t, line 1
> LOG: worker process: logical replication worker for subscription
> 16458 sync 16397 (PID 31983) exited with exit code 1
>
> 4) At this point the insert pgbench (at least some clients) should have
> failed with the error. If not, rinse and repeat.
>
> This kinda explains why I've been seeing the error only occasionally,
> because it only happened when I forgot to clean the table on the
> subscriber while recreating the subscription.

I'll try to reproduce this. If you're also looking, I suspect a good
first hint would be to just change the ERROR into a PANIC and look at
the backtrace from the generated core file.

To the point that I wonder if we shouldn't just change the ERROR into a
PANIC on master (but not REL_11_STABLE), so the buildfarm gives us
feedback. I don't think the problem can fundamentally be related to
subscriptions, given the error occurs before any subscriptions are
created in the schedule.

Greetings,

Andres Freund
