Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-09-13 21:38:43
Message-ID: 904373.1600033123@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> Pushed.

Observe the following reports:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2020-09-10%2009%3A08%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2020-09-05%2020%3A22%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-04%2001%3A52%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-03%2020%3A54%3A04

These are all on HEAD, and all within the last ten days, and I see
nothing comparable in any branch before that. So it's hard to avoid
the conclusion that somebody broke something about ten days ago.

None of these animals provided gdb backtraces; but we do have a built-in
trace from several, and they all look like pgoutput.so is trying to
list_free() garbage, somewhere inside a relcache invalidation/rebuild
scenario:

TRAP: FailedAssertion("list->length > 0", File: "/home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/../pgsql/src/backend/nodes/list.c", Line: 68)
postgres: publisher: walsender bf [local] idle(ExceptionalCondition+0x57)[0x9081f7]
postgres: publisher: walsender bf [local] idle[0x6bcc70]
postgres: publisher: walsender bf [local] idle(list_free+0x11)[0x6bdc01]
/home/bf/build/buildfarm-idiacanthus/HEAD/pgsql.build/tmp_install/home/bf/build/buildfarm-idiacanthus/HEAD/inst/lib/postgresql/pgoutput.so(+0x35d8)[0x7fa4c5a6f5d8]
postgres: publisher: walsender bf [local] idle(LocalExecuteInvalidationMessage+0x15b)[0x8f0cdb]
postgres: publisher: walsender bf [local] idle(ReceiveSharedInvalidMessages+0x4b)[0x7bca0b]
postgres: publisher: walsender bf [local] idle(LockRelationOid+0x56)[0x7c19e6]
postgres: publisher: walsender bf [local] idle(relation_open+0x1c)[0x4a2d0c]
postgres: publisher: walsender bf [local] idle(table_open+0x6)[0x524486]
postgres: publisher: walsender bf [local] idle[0x9017f2]
postgres: publisher: walsender bf [local] idle[0x8fabd4]
postgres: publisher: walsender bf [local] idle[0x8fa58a]
postgres: publisher: walsender bf [local] idle(RelationCacheInvalidateEntry+0xaf)[0x8fbdbf]
postgres: publisher: walsender bf [local] idle(LocalExecuteInvalidationMessage+0xec)[0x8f0c6c]
postgres: publisher: walsender bf [local] idle(ReceiveSharedInvalidMessages+0xcb)[0x7bca8b]
postgres: publisher: walsender bf [local] idle(LockRelationOid+0x56)[0x7c19e6]
postgres: publisher: walsender bf [local] idle(relation_open+0x1c)[0x4a2d0c]
postgres: publisher: walsender bf [local] idle(table_open+0x6)[0x524486]
postgres: publisher: walsender bf [local] idle[0x8ee8b0]

010_truncate.pl itself hasn't changed meaningfully in a good long time.
However, I see that 464824323 added a whole boatload of code to
pgoutput.c, and the timing is right for that commit to be the culprit,
so that's what I'm betting on.

Probably this requires a relcache inval at the wrong time;
although we have recent passes from CLOBBER_CACHE_ALWAYS animals,
so that can't be the whole triggering condition. I wonder whether
it is relevant that all of the complaining animals are JIT-enabled.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-09-13 22:42:03 Re: A micro-optimisation for walkdir()
Previous Message Brar Piening 2020-09-13 20:33:36 Re: Minor documentation error regarding streaming replication protocol