Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4?

From: Marco Boeringa <marco(at)boeringa(dot)demon(dot)nl>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Fwd: "SELECT COUNT(*) FROM" still causing issues (deadlock) in PostgreSQL 14.3/4?
Date: 2022-07-27 07:53:05
Message-ID: 6555d816-1e47-b5fe-84eb-656810e3e28d@boeringa.demon.nl
Lists: pgsql-bugs

Now it's getting interesting: I have just hit another, different failure
mode. The server closed unexpectedly, but this time without my file
system root being entirely clogged like last time, and PostgreSQL in
fact recovered successfully after the failure.

I had another look at my system and opened 'postgresql-14-main.log'; see
the log below. Please note that I did not attempt to manually stop an
autovacuum session, so that entry must be some automatically generated
message emitted whenever PostgreSQL itself cancels an autovacuum run? In
any case, the interesting entry is the third one, which has the PANIC
with a reference to 'visibilitymap' that Tomas also mentioned in
relation to SELECT COUNT(*). That at least potentially ties this second,
different failure to the previous issue (although it may still be
unrelated). To be clear, as can also be seen from the EXECUTE
statements, there is no SELECT COUNT(*) going on here: the referenced
'arc_update' prepared statement doesn't use it.
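
For reference, the text behind a prepared statement can be double-checked
from within the session that prepared it. A minimal sketch (the
pg_prepared_statements view is session-local, so this must run on the
same connection that issued the PREPARE):

-- Show the SQL text of the 'arc_update' prepared statement;
-- pg_prepared_statements only lists statements of the current session.
SELECT name, statement
FROM pg_prepared_statements
WHERE name = 'arc_update';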

Any remarks related to this log?

Marco

2022-07-27 02:59:56.892 CEST [281070] ERROR:  canceling autovacuum task
2022-07-27 02:59:56.892 CEST [281070] CONTEXT:  while scanning block 46845 of relation "pg_toast.pg_toast_359621860"
	automatic vacuum of table "gis.pg_toast.pg_toast_359621860"

2022-07-27 03:02:49.721 CEST [281124] osm(at)gis PANIC:  wrong buffer
passed to visibilitymap_clear
2022-07-27 03:02:49.721 CEST [281124] osm(at)gis STATEMENT:  EXECUTE
arc_update(11995223);EXECUTE arc_update(177815656);EXECUTE
arc_update(443174623);EXECUTE arc_update(443213826);EXECUTE
arc_update(12673263);EXECUTE arc_update(550352967);EXECUTE
arc_update(11551029);EXECUTE arc_update(153847654);EXECUTE
arc_update(329349);EXECUTE arc_update(237168019);EXECUTE
arc_update(11552629);EXECUTE arc_update(381319471);EXECUTE
arc_update(171879263);EXECUTE arc_update(11555372);EXECUTE
arc_update(385793425);EXECUTE arc_update(362413550);EXECUTE
arc_update(363999384);EXECUTE arc_update(12622220);EXECUTE
arc_update(529712698);EXECUTE arc_update(357812660);EXECUTE
arc_update(145882199);EXECUTE arc_update(499576651);EXECUTE
arc_update(123173842);EXECUTE arc_update(345906810);EXECUTE
arc_update(11558506);EXECUTE arc_update(602477052);EXECUTE
arc_update(362390402);EXECUTE arc_update(488370921);EXECUTE
arc_update(148592092);EXECUTE arc_update(590614967);EXECUTE
arc_update(474316584);EXECUTE arc_update(598052676);EXECUTE
arc_update(345447488);EXECUTE arc_update(488880839);EXECUTE
arc_update(323713375);EXECUTE arc_update(362377910);EXECUTE
arc_update(306066948);EXECUTE arc_update(688704);EXECUTE
arc_update(600560187);EXECUTE arc_update(213569619);EXECUTE
arc_update(180294245);EXECUTE arc_update(343433621);EXECUTE
arc_update(199994366);EXECUTE arc_update(314610973);EXECUTE
arc_update(353722823);EXECUTE arc_update(222912592);EXECUTE
arc_update(305662591);EXECUTE arc_update(361771102);EXECUTE
arc_update(472591085);EXECUTE arc_update(146245966);
2022-07-27 04:07:14.422 CEST [1563] LOG:  server process (PID 281124)
was terminated by signal 6: Aborted
2022-07-27 04:07:14.422 CEST [1563] DETAIL:  Failed process was running:
EXECUTE arc_update(11995223);EXECUTE arc_update(177815656);EXECUTE
arc_update(443174623);EXECUTE arc_update(443213826);EXECUTE
arc_update(12673263);EXECUTE arc_update(550352967);EXECUTE
arc_update(11551029);EXECUTE arc_update(153847654);EXECUTE
arc_update(329349);EXECUTE arc_update(237168019);EXECUTE
arc_update(11552629);EXECUTE arc_update(381319471);EXECUTE
arc_update(171879263);EXECUTE arc_update(11555372);EXECUTE
arc_update(385793425);EXECUTE arc_update(362413550);EXECUTE
arc_update(363999384);EXECUTE arc_update(12622220);EXECUTE
arc_update(529712698);EXECUTE arc_update(357812660);EXECUTE
arc_update(145882199);EXECUTE arc_update(499576651);EXECUTE
arc_update(123173842);EXECUTE arc_update(345906810);EXECUTE
arc_update(11558506);EXECUTE arc_update(602477052);EXECUTE
arc_update(362390402);EXECUTE arc_update(488370921);EXECUTE
arc_update(148592092);EXECUTE arc_update(590614967);EXECUTE
arc_update(474316584);EXECUTE arc_update(598052676);EXECUTE
arc_update(345447488);EXECUTE arc_update(488880839);EXECUTE arc_u
2022-07-27 04:07:14.422 CEST [1563] LOG:  terminating any other active
server processes
2022-07-27 04:07:18.978 CEST [1563] LOG:  all server processes
terminated; reinitializing
2022-07-27 04:07:34.563 CEST [281625] LOG:  database system was
interrupted; last known up at 2022-07-27 03:02:59 CEST
2022-07-27 04:07:39.646 CEST [281625] LOG:  database system was not
properly shut down; automatic recovery in progress
2022-07-27 04:07:39.656 CEST [281625] LOG:  redo starts at 2595/7FB578D8
2022-07-27 04:09:43.329 CEST [281625] LOG:  invalid record length at
259B/9375C7A0: wanted 24, got 0
2022-07-27 04:09:43.329 CEST [281625] LOG:  redo done at 259B/9375C6E0
system usage: CPU: user: 71.84 s, system: 36.82 s, elapsed: 123.67 s
2022-07-27 04:10:23.425 CEST [1563] LOG:  database system is ready to
accept connections
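
Not something from the log itself, but after a crash-and-recovery like
this it may be worth verifying that the visibility map of the updated
tables still agrees with the heap. A minimal sketch using the contrib
pg_visibility extension ('some_table' is a placeholder for the table
that 'arc_update' writes to):

-- Both functions return the TIDs of tuples whose visibility-map bits
-- disagree with the heap; an empty result means the map looks sane.
CREATE EXTENSION IF NOT EXISTS pg_visibility;
SELECT * FROM pg_check_visible('some_table');
SELECT * FROM pg_check_frozen('some_table');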

On 25-7-2022 at 15:39, Tomas Vondra wrote:
>
> On 7/25/22 08:04, Marco Boeringa wrote:
>> To extend on this, two interesting questions that come to mind are:
>>
>> - Does running SELECT COUNT(*) create WAL?
>>
> Yes. An obvious example is updating the visibility map (which is always
> logged to WAL) or hint bits (which may be WAL logged). I'd also bet we
> may generate WAL for indexes, e.g. to kill deleted tuples.
>
>> - Is it potentially conceivable that there is a kind of cross-database
>> vulnerability *within one and the same PostgreSQL cluster*, where an
>> issue in one database causes the WAL in another database to no longer
>> be successfully written to disk during checkpoints? I have never seen
>> processing errors, where PostgreSQL emitted genuine PostgreSQL errors
>> with error codes, cause issues like that and affect a second database
>> in the same cluster; but since no error is generated here, and there
>> might be some uncaught error, I wonder?
>>
>> I am especially asking the second question because, although I wrote
>> that there is no edit activity potentially generating WAL going on in
>> the affected small database, which is true, there *was* processing of
>> Planet-sized data going on in a second database in the same cluster.
>> That certainly *is* capable of generating 890 GB of WAL if nothing is
>> cleaned up during checkpoints because the checkpoints are failing.
>>
> WAL is a resource shared by all the databases in the cluster, so if that
> gets broken it's broken for everyone.
>
>
> regards
>
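
As a footnote to Tomas's answers above: on PostgreSQL 13 and later, the
WAL generated by a plain SELECT COUNT(*) can be observed directly with
the WAL option of EXPLAIN, and on 14 the cluster-wide (not per-database)
WAL accounting is visible in pg_stat_wal. A minimal sketch; 'some_table'
is again a placeholder:

-- The "WAL: records=... bytes=..." line in the EXPLAIN output reports
-- WAL written by the query itself, e.g. for visibility-map updates and
-- WAL-logged hint bits.
EXPLAIN (ANALYZE, WAL) SELECT COUNT(*) FROM some_table;

-- pg_stat_wal (PostgreSQL 14+) aggregates WAL activity for the whole
-- cluster, illustrating that WAL is shared by all its databases.
SELECT wal_records, wal_fpi, wal_bytes FROM pg_stat_wal;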
