Re: Berserk Autovacuum (let's save next Mandrill)

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Andres Freund <andres(at)anarazel(dot)de>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Banck <mbanck(at)gmx(dot)net>
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Date: 2020-03-31 20:23:35
Message-ID: CAEZATCU0zHMDiQV0g8P2U+YSP9C1idUPrn79DajsbonwkN0xvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 31 Mar 2020 at 04:39, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> On Sat, 28 Mar 2020 at 22:22, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > I'm unsure yet if this has caused an instability on lousyjack's run in
> > [1].
>
> pogona has just joined in on the fun [1], so, we're not out the woods
> on this yet. I'll start having a look at this in more detail.
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
>

I had a go at reproducing this. I wasn't able to produce the reported
failure, but I can reliably produce an Assert failure that may be
related by doing a VACUUM FULL simultaneously with an ANALYZE that is
generating extended stats, which produces:

#0 0x00007f28081c9520 in raise () from /lib64/libc.so.6
#1 0x00007f28081cab01 in abort () from /lib64/libc.so.6
#2 0x0000000000aad1ad in ExceptionalCondition (conditionName=0xb2f1a1
"ItemIdIsNormal(lp)", errorType=0xb2e7c9 "FailedAssertion",
fileName=0xb2e848 "heapam.c", lineNumber=3016) at assert.c:67
#3 0x00000000004fb79e in heap_update (relation=0x7f27feebeda8,
otid=0x2d881fc, newtup=0x2d881f8, cid=0, crosscheck=0x0, wait=true,
tmfd=0x7ffc568a5900, lockmode=0x7ffc568a58fc) at heapam.c:3016
#4 0x00000000004fdead in simple_heap_update (relation=0x7f27feebeda8,
otid=0x2d881fc, tup=0x2d881f8) at heapam.c:3902
#5 0x00000000005be860 in CatalogTupleUpdate (heapRel=0x7f27feebeda8,
otid=0x2d881fc, tup=0x2d881f8) at indexing.c:230
#6 0x00000000008df898 in statext_store (statOid=18964, ndistinct=0x0,
dependencies=0x2a85fe0, mcv=0x0, stats=0x2a86570) at
extended_stats.c:553
#7 0x00000000008deec0 in BuildRelationExtStatistics
(onerel=0x7f27feed9008, totalrows=5000, numrows=5000, rows=0x2ad5a30,
natts=7, vacattrstats=0x2a75f40) at extended_stats.c:187
#8 0x000000000065c1b7 in do_analyze_rel (onerel=0x7f27feed9008,
params=0x7ffc568a5fc0, va_cols=0x0, acquirefunc=0x65ce37
<acquire_sample_rows>, relpages=31, inh=false, in_outer_xact=false,
elevel=13) at analyze.c:606
#9 0x000000000065b532 in analyze_rel (relid=18956,
relation=0x29b0bc0, params=0x7ffc568a5fc0, va_cols=0x0,
in_outer_xact=false, bstrategy=0x2a7dfa0) at analyze.c:263
#10 0x00000000006fd768 in vacuum (relations=0x2a7e148,
params=0x7ffc568a5fc0, bstrategy=0x2a7dfa0, isTopLevel=true) at
vacuum.c:468
#11 0x00000000006fd22c in ExecVacuum (pstate=0x2a57a00,
vacstmt=0x29b0ca8, isTopLevel=true) at vacuum.c:251

It looks to me as though the problem is that statext_store() needs to
take its lock on pg_statistic_ext_data *before* searching for the
stats tuple to update.

It's late here, so I haven't worked up a patch yet, but it looks
pretty straightforward.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2020-03-31 20:32:07 Re: Small docs bugfix: make it clear what can be used in UPDATE FROM and DELETE USING
Previous Message Justin Pryzby 2020-03-31 20:08:12 Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)