Re: shared memory based stat collector (was: Sharing record typmods between backends)

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: shared memory based stat collector (was: Sharing record typmods between backends)
Date: 2017-08-14 17:12:19
Message-ID: 20170814171218.2z7r3x335fakcwa7@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:

> Actually, now that you mention it, I think it *is* broken already, or
> more to the point, if you configure that value, the autovacuum
> launcher hangs up, because of the AutovacuumWorkItem stuff that Alvaro
> added. When I just tested it, the AV launcher somehow ended up
> waiting for AutovacuumLock and I had to SIGQUIT the server to shut it
> down. That's actually not really entirely the fault of
> dynamic_shared_memory_type = none, though, because the code in
> autovacuum.c does this:
>
> AutoVacuumDSA = dsa_create(AutovacuumLock->tranche);
> /* make sure it doesn't go away even if we do */
> dsa_pin(AutoVacuumDSA);
> dsa_pin_mapping(AutoVacuumDSA);
>
> Now, that's actually really broken because if dsa_create() throws an
> error of any kind, you're going to have already assigned the value to
> AutoVacuumDSA, but you will not have pinned the DSA or the DSA
> mapping. There's evidently some additional bug here because I'd sorta
> expect this code to just go into an infinite loop in this case,
> failing over and over trying to reattach the segment, but evidently
> something even worse happening - perhaps the ERROR isn't releasing
> AutovacuumLock.

Yeah, the problem that lwlocks aren't released is because the launcher
is not in a transaction at that point, so AbortCurrentTransaction()
doesn't release locks like it normally would. The simplest fix (in the
attached 0001 patch) is to add a LWLockReleaseAll() call to the jmp
block, though I wonder if there should be some other cleanup functions
called from there, or whether perhaps it'd be a better strategy to have
the launcher run in a transaction at all times.

The other problem is that there's no attempt to handle a failed DSA
creation/attachment. The second patch just adds a PG_TRY block that
sets a flag not to try the DSA calls again if the first one fails. It
throws a single ERROR line, then autovacuum continues without workitem
support.

I intend to give these patches further thought before pushing anything,
will update this thread no later than tomorrow 19:00 UTC-0300.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-Release-lwlocks-in-autovacuum-launcher-error-handlin.patch text/plain 1.4 KB
0002-Don-t-repeatedly-try-to-attach-to-shared-memory.patch text/plain 3.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-08-14 17:16:29 Re: pl/perl extension fails on Windows
Previous Message Andres Freund 2017-08-14 17:12:16 Re: shared memory based stat collector (was: Sharing record typmods between backends)