From: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: shared memory based stat collector (was: Sharing record typmods between backends) |
Date: | 2017-08-14 17:12:19 |
Message-ID: | 20170814171218.2z7r3x335fakcwa7@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Robert Haas wrote:
> Actually, now that you mention it, I think it *is* broken already, or
> more to the point, if you configure that value, the autovacuum
> launcher hangs up, because of the AutovacuumWorkItem stuff that Alvaro
> added. When I just tested it, the AV launcher somehow ended up
> waiting for AutovacuumLock and I had to SIGQUIT the server to shut it
> down. That's actually not really entirely the fault of
> dynamic_shared_memory_type = none, though, because the code in
> autovacuum.c does this:
>
> AutoVacuumDSA = dsa_create(AutovacuumLock->tranche);
> /* make sure it doesn't go away even if we do */
> dsa_pin(AutoVacuumDSA);
> dsa_pin_mapping(AutoVacuumDSA);
>
> Now, that's actually really broken because if dsa_create() throws an
> error of any kind, you're going to have already assigned the value to
> AutoVacuumDSA, but you will not have pinned the DSA or the DSA
> mapping. There's evidently some additional bug here because I'd sorta
> expect this code to just go into an infinite loop in this case,
> failing over and over trying to reattach the segment, but evidently
> something even worse happening - perhaps the ERROR isn't releasing
> AutovacuumLock.
Yeah, the problem that lwlocks aren't released is because the launcher
is not in a transaction at that point, so AbortCurrentTransaction()
doesn't release locks like it normally would. The simplest fix (in the
attached 0001 patch) is to add a LWLockReleaseAll() call to the jmp
block, though I wonder if there should be some other cleanup functions
called from there, or whether perhaps it'd be a better strategy to have
the launcher run in a transaction at all times.
The other problem is that there's no attempt to handle a failed DSA
creation/attachment. The second patch just adds a PG_TRY block that
sets a flag not to try the DSA calls again if the first one fails. It
throws a single ERROR line, then autovacuum continues without workitem
support.
I intend to give these patches further thought before pushing anything,
will update this thread no later than tomorrow 19:00 UTC-0300.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
0001-Release-lwlocks-in-autovacuum-launcher-error-handlin.patch | text/plain | 1.4 KB |
0002-Don-t-repeatedly-try-to-attach-to-shared-memory.patch | text/plain | 3.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2017-08-14 17:16:29 | Re: pl/perl extension fails on Windows |
Previous Message | Andres Freund | 2017-08-14 17:12:16 | Re: shared memory based stat collector (was: Sharing record typmods between backends) |