Re: BUG #14941: Vacuum crashes

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Lyes Ameddah <lyes(dot)amd(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14941: Vacuum crashes
Date: 2018-03-07 05:03:45
Message-ID: 20180307050345.GA3095@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Mon, Mar 05, 2018 at 09:55:13PM +0000, Bossart, Nathan wrote:
> On 3/3/18, 6:13 PM, "Andres Freund" <andres(at)anarazel(dot)de> wrote:
>> I was working on committing 0002 and 0003, when I noticed that the
>> second patch doesn't actually fully works. NOWAIT does what it says on
>> the tin iff the table is locked with a lower lock level than access
>> exclusive. But if AEL is used, the command is blocked in
>>
>> static List *
>> expand_vacuum_rel(VacuumRelation *vrel)
>> ...
>> /*
>> * We transiently take AccessShareLock to protect the syscache lookup
>> * below, as well as find_all_inheritors's expectation that the caller
>> * holds some lock on the starting relation.
>> */
>> relid = RangeVarGetRelid(vrel->relation, AccessShareLock, false);
>>
>> ISTM has been added after the patches initially were proposed. See
>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=19de0ab23ccba12567c18640f00b49f01471018d
>>
>> I'm a bit disappointed that the tests didn't catch this, they're clearly
>> not fully there. They definitely should test the AEL case, as
>> demonstrated here.
>
> Sorry about that. I've added logic to handle ACCESS EXCLUSIVE in v6 of 0002,
> and I've extended the tests to cover that case.

Hm, yes. I have also let those patches rot a bit myself. Sorry for
that.

>> Independent of that, I'm also concerned that NOWAIT isn't implemented
>> consistently with other commands. Aren't we erroring out in other uses
>> of NOWAIT? ISTM a more appropriate keyword would have been SKIP
>> LOCKED. I think the behaviour makes sense, but I'd rename the internal
>> flag and the grammar to use SKIP LOCKED.
>
> Agreed. I've updated 0002 to use SKIP LOCKED instead of NOWAIT.

So, NOWAIT means "please try to take a lock, don't wait for it and issue
an error immediately if the lock cannot be taken". SKIP_LOCKED means
"please try to take a lock, don't wait for it, but do not issue an error
if the lock cannot be taken and move on to next steps". I agree that
this is more consistent.

+ if (!(options & VACOPT_SKIP_LOCKED))
+ relid = RangeVarGetRelid(vrel->relation, AccessShareLock,
false);
+ else
+ {
+ relid = RangeVarGetRelid(vrel->relation, NoLock, false);
Yeah, I agree with Andres that getting all this logic done in
RangeVarGetRelidExtended would be cleaner. Let's see where the other
thread leads us to:
https://www.postgresql.org/message-id/20180306005349.b65whmvj7z6hbe2y%40alap3.anarazel.de

+ /*
+ * We must downcase the statement type for log message
consistency
+ * between expand_vacuum_rel(), vacuum_rel(), and analyze_rel().
+ */
+ stmttype_lower = asc_tolower(stmttype, strlen(stmttype));
This blows up on multi-byte characters and may report incorrect relation
name if double quotes are used, no?

+ /*
+ * Since autovacuum workers supply OIDs when calling vacuum(), no
+ * autovacuum worker should reach this code, and we can make
+ * assumptions about the logging levels we should use in the
checks
+ * below.
+ */
+ Assert(!IsAutoVacuumWorkerProcess());
This is a good idea, should be a separate patch as other folks tend to
be confused with relid handling in expand_vacuum_rel().

+ Specifies that <command>VACUUM</command> should not wait for any
+ conflicting locks to be released: if a relation cannot be locked
+ immediately without waiting, the relation is skipped
Should this mention as well that no errors are raised, allowing the
process to move on with the next relations?
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Dmitry Dolgov 2018-03-07 10:49:04 Re: BUG #14999: pg_rewind corrupts control file global/pg_control
Previous Message Michael Paquier 2018-03-07 01:46:24 Re: BUG #14999: pg_rewind corrupts control file global/pg_control

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-03-07 05:05:02 Re: Comment on top of RangeVarGetRelidExtended in namespace.c mentioning RangeVarGetRelid
Previous Message Stephen Frost 2018-03-07 04:37:15 Re: Comment on top of RangeVarGetRelidExtended in namespace.c mentioning RangeVarGetRelid