Re: Shaky coding for vacuuming partitioned relations

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Shaky coding for vacuuming partitioned relations
Date: 2017-09-25 07:42:22
Message-ID: c6f082e8-76e3-e726-e719-e59eef8cd557@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/09/25 12:10, Michael Paquier wrote:
> On Sat, Sep 23, 2017 at 4:13 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Somebody inserted this into vacuum.c's get_rel_oids():
>>
>> tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
>> if (!HeapTupleIsValid(tuple))
>> elog(ERROR, "cache lookup failed for relation %u", relid);
>>
>> apparently without having read the very verbose comment two lines above,
>> which points out that we're not taking any lock on the target relation.
>> So, if that relation is concurrently being dropped, you're likely to
>> get "cache lookup failed for relation NNNN" rather than anything more
>> user-friendly.
>
> This has been overlooked during the lookups of 3c3bb99, and by
> multiple people including me. elog() should never be things users can
> face, as well as cache lookups.

Agreed, that was an oversight.

>> A minimum-change fix would be to replace the elog() with an ereport
>> that produces the same "relation does not exist" error you'd have
>> gotten from RangeVarGetRelid, had the concurrent DROP TABLE committed
>> a few microseconds earlier. But that feels like its's band-aiding
>> around the problem.
>
> Yeah, that's not right. This is a cache lookup error at the end.

Also agreed that fixing the locking here would be a better solution.

>> What I'm wondering about is changing the RangeVarGetRelid call to take
>> ShareUpdateExclusiveLock rather than no lock. That would protect the
>> syscache lookup, and it would also make the find_all_inheritors call
>> a lot more meaningful.
>>
>> If we're doing a VACUUM, the ShareUpdateExclusiveLock would be dropped
>> as soon as we close the caller's transaction, and then we'd acquire
>> the same or stronger lock inside vacuum_rel(). So that seems fine.
>> If we're doing an ANALYZE, then the lock would continue to be held
>> and analyze_rel would merely be acquiring it an extra time, so we'd
>> actually be removing a race-condition failure scenario for ANALYZE.
>> This would mean a few more cycles in lock management, but since this
>> only applies to a manual VACUUM or ANALYZE that specifies a table
>> name, I'm not too concerned about that.
>
> I think that I am +1 on that. Testing such a thing I am not seeing
> anything wrong either. The call to find_all_inheritors should also use
> ShareUpdateExclusiveLock, and the lock on top of RangeVarGetRelid()
> needs to be reworked.
>
> Attached is a proposal of patch.

Hmm, I'm not sure if we need to lock the partitions, too. Locks taken by
find_all_inheritors() will be gone once the already-running transaction is
ended by the caller (vacuum()). get_rel_oids() should just lock the table
mentioned in the command (the parent table), so that find_all_inheritors()
returns the correct partition list, as Tom perhaps alluded to when he said
"it would also make the find_all_inheritors call a lot more meaningful.",
but...

When vacuum() iterates that list and calls vacuum_rel() for each relation
in the list, it might be the case that a relation in that list is no
longer a partition of the parent. So, one might say that it would get
vacuumed unnecessarily. Perhaps that's fine?

>> Thoughts?
>
> As long as I don't forget... Another thing currently on HEAD and
> REL_10_STABLE is that OIDs of partitioned tables are used, but the
> RangeVar of the parent is used for error reports. This leads to
> incorrect reports if a partition gets away in the middle of autovacuum
> as only information about the parent is reported to the user.

Oh, you're right. The original RangeVar (corresponding to the table
mentioned in the command) refers to the parent table.

> I am not
> saying that this needs to be fixed in REL_10_STABLE at this stage
> though as this would require some refactoring similar to what the
> patch adding support for VACUUM with multiple relations does.

I think it would be better to fix that independently somehow. VACUUM on
partitioned tables is handled by calling vacuum_rel() on individual
partitions. It would be nice if vacuum_rel() didn't have to refer to the
RangeVar. Could we not use get_rel_name(relid) in place of
relation->relname? I haven't seen the other patch yet though, so maybe
I'm missing something.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2017-09-25 08:26:21 Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
Previous Message Haribabu Kommi 2017-09-25 07:37:56 Re: SERIALIZABLE with parallel query