Re: hyrax vs. RelationBuildPartitionDesc

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: hyrax vs. RelationBuildPartitionDesc
Date: 2019-04-15 08:05:16
Message-ID: 5a72697b-0738-0ce5-2531-88febf776bff@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2019/04/15 2:38, Tom Lane wrote:
> Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> writes:
>> On 2019/04/10 15:42, Michael Paquier wrote:
>>> It seems to me that Tom's argument to push in the way relcache
>>> information is handled by copying its contents sounds sensible to me.
>>> That's not perfect, but it is consistent with what exists (note
>>> PublicationActions for a rather-still-not-much-complex example of
>>> structure which gets copied from the relcache).
>
>> I gave my vote for direct access of unchangeable relcache substructures
>> (TupleDesc, PartitionDesc, etc.), because they're accessed during the
>> planning of every user query and fairly expensive to copy compared to list
>> of indexes or PublicationActions that you're citing. To further my point
>> a bit, I wonder why PublicationActions needs to be copied out of relcache
>> at all? Looking at its usage in execReplication.c, it seems we can do
>> fine without copying, because we are holding both a lock and a relcache
>> reference on the replication target relation during the entirety of
>> apply_handle_insert(), which means that information can't change under us,
>> neither logically nor physically.
>
> So the point here is that that reasoning is faulty. You *cannot* assume,
> no matter how strong a lock or how many pins you hold, that a relcache
> entry will not get rebuilt underneath you. Cache flushes happen
> regardless. And unless relcache.c takes special measures to prevent it,
> a rebuild will result in moving subsidiary data structures and thereby
> breaking any pointers you may have pointing into those data structures.
>
> For certain subsidiary structures such as the relation tupdesc,
> we do take such special measures: that's what the "keep_xxx" dance in
> RelationClearRelation is. However, that's expensive, both in cycles
> and maintenance effort: it requires having code that can decide equality
> of the subsidiary data structures, which we might well have no other use
> for, and which we certainly don't have strong tests for correctness of.
> It's also very error-prone for callers, because there isn't any good way
> to cross-check that code using a long-lived pointer to a subsidiary
> structure is holding a lock that's strong enough to guarantee non-mutation
> of that structure, or even that relcache.c provides any such guarantee
> at all. (If our periodic attempts to reduce lock strength for assorted
> DDL operations don't scare the pants off you in this connection, you have
> not thought hard enough about it.) So I think that even though we've
> largely gotten away with this approach so far, it's also a half-baked
> kluge that we should be looking to get rid of, not extend to yet more
> cases.

Thanks for the explanation.

I understand that simply having a lock and a nonzero refcount on a
relation doesn't prevent someone else from changing it concurrently.

I get that we want to get rid of the keep_* kludge in the long term, but
is it wrong to think, for example, that having keep_partdesc today allows
us today to keep the pointer to rd_partdesc as long as we're holding the
relation open or refcnt on the whole relation such as with
PartitionDirectory mechanism?

> To my mind there are only two trustworthy solutions to the problem of
> wanting time-extended usage of a relcache subsidiary data structure: one
> is to copy it, and the other is to reference-count it. I think that going
> over to a reference-count-based approach for many of these structures
> might well be something we should do in future, maybe even the very near
> future. In the mean time, though, I'm not really satisfied with inserting
> half-baked kluges, especially not ones that are different from our other
> half-baked kluges for similar purposes. I think that's a path to creating
> hard-to-reproduce bugs.

+1 to reference-count-based approach.

I've occasionally wondered why there is both keep_tupdesc kludge and
refcounting for table TupleDescs. I guess it's because *only* the
TupleTableSlot mechanism in the executor uses TupleDesc pinning (that is,
refcounting) and the rest of the sites depend on the shaky guarantee
provided by keep_tupdesc.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2019-04-15 08:22:05 Re: Attempt to consolidate reading of XLOG page
Previous Message zedaardv 2019-04-15 07:57:43 Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos