Re: why doesn't DestroyPartitionDirectory hash_destroy?

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why doesn't DestroyPartitionDirectory hash_destroy?
Date: 2019-03-18 02:21:57
Message-ID: f8634ded-43d2-c843-d48b-9720883cfa7b@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2019/03/15 2:56, Robert Haas wrote:
> On Thu, Mar 14, 2019 at 1:16 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Actually, now that I've absorbed a bit more about 898e5e329,
>> I don't like very much about it at all. I think having it
>> try to hang onto pointers into the relcache is a completely
>> wrongheaded design decision, and the right way for it to work
>> is to just copy the PartitionDescs out of the relcache so that
>> they're fully owned by the PartitionDirectory. I don't see
>> a CopyPartitionDesc function anywhere (maybe it's named something
>> else?) but it doesn't look like it'd be hard to build; most
>> of the work is in partition_bounds_copy() which does exist already.
>
> Yeah, we could do that. I have to admit that I don't necessarily
> understand why trying to hang onto pointers into the relcache is a bad
> idea. It is a bit complicated, but the savings in both memory and CPU
> time seem worth pursuing. There are a lot of users who wish we scaled
> to a million partitions rather than a hundred, and just copying
> everything all over the place all the time won't get us closer to that
> goal.
>
> More generally, I think get_relation_info() is becoming an
> increasingly nasty piece of work. It copies more and more stuff
> instead of just pointing to it, which is necessary mostly because it
> closes the table instead of arranging to do that at the end of query
> planning. If it did the opposite, the refcount held by the planner
> would make it unnecessary for the PartitionDirectory to hold one, and
> I bet we could also just point to a bunch of the other stuff in this
> function rather than copying that stuff, too. As time goes by,
> relcache entries are getting more and more complex, and the optimizer
> wants to use more and more data from them for planning purposes, but,
> probably partly because of inertia, we're clinging to an old design
> where everything has to be copied. Every time someone gets that
> wrong, and it's happened a number of times, we yell at them and tell
> them to copy more stuff instead of thinking up a design where stuff
> doesn't need to be copied. I think that's a mistake. You have
> previously disagreed with that position so you probably will now, too,
> but I still think it.

+1.

>> Also, at least so far as the planner's usage is concerned, claiming
>> that we're saving something by not copying is completely bogus,
>> because if we look into set_relation_partition_info, what do we
>> find but a partition_bounds_copy call. That wouldn't be necessary
>> if we could rely on the PartitionDirectory to own the data structure.
>> (Maybe it's not necessary today. But given what a house of cards
>> this is, I wouldn't propose ripping it out, just moving it into
>> the PartitionDirectory code.)
>
> Ugh, I didn't notice the partition_bounds_copy() call in that
> function. Oops. Given the foregoing griping, you won't be surprised
> to hear that I'd rather just remove the copying step. However, it
> sounds like we can do that no matter whether we stick with my design
> or switch to yours, because PartitionDirectory holds a relcache refcnt
> then the pointer will be stable, and if we deep-copy the whole data
> structure then the pointer will also be stable. Prior to the commit
> at issue, we weren't doing either of those things, so that copy was
> needed until very recently.

One of the patches I've proposed in the "speed up planning with
partitions" thread [1] gets rid of the partition_bounds_copy() call,
because: 1) I too think it's unnecessary as the PartitionBoundInfo won't
change (logically or physically) as long as we've got some lock on the
table, which we do, 2) I've seen it become a significant bottleneck as the
number of partitions crosses thousands.

Thanks,
Amit

[1] https://commitfest.postgresql.org/22/1778/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-03-18 02:28:51 Re: WIP: Avoid creation of the free space map for small tables
Previous Message Amit Langote 2019-03-18 02:01:51 Re: why doesn't DestroyPartitionDirectory hash_destroy?