Re: crash with sql language partition support function

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: crash with sql language partition support function
Date: 2018-04-12 10:06:44
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for the comments.

On 2018/04/11 0:27, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
>> Ashutosh Bapat wrote:
>>> On Tue, Apr 10, 2018 at 1:44 PM, Amit Langote
>>> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> Attached fixes it. It teaches RelationBuildPartitionKey() to use
>>>> fmgr_info_cxt and pass rd_partkeycxt to it.
>>> The patch is using partkeycxt and not rd_partkeycxt. Probably a typo
>>> in the mail.
> No, it's correct as written, because rd_partkeycxt hasn't been set yet.

Yeah. partkeycxt will be assigned to rd_partkecxt later in the function,
but I guess Ashutosh seems to already be aware of that, given this discussion.

>> Good point. Yeah, it looks like it should cause problems. I think it
>> would be better to have RelationGetPartitionKey() return a copy in the
>> caller's context of the data of the relcache data, rather than the
>> relcache data itself, no? Seems like this would not fail if there never
>> is a CCI between the RelationGetPartitionKey call and the last access of
>> the returned key struct ... but if this is the explanation, then it's a
>> pretty brittle setup and we should stop relying on that.
> Yeah, all of the callers of RelationGetPartitionKey seem to assume that
> the pointer they get is guaranteed valid and will stay so forever.
> This is pretty dangerous, independently of the bug Amit mentions.
> However, I'm not sure that copy-on-read is a good solution here, because
> of exactly the point at stake that the FmgrInfos may have infrastructure.
> We don't have a way to copy that, and even if we did, copying on every
> reference would be really expensive.
> We might try to make this work like the relcache infrastructure for
> indexes, which also contains FmgrInfos. However, in the index case
> we may safely assume that that stuff never changes for the life of the
> index. I'm afraid that's not true for the partitioning data is it?
> How much does it actually buy us to store FmgrInfos here rather than,
> say, function OIDs? If we backed off to that, then the data structure
> might be simple enough that copy-on-read would work.

I'm thinking of rearranging this along the lines of rd_supportinfo
handling for indexes, which index AMs use index_procinfo to access.

> Otherwise we might need some kind of refcount mechanism. We built one
> for domain constraints in the typcache, and it's not horrid, but it is a
> fair amount of code.
>> Finally: I don't quite understand why we need two memory contexts there,
>> one for the partkey and another for the partdesc. Surely one context
>> for both is sufficient.
> It'd only matter if there were a reason to delete/rebuild one but not the
> other within a particular relcache entry, which probably there isn't.
> So using one context for both would likely be a bit simpler and more
> efficient.

While the partdesc may change due to additiona/removal of partitions,
partkey never changes. So, while we always "keep" partkey, we may
delete/rebuild partdesc if it has changed.

> BTW, another thing in the same general area that I was planning to get
> on the warpath about sooner or later is that the code managing
> rd_partcheck is really cruddy. It spews a lot of random data structure
> into the CacheMemoryContext with no way to release it at relcache inval,
> resulting in a session-lifespan memory leak. (pfree'ing just the List
> header, as RelationDestroyRelation does, is laughably inadequate.)

Indeed it is. It should've been list_free_deep() I suppose.

> Perhaps that could be fixed by likewise storing it in a sub-context
> used for partition information.

We can do that, but note that the rd_partcheck is set for "partitions",
whereas other things we're talking about are set for partitioned table
parents. But I guess it doesn't matter much.

Anyway, after reading your replies, I thought of taking a stab at unifying
the partitioning information that's cached by relcache.c.

First thing toward that I did was to unify rd_partkeycxt and rd_pdcxt into
one rd_partcxt. Instead of having RelationBuildPartitionKey and
RelationBuildPartitionDesc that build rd_partkey and rd_partdesc, resp.,
now there is just RelationBuildPartitionInfo that builds both PartitionKey
and PartitionDesc, all under rd_partcxt. That means I moved a bunch of
code from catalog/partition.c to cache/relcache.c.

Second, I made all the places that want to hang on to information from
either the PartitionKey or PartitionDesc to make its copy first. Most
such places already do most of the copying, I just modified the way it's
done. Before doing that, I replaced the FmgrInfo array in PartitionKey
with one containing OIDs. FmgrInfo array is now cached outside in
rd_partsupfuncinfo, which is accessed using partition_getprocinfo() (which
functions like index_getprocinfo) by various callers.

Then, because we now have a policy of users making a copy of partitioning
information before it can be hanged on to, the keep_* logic in
RelationClearRelation() to preserve where rd_partkey and rd_partdesc point
to is no longer needed, so removed it.

Because different callers may need only certain information from the
relcache (or really a copy of it), there are different
RelationGetPartition* functions that return a copy of the requested
information. For example, if a certain caller wants just the number of
partitions, then use RelationGetPartitionCount which returns
rd_partdesc->nparts. If want OIDs, use RelationGetPartitionOids. Callers
like the planner, executor that want to copy *all* information use
RelationGetPartitionInfo which returns a PartitionInfo that contains
fields to contain copy of rd_partkey, rd_partdesc->oids, and

Then finally, the expression tree contained in rd_partcheck now uses
memory allocated in the same rd_partcxt.

Attached find the patch. Since this started as a bugfix that will need to
be back-patched to PG10, I wonder if I need to write a separate patch to
adopt this (kind of big) cleanup/reshuffling for PG 10 code.


Attachment Content-Type Size
v1-0001-Overhaul-partition-information-caching.patch text/plain 144.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-04-12 10:14:30 Re: Partitioned tables and covering indexes
Previous Message Michael Paquier 2018-04-12 08:10:28 Re: Problem while setting the fpw with SIGHUP