Re: Declarative partitioning - another take

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Declarative partitioning - another take
Date: 2016-12-07 04:38:03
Message-ID: CA+Tgmob9o3+xgYHtGbfzUWm9bknRryx53Q5pEVFkzVHSDO=QXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 30, 2016 at 10:56 AM, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> The latest patch I posted earlier today has this implementation.

I decided to try out these patches today with #define
CLOBBER_CACHE_ALWAYS 1 in pg_config_manual.h, which found a couple of
problems:

1. RelationClearRelation() wasn't preserving the rd_partkey, even
though there's plenty of code that relies on it not changing while we
hold a lock on the relation - in particular, transformPartitionBound.

2. partition_bounds_equal() was using the comparator and collation for
partitioning column 0 to compare the datums for all partitioning
columns. It's amazing this passed the regression tests.

The attached incremental patch fixes those things and some cosmetic
issues I found along the way.

3. RelationGetPartitionDispatchInfo() is badly broken:

1010 pd[i] = (PartitionDispatch) palloc(sizeof(PartitionDispatchData));
1011 pd[i]->relid = RelationGetRelid(partrel);
1012 pd[i]->key = RelationGetPartitionKey(partrel);
1013 pd[i]->keystate = NIL;
1014 pd[i]->partdesc = partdesc;
1015 pd[i]->indexes = (int *) palloc(partdesc->nparts * sizeof(int));
1016 heap_close(partrel, NoLock);
1017
1018 m = 0;
1019 for (j = 0; j < partdesc->nparts; j++)
1020 {
1021 Oid partrelid = partdesc->oids[j];

This code imagines that pointers it extracted from partrel are certain
to remain valid after heap_close(partrel, NoLock), perhaps on the
strength of the fact that we still retain a lock on the relation. But
this isn't the case. As soon as nobody has the relation open, a call
to RelationClearRelation() will destroy the relcache entry and
everything to which it points; with CLOBBER_CACHE_ALWAYS, I see a
failure at line 1021:

#0 RelationGetPartitionDispatchInfo (rel=0x1136dddf8, lockmode=3,
leaf_part_oids=0x7fff5633b938) at partition.c:1021
1021 Oid partrelid = partdesc->oids[j];
(gdb) bt 5
#0 RelationGetPartitionDispatchInfo (rel=0x1136dddf8, lockmode=3,
leaf_part_oids=0x7fff5633b938) at partition.c:1021
#1 0x0000000109b8d71f in ExecInitModifyTable (node=0x7fd12984d750,
estate=0x7fd12b885438, eflags=0) at nodeModifyTable.c:1730
#2 0x0000000109b5e7ac in ExecInitNode (node=0x7fd12984d750,
estate=0x7fd12b885438, eflags=0) at execProcnode.c:159
#3 0x0000000109b58548 in InitPlan (queryDesc=0x7fd12b87b638,
eflags=0) at execMain.c:961
#4 0x0000000109b57dcd in standard_ExecutorStart
(queryDesc=0x7fd12b87b638, eflags=0) at execMain.c:239
(More stack frames follow...)
Current language: auto; currently minimal
(gdb) p debug_query_string
$1 = 0x7fd12b84c238 "insert into list_parted values (null, 1);"
(gdb) p partdesc[0]
$2 = {
nparts = 2139062143,
oids = 0x7f7f7f7f7f7f7f7f,
boundinfo = 0x7f7f7f7f7f7f7f7f
}

As you can see, the partdesc is no longer valid here. I'm not
immediately sure how to fix this; this isn't a simple thinko. You
need to keep the relations open for the whole duration of the query,
not just long enough to build the dispatch info. I think you should
try to revise this so that each relation is opened once and kept open;
maybe the first loop should be making a pointer-list of Relations
rather than an int-list of relation OIDs. And it seems to me (though
I'm getting a little fuzzy here because it's late) that you need all
of the partitions open, not just the ones that are subpartitioned,
because how else are you going to know how to remap the tuple if the
column order is different? But I don't see this code doing that,
which makes me wonder if the partitions are being opened yet again in
some other location.

I recommend that once you fix this, you run 'make check' with #define
CLOBBER_CACHE_ALWAYS 1 and look for other hazards. Such mistakes are
easy to make with this kind of patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
partition-fixes-rmh.patch text/x-patch 5.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-12-07 04:43:56 Re: Back-patch use of unnamed POSIX semaphores for Linux?
Previous Message Michael Paquier 2016-12-07 04:26:38 Re: Quorum commit for multiple synchronous replication.