Re: [HACKERS] Proposal: Local indexes for partitioned table

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Maksim Milyutin <milyutinma(at)gmail(dot)com>
Subject: Re: [HACKERS] Proposal: Local indexes for partitioned table
Date: 2018-01-09 04:36:10
Message-ID: CAKJS1f8Dn7swSEhOWwzZzssW7747YB=2Hi+T7uGud40dur69-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9 January 2018 at 09:54, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> I removed the pg_index.indparentidx column that previous versions add,
> and replaced it with pg_inherits rows. This makes the code a little bit
> bulkier in a couple of places, but nothing terrible. As a benefit,
> there's no extra index in pg_index now.

Hi Álvaro,

I've made a pass over this patch. It's mostly in pretty good shape,
but I did find a few things to note down.

1. "either partition" -> "either the partition"

+ so the partition index is dropped together with either
partition it indexes,
+ or with the parent index it is attached to.

2. In findDependentObjects(), should the following if test be below
the ReleaseDeletionLock call?

+ if (foundDep->deptype == DEPENDENCY_INTERNAL_AUTO)
+ break;
ReleaseDeletionLock(object);

3. The following existing comment indicates that we'll be creating a
disk file for the index. Should we update this comment to say that
this is the case only for RELKIND_INDEX?

/*
* create the index relation's relcache entry and physical disk file. (If
* we fail further down, it's the smgr's responsibility to remove the disk
* file again.)
*/

Maybe:

/*
* create the index relation's relcache entry and for non-partitioned
* indexes, a physical disk file. (If we fail further down, it's the
* smgr's responsibility to remove the disk file again.)
*/

4. Extra newline

+ recordDependencyOn(&myself, &referenced, DEPENDENCY_INTERNAL_AUTO);
+ }
+
+

5. Do you think it's out of scope to support expression indexes?

/*
* Expression indexes are currently not considered equal. Not needed for
* current callers.
*/
if (info1->ii_Expressions != NIL || info2->ii_Expressions != NIL)
return false;

6. pg_inherits.inhseqno is int, not smallint. I understand you copied
this from StoreCatalogInheritance1(), so maybe a fix should be put in
before this patch is?

void
StoreSingleInheritance(Oid relationId, Oid parentOid, int16 seqNumber)

and

values[Anum_pg_inherits_inhseqno - 1] = Int16GetDatum(seqNumber);

7. Is the following a bug fix? If so, should it not be fixed and
backpatched, rather than sneaked in here?

@@ -10119,6 +10195,7 @@ ATExecChangeOwner(Oid relationOid, Oid
newOwnerId, bool recursing, LOCKMODE lock
* relation, as well as its toast table (if it has one).
*/
if (tuple_class->relkind == RELKIND_RELATION ||
+ tuple_class->relkind == RELKIND_PARTITIONED_TABLE ||

8. Missing if (attachInfos) pfree(attachInfos);

+ if (idxes)
+ pfree(idxes);
+ if (attachRelIdxs)
+ pfree(attachRelIdxs);

9. The first OR branch of the Assert will always be false, so might as
well get rid of it.

+ if (!has_superclass(idxid))
+ continue;
+
+ Assert(!has_superclass(idxid) ||
+ (IndexGetRelation(get_partition_parent(idxid), false) ==
+ RelationGetRelid(rel)));

10. lock -> locks:

/* we already hold lock on both tables, so this is safe: */

11. "already the right" -> "already in the right"

/* Silently do nothing if already the right state */

12. It seems to be RangeVarGetRelidExtended() that locks the partition
index, not the callback.

/* no deadlock risk: our callback above already acquired the lock */

13. We seem to only be holding an AccessShareLock on the partitioned
table. What happens if someone concurrently does ALTER TABLE
partitioned_table DETACH our_partition;?

parentTbl = relation_open(parentIdx->rd_index->indrelid, AccessShareLock);

and;
/* Make sure it indexes a partition of the other index's table */
partDesc = RelationGetPartitionDesc(parentTbl);
found = false;
for (i = 0; i < partDesc->nparts; i++)
{
if (partDesc->oids[i] == state.partitionOid)
{
found = true;
break;
}
}

Wouldn't you also need an AccessExclusiveLock to prevent another
session trying to ALTER INDEX part_index ATTACH PARTITION leaf_index;
at the same time? The concurrent session could be trying to ATTACH it
to a subpartition and this comment could be attaching to the parent.

14. validatePartitionedIndex does not verify that the index it is
checking is valid. Consider the following:

create table part (a int, b int) partition by list(a);
create table part1 partition of part for values in(1) partition by list(b)
create table part1 partition of part for values in(1) partition by list(b);
create table part2 partition of part1 for values in (1);
create index on only part1 (a);
create index on only part (a);
alter index part_a_idx attach partition part1_a_idx;
\d part
Table "public.part"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
b | integer | | |
Partition key: LIST (a)
Indexes:
"part_a_idx" btree (a) <--- should be INVALID
Number of partitions: 1 (Use \d+ to list them.)

\d part1
Table "public.part1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | |
b | integer | | |
Partition of: part FOR VALUES IN (1)
Partition key: LIST (b)
Indexes:
"part1_a_idx" btree (a) INVALID
Number of partitions: 1 (Use \d+ to list them.)

But that probably means you also need more code to search up to parent
partitions and try and validate any invalid indexes once a
sub-partition index is made valid.

As, if the part_a_idx had correctly been marked as INVALID, and then we do:

CREATE INDEX part2_a_idx ON part2 (a);
ALTER INDEX part1_a_idx ATTACH PARTITION part2_a_idx;

This would validate part1_a_idx, but we'd also then need to attempt
validation on part_a_idx. (I'm still reading, so perhaps you've
already thought of that.)

15. Can you explain why you do the following for CREATE INDEX, but not
for CREATE INDEX IF NOT EXISTS?

@@ -7338,6 +7363,7 @@ IndexStmt: CREATE opt_unique INDEX
opt_concurrently opt_index_name
n->concurrent = $4;
n->idxname = $5;
n->relation = $7;
+ n->relationId = InvalidOid;

It's maybe not required anyway since makeNode() will memset zero the
memory, but I think it's best to do it unless I've misunderstood what
it's for.

16. It's not exactly a problem, but I had a quick look and I don't see
any other uses of list_free() checking for a non-NIL list first:

+ if (inheritors)
+ list_free(inheritors);

Probably might as well just remove the if test.

17. Just a stylistic thing; I think it would be neater to set isindex
using the existing switch(). It would just need some case shuffling.

@@ -439,6 +440,7 @@ RelationParseRelOptions(Relation relation, HeapTuple tuple)
case RELKIND_RELATION:
case RELKIND_TOASTVALUE:
case RELKIND_INDEX:
+ case RELKIND_PARTITIONED_INDEX:
case RELKIND_VIEW:
case RELKIND_MATVIEW:
case RELKIND_PARTITIONED_TABLE:
@@ -452,10 +454,12 @@ RelationParseRelOptions(Relation relation,
HeapTuple tuple)
* we might not have any other for pg_class yet (consider executing this
* code for pg_class itself)
*/
+ isindex = relation->rd_rel->relkind == RELKIND_INDEX ||
+ relation->rd_rel->relkind == RELKIND_PARTITIONED_INDEX;

Or maybe it's even better to set relation->rd_amroutine->amoptions
into a local variable and use it unconditionally in the call to
extractRelOptions().

18. At most it can't do any harm, but is the following still needed? I
originally thought this would have been for pg_index changes. What's
changed?

-#define CATALOG_VERSION_NO 201712251
+#define CATALOG_VERSION_NO 201712291

I admit to not quite being as thorough in my review with the pg_dump code.

The tests seem fine, but I'd like to see a test for #14 once that's fixed.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2018-01-09 04:36:11 Re: [HACKERS] Replication status in logical replication
Previous Message Michael Paquier 2018-01-09 04:27:20 Re: BUG #14941: Vacuum crashes