Re: Adding support for Default partition in partitioning

From: Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Beena Emerson <memissemerson(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Adding support for Default partition in partitioning
Date: 2017-06-06 20:38:33
Message-ID: CAOgcT0Ph=MJZTOM+e46QCzVtsrq3B_e6vyPtUJWJg49+kQABRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ashutosh,

Thanks for the detailed review.

Also, please find my feedback on your comments in-lined, I also addressed
the comments given by Robert in attached patch:

On Sat, Jun 3, 2017 at 5:13 PM, Ashutosh Bapat <ashutosh.bapat@
enterprisedb.com> wrote:

> Here's some detailed review of the code.
>
> @@ -1883,6 +1883,15 @@ heap_drop_with_catalog(Oid relid)
> if (OidIsValid(parentOid))
> {
> /*
> + * Default partition constraints are constructed run-time from the
> + * constraints of its siblings(basically by negating them), so any
> + * change in the siblings needs to rebuild the constraints of the
> + * default partition. So, invalidate the sibling default
> partition's
> + * relcache.
> + */
> + InvalidateDefaultPartitionRelcache(parentOid);
> +
> Do we need a lock on the default partition for doing this? A query might be
> scanning the default partition directly and we will invalidate the relcache
> underneath it. What if two partitions are being dropped simultaneously and
> change default constraints simultaneously. Probably the lock on the parent
> helps there, but need to check it. What if the default partition cache is
> invalidated because partition gets added/dropped to the default partition
> itself. If we need a lock on the default partition, we will need to
> check the order in which we should be obtaining the locks so as to avoid
> deadlocks.

Done. I have taken a lock on default partition after acquiring a lock on
parent
relation where ever applicable.

> This also means that we have to test PREPARED statements involving
> default partition. Any addition/deletion/attach/detach of other partition
> should invalidate those cached statements.
>

Will add this in next version of patch.

> + if (partition_bound_has_default(boundinfo))
> + {
> + overlap = true;
> + with = boundinfo->default_index;
> + }
> You could possibly rewrite this as
> overlap = partition_bound_has_default(boundinfo);
> with = boundinfo->default_index;
> that would save one indentation and a conditional jump.

Done

> + if (partdesc->nparts > 0 && partition_bound_has_default(boundinfo))
> + check_default_allows_bound(parent, spec);
> If the table has a default partition, nparts > 0, nparts > 0 check looks
> redundant. The comments above should also explain that this check doesn't
> trigger when a default partition is added since we don't expect an existing
> default partition in such a case.
>

The check nparts > 0, is needed to make sure that the boundinfo is non-null,
i.e. to confirm that there exists at least one partition so that
partition_bound_has_default() does not result in segmentation fault.
I have changed the condition as below to make it more intuitive:
if (boundinfo && partition_bound_has_default(boundinfo))
Also, I have updated the comment.

> + * Checks if there exists any row in the default partition that passes the
> + * check for constraints of new partition, if any reports an error.
> grammar two conflicting ifs in the same statement. You may want to rephrase
> this as "This function checks if there exists a row in the default
> partition that fits in the new
> partition and throws an error if it finds one."
>

Done

> + if (new_spec->strategy != PARTITION_STRATEGY_LIST)
> + return;
> This should probably be an Assert. When default range partition is
> supported
> this function would silently return, meaning there is no row in the default
> partition which fits the new partition. We don't want that behavior.
>

Agreed, changed.

> The code in check_default_allows_bound() to check whether the default
> partition
> has any rows that would fit new partition looks quite similar to the code
> in
> ATExecAttachPartition() checking whether all rows in the table being
> attached
> as a partition fit the partition bounds. One thing that
> check_default_allows_bound() misses is, if there's already a constraint on
> the
> default partition refutes the partition constraint on the new partition,
> we can
> skip the scan of the default partition since it can not have rows that
> would
> fit the new partition. ATExecAttachPartition() has code to deal with a
> similar
> case i.e. the table being attached has a constraint which implies the
> partition
> constraint. There may be more cases which check_default_allows_bound()
> does not
> handle but ATExecAttachPartition() handles. So, I am wondering whether it's
> better to somehow take out the common code into a function and use it. We
> will
> have to deal with a difference through. The first one would throw an error
> when
> finding a row that satisfies partition constraints whereas the second one
> would
> throw an error when it doesn't find such a row. But this difference can be
> handled through a flag or by negating the constraint. This would also take
> care
> of Amit Langote's complaint about foreign partitions. There's also another
> difference that the ATExecAttachPartition() queues the table for scan and
> the
> actual scan takes place in ATRewriteTable(), but there is not such queue
> while
> creating a table as a partition. But we should check if we can reuse the
> code to
> scan the heap for checking a constraint.
>
> In case of ATTACH PARTITION, probably we should schedule scan of default
> partition in the alter table's work queue like what
> ATExecAttachPartition() is
> doing for the table being attached. That would fit in the way alter table
> works.

I am still working on this.
But, about your comment here:
"if there's already a constraint on the default partition refutes the
partition
constraint on the new partition, we can skip the scan":
I am so far not able to imagine such a case, since default partition
constraint
can be imagined something like "minus infinity to positive infinity with
some finite set elimination", and any new non-default partition being added
would simply be a set of finite values(at-least in case of list, but I
think range
should not also differ here). Hence one cannot imply the other here.
Possibly,
I might be missing something that you had visioned when you raised the flag,
please correct me if I am missing something.

> make_partition_op_expr(PartitionKey key, int keynum,
> - uint16 strategy, Expr *arg1, Expr *arg2)
> + uint16 strategy, Expr *arg1, Expr *arg2, bool
> is_default)
> Indentation
>

Done.

>
> + if (is_default &&
> + ((operoid = get_negator(operoid)) == InvalidOid))
> + ereport(ERROR, (errcode(ERRCODE_RESTRICT_VIOLATION),
> + errmsg("DEFAULT partition cannot
> be used without negator of operator %s",
> + get_opname(operoid))));
> +
> If the existence of default partition depends upon the negator, shouldn't
> there
> be a dependency between the default partition and the negator. At the time
> of
> creating the default partition, we will try to constuct the partition
> constraint for the default partition and if the negator doesn't exist that
> time, it will throw an error. But in an unlikely event when the user drops
> the
> negator, the partitioned table will not be usable at all, as every time it
> will
> try to create the relcache, it will try to create default partition
> constraint
> and will throw error because of missing negator. That's not a very good
> scenario. Have you tried this case? Apart from that, while restoring a
> dump, if
> the default partition gets restored before the negator is created, restore
> will
> fail with this error.

I am looking into this.

> /* Generate the main expression, i.e., keyCol = ANY (arr) */
> opexpr = make_partition_op_expr(key, 0, BTEqualStrategyNumber,
> - keyCol, (Expr *) arr);
> + keyCol, (Expr *) arr,
> spec->is_default);
> /* Build leftop = ANY (rightop) */
> saopexpr = makeNode(ScalarArrayOpExpr);
> The comments in both the places need correction, as for default partition
> the
> expression will be keyCol <> ALL(arr).

Done.

+ /*
> + * In case of the default partition for list, the partition constraint
> + * is basically any value that is not equal to any of the values in
> + * boundinfo->datums array. So, construct a list of constants from
> + * boundinfo->datums to pass to function make_partition_op_expr via
> + * ArrayExpr, which would return a negated expression for the default
> + * partition.
> + */
> This is misleading, since the actual constraint would also have NOT NULL
> or IS
> NULL in there depending upon the existence of a NULL partition.
> I would simply rephrase this as "For default list partition, collect lists
> for
> all the partitions. The default partition constraint should check that the
> partition key is equal to none of those."
>

Done.

+ ndatums = (pdesc->nparts > 0) ? boundinfo->ndatums : 0;
> wouldn't ndatums be simply boundinfo->ndatums? When nparts = 0, ndatums
> will be
> 0.
>

Yes, but in case the default partition is the first partition to be added
then
boundinfo will be null and the access to ndatums within it will result in
segmentation fault.
Simplified code to make this more readable.

> + int ndatums = 0;
> This assignment looks redundant then.
>

Per change made for above comment, this is now needed.

> + if (boundinfo && partition_bound_accepts_nulls(boundinfo))
> You have not checked existence of boundinfo when extracting ndatums out of
> it
> and just few lines below you check that. If the later check is required
> then we
> will get a segfault while extracting ndatums.
>

The code to extract ndatums is changed and now has a check now for
boundinfo,
but it would not have resulted in segmentation fault in its earlier state
also,
because there was a check for avoiding this i.e. (pdesc->nparts > 0) ?:...

> + if ((!list_has_null && !spec->is_default) ||
> + (list_has_null && spec->is_default))
> Need a comment explaining what's going on here. The condition is no more a
> simple condition.
>
> - result = -1;
> - *failed_at = parent;
> - *failed_slot = slot;
> - break;
> + if (partition_bound_has_default(partdesc->boundinfo))
> + {
> + result = parent->indexes[partdesc->boun
> dinfo->default_index];
> +
> + if (result >= 0)
> + break;
> + else
> + parent = pd[-result];
> + }
> + else
> + {
> + result = -1;
> + *failed_at = parent;
> + *failed_slot = slot;
> + break;
> + }
> The code to handle result is duplicated here and few lines below. I think
> it
> would be better to not duplicate it by having separate condition blocks to
> deal
> with setting result and setting parent. Basically if (cur_index < 0) ...
> else
> would set the result breaking when setting result = -1 explicitly. A
> follow-on
> block would adjust the parent if result < 0 or break otherwise.

I have tried to simplified it in attached patch, please let me know if that
change
looks any better.

> Both the places where DEFAULT_PARTITION_INDEX is used, its result is used
> to
> fetch OID of the default partition. So, instead of having this macro, may
> be we
> should have macro to fetch OID of default partition. But even there I
> don't see
> much value in that.

Removed the macro, and did this in place at both the places.

Further, the macro and code using that macro fetches
> rd_partdesc directly from Relation.

Done this where ever applicable.

We have RelationGetPartitionDesc() for
> that. Probably we should also add Asserts to check that every pointer in
> the
> long pointer chain is Non-null.
>

I am sorry, but I did not understand which chain you are trying to point
here.

InvalidateDefaultPartitionRelcache() is called in case of drop and detach.
> Shouldn't the constraint change when we add or attach a new partition.
> Shouldn't we invalidate the cache then as well? I am not able to find that
> code in your patch.
>

In case of CREATE/ATTACH this was taken care by a call to
CacheInvalidateRelcache(part_rel) in check_default_allows_bound(), which
wasn't
the correct place anyway, and this had a flaw that the invalidation would
not
happen in case the default partition is further partitioned.
Now, the relcache for default partition is getting invalidated for
CREATE/DROP/ALTER commands.

> /*
> + * Default partition constraints are constructed run-time from the
> + * constraints of its siblings(basically by negating them), so any
> + * change in the siblings needs to rebuild the constraints of the
> + * default partition. So, invalidate the sibling default partition's
> + * relcache.
> + */
> May be rephrase this as "The default partition constraints depend upon the
> partition bounds of other partitions. Detaching a partition invalidates the
> default partition constraints. Invalidate the default partition's relcache
> so
> that the constraints are built anew and any plans dependent on those
> constraints are invalidated as well."
>

Done!

> + errmsg("default partition is supported only for
> list partitioned table")));
> for "a" list partitioned table.
>

Done.

>
> + /*
> + * A default partition, that can be partition of either LIST
> or
> + * RANGE partitioned table.
> + * Currently this is supported only for LIST partition.
> + */
> Keep everything in single paragraph without line break.
>

Not applicable now, as I removed the later part of the comment.

}
> +
> ;
> unnecessary extra line.
>

Removed.

> + /*
> + * The default partition bound does not have any datums to be
> + * transformed, return the new bound.
> + */
> Probably not needed.
>

Removed.

>
> + if (spec->is_default && (strategy ==
> PARTITION_STRATEGY_LIST ||
> + strategy ==
> PARTITION_STRATEGY_RANGE))
> + {
> + appendStringInfoString(buf, "DEFAULT");
> + break;
> + }
> +
> What happens if strategy is something other than RANGE or LIST. For that
> matter
> why not just LIST? Possibly you could write this as
> + if (spec->is_default)
> + {
> + Assert(strategy == PARTITION_STRATEGY_LIST);
> + appendStringInfoString(buf, "DEFAULT");
> + break;
> + }
>

Done.

> @@ -2044,7 +2044,7 @@ psql_completion(const char *text, int start, int end)
> COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, "");
> /* Limited completion support for partition bound specification */
> else if (TailMatches3("ATTACH", "PARTITION", MatchAny))
> - COMPLETE_WITH_CONST("FOR VALUES");
> + COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
> else if (TailMatches2("FOR", "VALUES"))
> COMPLETE_WITH_LIST2("FROM (", "IN (");
>
> @@ -2483,7 +2483,7 @@ psql_completion(const char *text, int start, int end)
> COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_partitioned_tables,
> "");
> /* Limited completion support for partition bound specification */
> else if (TailMatches3("PARTITION", "OF", MatchAny))
> - COMPLETE_WITH_CONST("FOR VALUES");
> + COMPLETE_WITH_LIST2("FOR VALUES", "DEFAULT");
> Do we include psql tab completion in the main feature patch? I have not
> seen
> this earlier. But appreciate taking care of these defails.
>

I am not sure about this. If needed I can submit a patch to take care of
this later, but
as of now I have not removed this from the patch.

+char *ExecBuildSlotValueDescription(Oid reloid,
> needs an "extern" declaration.
>

Per one of the comment[1] given by Amit Langote, I have removed a call to
ExecBuildSlotValueDescription(), and this was a leftover, I cleaned it up.

[1]https://www.postgresql.org/message-id/7c758a6b-107e-7c82-
0d3c-3af7965cad3f%40lab.ntt.co.jp

Regards,
Jeevan Ladhe

Attachment Content-Type Size
default_partition_v19.patch application/octet-stream 49.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Tiikkaja 2017-06-06 20:41:25 Re: PG10 transition tables, wCTEs and multiple operations on the same table
Previous Message Sergey Burladyan 2017-06-06 20:30:20 Re: Broken hint bits (freeze)