Re: multi-column range partition constraint

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Olaf Gawenda <olaf(dot)gw(at)googlemail(dot)com>
Subject: Re: multi-column range partition constraint
Date: 2017-05-12 07:26:19
Message-ID: 1e511062-3dc9-a66c-e402-00fcb1f4bc5e@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017/05/12 12:22, Robert Haas wrote:
> On Wed, May 10, 2017 at 10:21 PM, Amit Langote
> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> Next update on this issue by Thursday 5/11.
>>
>> Attached updated patches.
>
> Thanks. 0001, at least, really needs a pgindent run. Also, my
> compiler has this apparently-justifiable complaint:
>
> partition.c:1767:5: error: variable 'cur_op_intp' is used uninitialized whenever
> 'for' loop exits because its condition is false
> [-Werror,-Wsometimes-uninitialized]
> foreach(opic, op_infos)
> ^~~~~~~~~~~~~~~~~~~~~~~
> ../../../src/include/nodes/pg_list.h:162:30: note: expanded from macro 'foreach'
> for ((cell) = list_head(l); (cell) != NULL; (cell) = lnext(cell))
> ^~~~~~~~~~~~~~
>
> If by some mischance the condition intp->opfamily_id ==
> key->partopfamily[l - 1] is never satisfied, this is going to seg
> fault, which isn't good. It's pretty easy to imagine how this could
> be caused by corrupted system catalog contents or some other bug, so I
> think you should initialize the variable to NULL and elog() if it's
> still NULL when the loop exits. There is a similar problem in one
> other place.
>
> But actually, I think maybe this logic should just go away altogether,
> because backtracking when we realize that an unbounded bound follows
> is pretty ugly. Can't we just fix the loop that precedes it so that it
> treats next-bound-unbounded the same as this-is-the-last-bound (i.e.
> when it is not the case that j - i < num_or_arms)?

I think your next-bound-unbounded same as this-is-the-last-bound idea is
better. So, done that way.

>
> + if (partexprs_item)
> + partexprs_item_saved = *partexprs_item;
>
> Is there a reason why you're saving the contents of the ListCell
> instead of just a pointer to it?

That's because get_range_key_properties() modifies what partexprs_item
points to. If we had saved the pointer partexprs_item in
partexpr_item_saved, the latter will start pointing to the next cell too
upon return from that function, whereas we would want it to keep pointing
to the cell that partexprs_item originally pointed to. Am I missing
something?

> If key->partexprs == NIL,
> partexprs_item_saved will not get initialized, but the next loop will
> still point to it. That's dangerously close to a live bug, and at any
> rate a compiler warning seems likely.

Fixed.

> I don't really understand the motivation behind the
> nulltest_generated[] array. In the upper loop, we'll use any given
> value of i in only one loop iteration, so having logic to prevent a
> nulltest more than once doesn't seem like it will ever actually do
> anything. In the lower loop, it's actually doing something, but if
> (num_or_arms == 0) /* Only do this the first time through */ would
> have the same effect, I think.

Actually, I modified things so that neither of the two loops generate any
NullTests. In fact, now they are generated for all the expressions even
before the first loop begins. So any required IS NOT NULL expressions
appear at the head of the result list.

> I suggest num_or_arms -> current_or_arm and maxnum_or_arms -> num_or_arms.

Done.

> The for_both_cell loop seems to have two different exit conditions:
> either we can run off the lists, or we can find j - i sufficiently
> large. But shouldn't those things happen at the same time?

They will happen at the same time only when current_or_arm (after renaming
from num_or_arms per above comment) becomes same as the number of columns
we are building the OR expression for. For example, for a partition key
(a, b, c) and bounds from (1, 1, 1) to (1, 10, 10), we will be building
the OR expressions over b and c. The first iteration of the outer while
loop (current_or_arm = 0) only builds b > 1 and b < 10 and exits due to
j-i becoming sufficiently large, even though for_both_cell itself didn't
run off the lists. In the next iteration of the while loop
(current_or_arm = 1), we will consider both b and c and (j-i)'s becoming
sufficiently large will happen at the same time as for_both_cell's running
off the lists.

> + * If both lower_or_arms and upper_or_arms are empty, we append a
> + * constant-true expression. That happens if all of the literal values in
> + * both the lower and upper bound lists are UNBOUNDED.
> */
> - if (!OidIsValid(operoid))
> + if (lower_or_arms == NIL && upper_or_arms == NIL)
> + result = lappend(result, makeBoolConst(true, false));
>
> I think it would be better to instead say, at the very end after all
> else is done:
>
> if (result == NIL)
> result = make_list1(makeBoolConst(true, false));

This makes sense. So, I guess if there are any IS NOT NULL expressions in
the list, we don't need to add the constant-true predicate.

>
> Next update by tomorrow.

Attached updated patches.

Thanks,
Amit

Attachment Content-Type Size
0001-Emit-correct-range-partition-constraint-expression.patch text/x-diff 38.9 KB
0002-Add-pg_get_partition_constraintdef.patch text/x-diff 17.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2017-05-12 07:50:41 Re: Cached plans and statement generalization
Previous Message Kyotaro HORIGUCHI 2017-05-12 07:24:12 Re: Bug with pg_basebackup and 'shared' tablespace