Re: Multi-Column List Partitioning

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Amul Sul <sulamul(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-12-29 13:58:19
Message-ID: CAMm1aWbm3D3ArmSk3F2am+MtRLo3eL+pg1j76gbSGYnP9w=1Fg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for reviewing.

> > > Adding "partnatts" to this struct seems to be unnecessary, AFAIUC,
> > > added that for partition_bound_accepts_nulls(), but we can easily get
> > > that value from the partitioning key & pass an additional argument.
> > > Also, no information about the length of the "isnulls" array.
> >
> > This is required during merge_list_bounds(). AFAIK partition key
> > information is not available here.
> >
>
> You can get that as an argument, see merge_range_bounds().

Fixed.
---

> + char **colname = (char **) palloc0(partnatts * sizeof(char *));
>
> palloc0 is unnecessary.

Fixed.
---

> + foreach(cell2, rowexpr->args)
> + {
> + int idx = foreach_current_index(cell2);
> + Node *expr = lfirst(cell2);
> + Const *val =
> + transformPartitionBoundValue(pstate, expr, colname[i],
> + get_partition_col_typid(key, idx),
> + get_partition_col_typmod(key, idx),
> + get_partition_col_collation(key, idx));
> +
> + values = lappend(values, val);
> + }
>
> Array index for colname should be "idx".

Fixed.
---

> result->scan_default = partition_bound_has_default(boundinfo);
> +
> return result;
> ...
>
> /* Always include the default partition if any. */
> result->scan_default = partition_bound_has_default(boundinfo);
> -
> return result;
>
> ...
> else
> result->scan_default = partition_bound_has_default(boundinfo);
> +
> return result;
> ...
>
> - /* Add columns specified to SET NULL or SET DEFAULT if
> provided. */
> + /*
> + * Add columns specified to SET NULL or SET DEFAULT if
> + * provided.
> + */
>
> spurious change -- look like something not related to your patch.

Fixed.
---

> - * For range partitioning, we must only perform pruning with values
> - * for either all partition keys or a prefix thereof.
> + * For range partitioning and list partitioning, we must only perform
> + * pruning with values for either all partition keys or a prefix
> + * thereof.
> */
> - if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
> + if (keyno > nvalues && (context->strategy == PARTITION_STRATEGY_RANGE ||
> + context->strategy == PARTITION_STRATEGY_LIST))
> break;
>
> I think this is not true for multi-value list partitions, we might
> still want prune partitions for e.g. (100, IS NULL, 20). Correct me
> if I am missing something here.

AFAIK, the above condition/comments says that, either we should
include all keys or prefixes of the partition keys to get the
partition pruning results. For example if we have a table with 2
columns and both are present in the partition key. Let the column
names be 'a' and 'b'.

SELECT * FROM table WHERE a=1 AND b=1; - This query works for pruning
and it refers to a comment which says all partition keys are included.
SELECT * FROM table WHERE b=1; - Here partition pruning does not work
as it does not contain prefix of the partition keys.
SELECT * FROM table WHERE a=1; - This query works fine as column 'a'
is prefix of partition keys.

Please let me know if you need more information.
---

> - * For range partitioning, if we have no clauses for the current key,
> - * we can't consider any later keys either, so we can stop here.
> + * For range partitioning and list partitioning, if we have no clauses
> + * for the current key, we can't consider any later keys either, so we
> + * can stop here.
> */
> - if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
> + if ((part_scheme->strategy == PARTITION_STRATEGY_RANGE ||
> + part_scheme->strategy == PARTITION_STRATEGY_LIST) &&
> clauselist == NIL)
> break
>
> Similarly, why would this be true for list partitioning? How can we
> prune partitions if values is for e.g. (100, <not given>, 20).

The above description holds good for this also. As per the current
design, partition pruning is not applicable for the above example.
Kindly confirm whether we should support such scenarios.
---

> - if (bms_is_member(keyno, opstep->nullkeys))
> + if (bms_is_member(keyno, opstep->nullkeys) &&
> + context->strategy != PARTITION_STRATEGY_LIST)
> continue;
> Will that prune for all NULL partitioning key values?

Yes. This allows pruning with NULL values for list partitioning.
---

> + appendStringInfoString
> + (buf,
> get_list_partbound_value_string(lfirst(cell)));
>
> Formatting is not quite right.

Fixed.
---

> +/*
> + * get_min_and_max_offset
> + *
> + * Fetches the minimum and maximum offset of the matching partitions.
> + */
>
> ...
>
> +/*
> + * get_min_or_max_off
> + *
> + * Fetches either minimum or maximum offset of the matching partitions
> + * depending on the value of is_min parameter.
> + */
>
> I am not sure we really have to have separate functions but if needed
> then I would prefer to have a separate function for each min and max
> rather than combining.

If we don't make a separate function, then we have to include this
code in get_matching_list_bounds() which is already a big function. I
just made a separate function to not increase the complexity of
get_matching_list_bounds() and most of the code present in
get_min_or_max_off() is common for min and max calculation. If we make
it separate then there might be a lot of duplications. Please let me
know if you still feel if any action is required.
---

> + if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
> + {
> + *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
> + return PARTCLAUSE_MATCH_NULLNESS;
> + }
> +
> + expr = makeConst(UNKNOWNOID, -1, InvalidOid, -2, (Datum) 0,
> true, false);
> + partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
> +
> + partclause->keyno = partkeyidx;
> + partclause->expr = (Expr *) expr;
> + partclause->is_null = true;
> +
> + if (nulltest->nulltesttype == IS_NOT_NULL)
> + {
> + partclause->op_is_ne = true;
> + partclause->op_strategy = InvalidStrategy;
> + }
> + else
> + {
> + partclause->op_is_ne = false;
> + partclause->op_strategy = BTEqualStrategyNumber;
> + }
>
> - return PARTCLAUSE_MATCH_NULLNESS;
> + *pc = partclause;
> + return PARTCLAUSE_MATCH_CLAUSE;
>
> I still believe considering NULL value for match clause is not a
> fundamentally correct thing. And that is only for List partitioning
> which isn't aligned with the other partitioning.

As other partitions which support multiple partition keys (Range
partitioning) do not support NULL values. This feature supports
multiple partition keys with list partitioning and it also supports
NULL values. With the existing design, I have tried to support this
feature with minimal changes as possible. If this is not the right
approach to support NULL values, I would like to know how we can
support multiple NULL values. Kindly provide more information.

Thanks & Regards,
Nitin Jadhav

On Thu, Dec 23, 2021 at 6:33 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Tue, Dec 21, 2021 at 6:34 PM Nitin Jadhav
> <nitinjadhavpostgres(at)gmail(dot)com> wrote:
> >
> > ---
> >
> > > + if (isnulls && isnulls[i])
> > > + cmpval = 0; /* NULL "=" NULL */
> > > + else
> > > + cmpval = 1; /* NULL ">" not-NULL */
> > > + }
> > > + else if (isnulls && isnulls[i])
> > > + cmpval = -1; /* not-NULL "<" NULL */
> > >
> > > I really doubt this assumption is correct; aren't those strict operators?
> >
> > Now there are possibilities of multiple NULL values. We should have a
> > mechanism to sort it when the bound values contain Non NULL and NULL
> > values. As per the above logic we put the NULL values at the end.
> > Please let me know if I am wrong.
>
> Ok, but I am not sure about the comparison approach, let's see what
> others think.
>
> > ---
> [...]
> >
> > > typedef struct PartitionBoundInfoData
> > > {
> > > char strategy; /* hash, list or range? */
> > > + int partnatts; /* number of partition key columns */
> > > int ndatums; /* Length of the datums[] array */
> > > Datum **datums;
> > > + bool **isnulls;
> > >
> > > Adding "partnatts" to this struct seems to be unnecessary, AFAIUC,
> > > added that for partition_bound_accepts_nulls(), but we can easily get
> > > that value from the partitioning key & pass an additional argument.
> > > Also, no information about the length of the "isnulls" array.
> >
> > This is required during merge_list_bounds(). AFAIK partition key
> > information is not available here.
> >
>
> You can get that as an argument, see merge_range_bounds().
>
> > > I think it would be helpful if you could split the patch: one for
> > > multi-value list partitioning and another for the partition wise join, thanks.
> >
> > I have split the patch into 2 patches. One is for the multi column
> > list partitioning core changes and the other is for partition-wise
> > join support. Each patch has its respective test cases in the
> > regression suit and regression tests run successfully on each patch.
> > Kindly let me know if any other changes are required here.
> >
>
> Thanks, for the slit that is much helpful, I have a few comments for
> the 0001 patch as follow:
>
> + char **colname = (char **) palloc0(partnatts * sizeof(char *));
>
> palloc0 is unnecessary.
> ---
>
> + foreach(cell2, rowexpr->args)
> + {
> + int idx = foreach_current_index(cell2);
> + Node *expr = lfirst(cell2);
> + Const *val =
> + transformPartitionBoundValue(pstate, expr, colname[i],
> + get_partition_col_typid(key, idx),
> + get_partition_col_typmod(key, idx),
> + get_partition_col_collation(key, idx));
> +
> + values = lappend(values, val);
> + }
>
> Array index for colname should be "idx".
> ---
>
> result->scan_default = partition_bound_has_default(boundinfo);
> +
> return result;
> ...
>
> /* Always include the default partition if any. */
> result->scan_default = partition_bound_has_default(boundinfo);
> -
> return result;
>
> ...
> else
> result->scan_default = partition_bound_has_default(boundinfo);
> +
> return result;
> ...
>
> - /* Add columns specified to SET NULL or SET DEFAULT if
> provided. */
> + /*
> + * Add columns specified to SET NULL or SET DEFAULT if
> + * provided.
> + */
>
> spurious change -- look like something not related to your patch.
> --
>
> - * For range partitioning, we must only perform pruning with values
> - * for either all partition keys or a prefix thereof.
> + * For range partitioning and list partitioning, we must only perform
> + * pruning with values for either all partition keys or a prefix
> + * thereof.
> */
> - if (keyno > nvalues && context->strategy == PARTITION_STRATEGY_RANGE)
> + if (keyno > nvalues && (context->strategy == PARTITION_STRATEGY_RANGE ||
> + context->strategy == PARTITION_STRATEGY_LIST))
> break;
>
> I think this is not true for multi-value list partitions, we might
> still want prune partitions for e.g. (100, IS NULL, 20). Correct me
> if I am missing something here.
> ---
>
> /*
> - * For range partitioning, if we have no clauses for the current key,
> - * we can't consider any later keys either, so we can stop here.
> + * For range partitioning and list partitioning, if we have no clauses
> + * for the current key, we can't consider any later keys either, so we
> + * can stop here.
> */
> - if (part_scheme->strategy == PARTITION_STRATEGY_RANGE &&
> + if ((part_scheme->strategy == PARTITION_STRATEGY_RANGE ||
> + part_scheme->strategy == PARTITION_STRATEGY_LIST) &&
> clauselist == NIL)
> break
>
> Similarly, why would this be true for list partitioning? How can we
> prune partitions if values is for e.g. (100, <not given>, 20).
> --
>
> - if (bms_is_member(keyno, opstep->nullkeys))
> + if (bms_is_member(keyno, opstep->nullkeys) &&
> + context->strategy != PARTITION_STRATEGY_LIST)
> continue;
> Will that prune for all NULL partitioning key values?
> ---
>
> + appendStringInfoString
> + (buf,
> get_list_partbound_value_string(lfirst(cell)));
>
> Formatting is not quite right.
> --
>
> +/*
> + * get_min_and_max_offset
> + *
> + * Fetches the minimum and maximum offset of the matching partitions.
> + */
>
> ...
>
> +/*
> + * get_min_or_max_off
> + *
> + * Fetches either minimum or maximum offset of the matching partitions
> + * depending on the value of is_min parameter.
> + */
>
> I am not sure we really have to have separate functions but if needed
> then I would prefer to have a separate function for each min and max
> rather than combining.
> ---
>
> + if (part_scheme->strategy != PARTITION_STRATEGY_LIST)
> + {
> + *clause_is_not_null = (nulltest->nulltesttype == IS_NOT_NULL);
> + return PARTCLAUSE_MATCH_NULLNESS;
> + }
> +
> + expr = makeConst(UNKNOWNOID, -1, InvalidOid, -2, (Datum) 0,
> true, false);
> + partclause = (PartClauseInfo *) palloc(sizeof(PartClauseInfo));
> +
> + partclause->keyno = partkeyidx;
> + partclause->expr = (Expr *) expr;
> + partclause->is_null = true;
> +
> + if (nulltest->nulltesttype == IS_NOT_NULL)
> + {
> + partclause->op_is_ne = true;
> + partclause->op_strategy = InvalidStrategy;
> + }
> + else
> + {
> + partclause->op_is_ne = false;
> + partclause->op_strategy = BTEqualStrategyNumber;
> + }
>
> - return PARTCLAUSE_MATCH_NULLNESS;
> + *pc = partclause;
> + return PARTCLAUSE_MATCH_CLAUSE;
>
> I still believe considering NULL value for match clause is not a
> fundamentally correct thing. And that is only for List partitioning
> which isn't aligned with the other partitioning.
> ---
>
> Regards,
> Amul

Attachment Content-Type Size
v10-0001-multi-column-list-partitioning-core-changes.patch application/x-patch 105.5 KB
v10-0002-partition-wise-join-support.patch application/x-patch 94.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2021-12-29 13:59:16 Re: Proposal: More structured logging
Previous Message Stephen Frost 2021-12-29 13:57:10 Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary