Re: Multi-Column List Partitioning

From: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-09-13 10:17:36
Message-ID: CAKcux6=ZmSGj4+4xd31gr+JvFG19o3k1APxxmVQV-LdC80i46A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On PG head + Nitin's v3 patch + Amit's Delta patch. Make check is failing
with below errors.

--inherit.sql is failing with error :"ERROR: negative bitmapset member not
allowed"
update mlparted_tab mlp set c = 'xxx'
from
(select a from some_tab union all select a+1 from some_tab) ss (a)
where (mlp.a = ss.a and mlp.b = 'b') or mlp.a = 3;
ERROR: negative bitmapset member not allowed

--partition_join.sql is crashing with enable_partitionwise_join set to true.
CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0001',
'0003');
CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0004',
'0006');
CREATE TABLE plt1_adv_p3 PARTITION OF plt1_adv FOR VALUES IN ('0008',
'0009');
INSERT INTO plt1_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM
generate_series(1, 299) i WHERE i % 10 IN (1, 3, 4, 6, 8, 9);
ANALYZE plt1_adv;
CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002',
'0003');
CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0004',
'0006');
CREATE TABLE plt2_adv_p3 PARTITION OF plt2_adv FOR VALUES IN ('0007',
'0009');
INSERT INTO plt2_adv SELECT i, i, to_char(i % 10, 'FM0000') FROM
generate_series(1, 299) i WHERE i % 10 IN (2, 3, 4, 6, 7, 9);
ANALYZE plt2_adv;
-- inner join
EXPLAIN (COSTS OFF)
SELECT t1.a, t1.c, t2.a, t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON
(t1.a = t2.a AND t1.c = t2.c) WHERE t1.b < 10 ORDER BY t1.a;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

--stack-trace
Core was generated by `postgres: edb regression [local] EXPLAIN
'.
Program terminated with signal 6, Aborted.
#0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64
krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64
libgcc-4.8.5-39.el7.x86_64 libselinux-2.5-12.el7.x86_64
openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64
zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x00007f7d339ba277 in raise () from /lib64/libc.so.6
#1 0x00007f7d339bb968 in abort () from /lib64/libc.so.6
#2 0x0000000000b0fbc3 in ExceptionalCondition (conditionName=0xcbda10
"part_index >= 0", errorType=0xcbd1c3 "FailedAssertion", fileName=0xcbd2fe
"partbounds.c", lineNumber=1957)
at assert.c:69
#3 0x0000000000892aa1 in is_dummy_partition (rel=0x19b37c0, part_index=-1)
at partbounds.c:1957
#4 0x00000000008919bd in merge_list_bounds (partnatts=1,
partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0,
inner_rel=0x1922938, jointype=JOIN_INNER,
outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at
partbounds.c:1529
#5 0x00000000008910de in partition_bounds_merge (partnatts=1,
partsupfunc=0x1922798, partcollation=0x1922738, outer_rel=0x19b37c0,
inner_rel=0x1922938, jointype=JOIN_INNER,
outer_parts=0x7fffd67751b0, inner_parts=0x7fffd67751a8) at
partbounds.c:1223
#6 0x000000000082c41a in compute_partition_bounds (root=0x1a19ed0,
rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
parent_sjinfo=0x7fffd67752a0, parts1=0x7fffd67751b0,
parts2=0x7fffd67751a8) at joinrels.c:1644
#7 0x000000000082bc34 in try_partitionwise_join (root=0x1a19ed0,
rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30,
parent_sjinfo=0x7fffd67752a0, parent_restrictlist=0x1ab3318)
at joinrels.c:1402
#8 0x000000000082aea2 in populate_joinrel_with_paths (root=0x1a19ed0,
rel1=0x19b37c0, rel2=0x1922938, joinrel=0x1ab7f30, sjinfo=0x7fffd67752a0,
restrictlist=0x1ab3318)
at joinrels.c:926
#9 0x000000000082a8f5 in make_join_rel (root=0x1a19ed0, rel1=0x19b37c0,
rel2=0x1922938) at joinrels.c:760
#10 0x0000000000829e03 in make_rels_by_clause_joins (root=0x1a19ed0,
old_rel=0x19b37c0, other_rels_list=0x1ab2970, other_rels=0x1ab2990) at
joinrels.c:312
#11 0x00000000008298d9 in join_search_one_level (root=0x1a19ed0, level=2)
at joinrels.c:123
#12 0x000000000080c566 in standard_join_search (root=0x1a19ed0,
levels_needed=2, initial_rels=0x1ab2970) at allpaths.c:3020
#13 0x000000000080c4df in make_rel_from_joinlist (root=0x1a19ed0,
joinlist=0x199d538) at allpaths.c:2951
#14 0x000000000080816b in make_one_rel (root=0x1a19ed0, joinlist=0x199d538)
at allpaths.c:228
#15 0x000000000084491d in query_planner (root=0x1a19ed0,
qp_callback=0x84a538 <standard_qp_callback>, qp_extra=0x7fffd6775630) at
planmain.c:276
#16 0x0000000000847040 in grouping_planner (root=0x1a19ed0,
tuple_fraction=0) at planner.c:1447
#17 0x0000000000846709 in subquery_planner (glob=0x19b39d8,
parse=0x1aaa290, parent_root=0x0, hasRecursion=false, tuple_fraction=0) at
planner.c:1025
#18 0x0000000000844f3e in standard_planner (parse=0x1aaa290,
query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0)
at planner.c:406
#19 0x0000000000844ce9 in planner (parse=0x1aaa290,
query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0)
at planner.c:277
#20 0x0000000000978483 in pg_plan_query (querytree=0x1aaa290,
query_string=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;", cursorOptions=2048, boundParams=0x0)
at postgres.c:847
#21 0x00000000006937fc in ExplainOneQuery (query=0x1aaa290,
cursorOptions=2048, into=0x0, es=0x19b36f0,
queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
params=0x0, queryEnv=0x0) at explain.c:397
#22 0x0000000000693351 in ExplainQuery (pstate=0x197c410, stmt=0x1aaa0b0,
params=0x0, dest=0x197c378) at explain.c:281
#23 0x00000000009811fa in standard_ProcessUtility (pstmt=0x1a0bfc8,
queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:845
#24 0x00000000009809ec in ProcessUtility (pstmt=0x1a0bfc8,
queryString=0x1830fa0 "EXPLAIN (COSTS OFF)\nSELECT t1.a, t1.c, t2.a,
t2.c FROM plt1_adv t1 INNER JOIN plt2_adv t2 ON (t1.a = t2.a AND t1.c =
t2.c) WHERE t1.b < 10 ORDER BY t1.a;",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
queryEnv=0x0, dest=0x197c378, qc=0x7fffd6775f90) at utility.c:527
#25 0x000000000097f636 in PortalRunUtility (portal=0x1893b40,
pstmt=0x1a0bfc8, isTopLevel=true, setHoldSnapshot=true, dest=0x197c378,
qc=0x7fffd6775f90) at pquery.c:1147
#26 0x000000000097f3a5 in FillPortalStore (portal=0x1893b40,
isTopLevel=true) at pquery.c:1026
#27 0x000000000097ed11 in PortalRun (portal=0x1893b40,
count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1a0c0b8,
altdest=0x1a0c0b8, qc=0x7fffd6776150) at pquery.c:758
#28 0x0000000000978aa5 in exec_simple_query (

Thanks & Regards,
Rajkumar Raghuwanshi

On Fri, Sep 3, 2021 at 7:17 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:

> On Wed, Sep 1, 2021 at 2:31 PM Amit Langote <amitlangote09(at)gmail(dot)com>
> wrote:
> > On Tue, Aug 31, 2021 at 8:02 PM Nitin Jadhav
> > <nitinjadhavpostgres(at)gmail(dot)com> wrote:
> > > The attached patch also fixes the above comments.
> >
> > I noticed that multi-column list partitions containing NULLs don't
> > work correctly with partition pruning yet.
> >
> > create table p0 (a int, b text, c bool) partition by list (a, b, c);
> > create table p01 partition of p0 for values in ((1, 1, true), (NULL, 1,
> false));
> > create table p02 partition of p0 for values in ((1, NULL, false));
> > explain select * from p0 where a is null;
> > QUERY PLAN
> > --------------------------------------------------------
> > Seq Scan on p01 p0 (cost=0.00..22.50 rows=6 width=37)
> > Filter: (a IS NULL)
> > (2 rows)
> >
> > I guess that may be due to the following newly added code being
> incomplete:
> >
> > +/*
> > + * get_partition_bound_null_index
> > + *
> > + * Returns the partition index of the partition bound which accepts
> NULL.
> > + */
> > +int
> > +get_partition_bound_null_index(PartitionBoundInfo boundinfo)
> > +{
> > + int i = 0;
> > + int j = 0;
> > +
> > + if (!boundinfo->isnulls)
> > + return -1;
> >
> > - if (!val->constisnull)
> > - count++;
> > + for (i = 0; i < boundinfo->ndatums; i++)
> > + {
> > + //TODO: Handle for multi-column cases
> > + for (j = 0; j < 1; j++)
> > + {
> > + if (boundinfo->isnulls[i][j])
> > + return boundinfo->indexes[i];
> > }
> > }
> >
> > + return -1;
> > +}
> >
> > Maybe this function needs to return a "bitmapset" of indexes, because
> > multiple partitions can now contain NULL values.
> >
> > Some other issues I noticed and suggestions for improvement:
> >
> > +/*
> > + * checkForDuplicates
> > + *
> > + * Returns TRUE if the list bound element is already present in the
> list of
> > + * list bounds, FALSE otherwise.
> > + */
> > +static bool
> > +checkForDuplicates(List *source, List *searchElem)
> >
> > This function name may be too generic. Given that it is specific to
> > implementing list bound de-duplication, maybe the following signature
> > is more appropriate:
> >
> > static bool
> > checkListBoundDuplicated(List *list_bounds, List *new_bound)
> >
> > Also, better if the function comment mentions those parameter names,
> like:
> >
> > "Returns TRUE if the list bound element 'new_bound' is already present
> > in the target list 'list_bounds', FALSE otherwise."
> >
> > +/*
> > + * transformPartitionListBounds
> > + *
> > + * Converts the expressions of list partition bounds from the raw
> grammar
> > + * representation.
> >
> > A sentence about the result format would be helpful, like:
> >
> > The result is a List of Lists of Const nodes to account for the
> > partition key possibly containing more than one column.
> >
> > + int i = 0;
> > + int j = 0;
> >
> > Better to initialize such loop counters closer to the loop.
> >
> > + colname[i] = (char *) palloc0(NAMEDATALEN * sizeof(char));
> > + colname[i] = get_attname(RelationGetRelid(parent),
> > + key->partattrs[i], false);
> >
> > The palloc in the 1st statement is wasteful, because the 2nd statement
> > overwrites its pointer by the pointer to the string palloc'd by
> > get_attname().
> >
> > + ListCell *cell2 = NULL;
> >
> > No need to explicitly initialize the loop variable.
> >
> > + RowExpr *rowexpr = NULL;
> > +
> > + if (!IsA(expr, RowExpr))
> > + ereport(ERROR,
> > + (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
> > + errmsg("Invalid list bound specification"),
> > + parser_errposition(pstate, exprLocation((Node
> > *) spec))));
> > +
> > + rowexpr = (RowExpr *) expr;
> >
> > It's okay to assign rowexpr at the top here instead of the dummy
> > NULL-initialization and write the condition as:
> >
> > if (!IsA(rowexpr, RowExpr))
> >
> > + if (isDuplicate)
> > + continue;
> > +
> > + result = lappend(result, values);
> >
> > I can see you copied this style from the existing code, but how about
> > writing this simply as:
> >
> > if (!isDuplicate)
> > result = lappend(result, values);
> >
> > -/* One value coming from some (index'th) list partition */
> > +/* One bound of a list partition */
> > typedef struct PartitionListValue
> > {
> > int index;
> > - Datum value;
> > + Datum *values;
> > + bool *isnulls;
> > } PartitionListValue;
> >
> > Given that this is a locally-defined struct, I wonder if it makes
> > sense to rename the struct while we're at it. Call it, say,
> > PartitionListBound?
> >
> > Also, please keep part of the existing comment that says that the
> > bound belongs to index'th partition.
> >
> > Will send more comments in a bit...
>
> + * partition_bound_accepts_nulls
> + *
> + * Returns TRUE if partition bound has NULL value, FALSE otherwise.
> */
>
> I suggest slight rewording, as follows:
>
> "Returns TRUE if any of the partition bounds contains a NULL value,
> FALSE otherwise."
>
> - PartitionListValue *all_values;
> + PartitionListValue **all_values;
> ...
> - all_values = (PartitionListValue *)
> - palloc(ndatums * sizeof(PartitionListValue));
> + ndatums = get_list_datum_count(boundspecs, nparts);
> + all_values = (PartitionListValue **)
> + palloc(ndatums * sizeof(PartitionListValue *));
>
> I don't see the need to redefine all_values's pointer type. No need
> to palloc PartitionListValue repeatedly for every datum as done
> further down as follows:
>
> + all_values[j] = (PartitionListValue *)
> palloc(sizeof(PartitionListValue));
>
> You do need the following two though:
>
> + all_values[j]->values = (Datum *) palloc0(key->partnatts *
> sizeof(Datum));
> + all_values[j]->isnulls = (bool *) palloc0(key->partnatts *
> sizeof(bool));
>
> If you change the above the way I suggest, you'd also need to revert
> the following change:
>
> - qsort_arg(all_values, ndatums, sizeof(PartitionListValue),
> + qsort_arg(all_values, ndatums, sizeof(PartitionListValue *),
> qsort_partition_list_value_cmp, (void *) key);
>
> + int orig_index = all_values[i]->index;
> + boundinfo->datums[i] = (Datum *) palloc(key->partnatts *
> sizeof(Datum));
>
> Missing a newline between these two statements.
>
> BTW, I noticed that the boundDatums variable is no longer used in
> create_list_bounds. I traced back its origin and found that a recent
> commit 53d86957e98 introduced it to implement an idea to reduce the
> finer-grained pallocs that were being done in create_list_bounds(). I
> don't think that this patch needs to throw away that work. You can
> make it work as the attached delta patch that applies on top of v3.
> Please check.
>
> @@ -915,7 +949,7 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> if (b1->nindexes != b2->nindexes)
> return false;
>
> - if (b1->null_index != b2->null_index)
> + if (get_partition_bound_null_index(b1) !=
> get_partition_bound_null_index(b2))
>
> As mentioned in the last message, this bit in partition_bounds_equal()
> needs to be comparing "bitmapsets" of null bound indexes, that is
> after fixing get_partition_bound_null_index() as previously mentioned.
>
> But...
>
> @@ -988,7 +1022,22 @@ partition_bounds_equal(int partnatts, int16
> *parttyplen, bool *parttypbyval,
> * context. datumIsEqual() should be simple enough to be
> * safe.
> */
> - if (!datumIsEqual(b1->datums[i][j], b2->datums[i][j],
> + if (b1->isnulls)
> + b1_isnull = b1->isnulls[i][j];
> + if (b2->isnulls)
> + b2_isnull = b2->isnulls[i][j];
> +
> + /*
> + * If any of the partition bound has NULL value, then check
> + * equality for the NULL value instead of comparing the
> datums
> + * as it does not contain valid value in case of NULL.
> + */
> + if (b1_isnull || b2_isnull)
> + {
> + if (b1_isnull != b2_isnull)
> + return false;
> + }
>
> ...if you have this in the main loop, I don't think we need the above
> code stanza which appears to implement a short-cut for this long-form
> logic.
>
> + (key->strategy != PARTITION_STRATEGY_LIST ||
> + !src->isnulls[i][j]))
>
> I think it's better to write this condition as follows just like the
> accompanying condition involving src->kind:
>
> (src->nulls == NULL || !src->isnulls[i][j])
>
> (Skipped looking at merge_list_bounds() and related changes for now as
> I see a lot of TODOs remain to be done.)
>
> In check_new_partition_bound():
>
> + Datum *values = (Datum *)
> palloc0(key->partnatts * sizeof(Datum));
> + bool *isnulls = (bool *)
> palloc0(key->partnatts * sizeof(bool));
>
> Doesn't seem like a bad idea to declare these as:
>
> Datum values[PARTITION_MAX_KEYS];
> bool isnulls[PARTITION_MAX_KEYS];
>
>
> I looked at get_qual_for_list_multi_column() and immediately thought
> that it may be a bad idea. I think it's better to integrate the logic
> for multi-column case into the existing function even if that makes
> the function appear more complex. Having two functions with the same
> goal and mostly the same code is not a good idea mainly because it
> becomes a maintenance burden.
>
> I have attempted a rewrite such that get_qual_for_list() now handles
> both the single-column and multi-column cases. Changes included in
> the delta patch. The patch updates some outputs of the newly added
> tests for multi-column list partitions, because the new code emits the
> IS NOT NULL tests a bit differently than
> get_qual_for_list_mutli_column() would. Notably, the old approach
> would emit IS NOT NULL for every non-NULL datum matched to a given
> column, not just once for the column. However, the patch makes a few
> other tests fail, mainly because I had to fix
> partition_bound_accepts_nulls() to handle the multi-column case,
> though didn't bother to update all callers of it to also handle the
> multi-column case correctly. I guess that's a TODO you're going to
> deal with at some point anyway. :)
>
> I still have more than half of v3 left to look at, so will continue
> looking. In the meantime, please check the changes I suggested,
> including the delta patch, and let me know your thoughts.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-09-13 11:19:25 Re: refactoring basebackup.c
Previous Message Jeevan Ladhe 2021-09-13 10:02:45 Re: refactoring basebackup.c