Re: Multi-Column List Partitioning

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multi-Column List Partitioning
Date: 2021-05-08 08:42:20
Message-ID: CAMm1aWaXqSca6Zv4Wez+278vcvinpJT8ZTJZNCkhj88hh+CGzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Jeevan for looking into this thread.

> I did not review the patch in detail, but a quick look at it leaves me
> with following comments:

I will incorporate these changes.

> ...FOR VALUES IN (1, 2, 3, 4), (11, 22, 33, 44), where the first set
> is the list for values of column A and second list is for column B. We
> can treat these lists as A X B possible values or simply (a1, b1), (a2,
> b2) internally. However I see other proprietary databases already have
> syntax something similar that you are proposing here. So, I leave it
> open for the thoughts from experts. Also, though what I propose might be
> easy from a user perspective, but might not be that easy for
> implementation, given that for a larger number of columns in partition
list
> e.g. A X B X C X D lists become unmanageable.

I feel this is also not easy from a user's perspective. For example
for a partition
with 2 partition keys (a,b) for values like (1,1), (1,2), (1,3),
(1,4),(1,5). This
would be converted to (1,1,1,1,1), (1,2,3,4,5). It is difficult to match
the values
of column 'a' to 'b'. Anyways let's wait for the other's opinion about this.

Thanks & Regards,
Nitin Jadhav

On Fri, May 7, 2021 at 7:36 PM Jeevan Ladhe <jeevan(dot)ladhe(at)enterprisedb(dot)com>
wrote:

>
> > While reviewing one of the 'Table partitioning' related patches,
> > I found that Postgres does not support multiple column based LIST
> > partitioning. Based on this understanding, I have started working on
> > this feature. I also feel that 'Multi-Column List Partitioning' can
> > be benefited to the Postgres users in future.
>
> +1 for the feature. I also think this can help users deal with some
> useful cases.
>
>
> > CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2,
> > 2),(2, 10);
>
> IMHO, listing every single tuple like this might be a bit cumbersome for
> the user. What about something like this:
>
> ...FOR VALUES IN (1, 2, 3, 4), (11, 22, 33, 44), where the first set
> is the list for values of column A and second list is for column B. We
> can treat these lists as A X B possible values or simply (a1, b1), (a2,
> b2) internally. However I see other proprietary databases already have
> syntax something similar that you are proposing here. So, I leave it
> open for the thoughts from experts. Also, though what I propose might be
> easy from a user perspective, but might not be that easy for
> implementation, given that for a larger number of columns in partition list
> e.g. A X B X C X D lists become unmanageable.
>
> I did not review the patch in detail, but a quick look at it leaves me
> with following comments:
>
> 1.
> > + * list. Then this function will continue the serach and return the
> index of
> Typo:
> s/serach/search
>
> 2.
> A compiler warning:
> partprune.c: In function ‘get_matching_list_bounds’:
> partprune.c:2731:20: error: passing argument 5 of ‘partition_list_bsearch’
> makes pointer from integer without a cast [-Werror=int-conversion]
> 2731 | nvalues, value, &is_equal);
> | ^~~~~
> | |
> | Datum {aka long unsigned int}
> In file included from partprune.c:53:
> ../../../src/include/partitioning/partbounds.h:120:32: note: expected
> ‘Datum *’ {aka ‘long unsigned int *’} but argument is of type ‘Datum’ {aka
> ‘long unsigned int’}
> 120 | int nvalues, Datum *value, bool *is_equal);
> | ~~~~~~~^~~~~
>
> 3.
> And, a server crash with following case:
> postgres=# CREATE TABLE t1 (a int) PARTITION BY LIST (a);
> CREATE TABLE
> postgres=# CREATE TABLE t1p1 PARTITION OF t1 FOR VALUES IN (1, 2, 3);
> CREATE TABLE
> postgres=# \d+ t1p1
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !?>
>
> Stacktrace:
> (gdb) bt
> #0 __GI_raise (sig=sig(at)entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1 0x00007f5d273c5859 in __GI_abort () at abort.c:79
> #2 0x000055779d2eb69d in ExceptionalCondition
> (conditionName=0x55779d4978d8 "ptr == NULL || nodeTag(ptr) == type",
> errorType=0x55779d4978c3 "FailedAssertion",
> fileName=0x55779d4978a0 "../../../src/include/nodes/nodes.h",
> lineNumber=603) at assert.c:69
> #3 0x000055779d03a684 in castNodeImpl (type=T_Const, ptr=0x55779e457b18)
> at ../../../src/include/nodes/nodes.h:603
> #4 0x000055779d04368a in get_qual_for_list (parent=0x7f5d1df829b8,
> spec=0x55779e457950) at partbounds.c:4155
> #5 0x000055779d03ac60 in get_qual_from_partbound (rel=0x7f5d1df82570,
> parent=0x7f5d1df829b8, spec=0x55779e457950) at partbounds.c:272
> #6 0x000055779d2cf630 in generate_partition_qual (rel=0x7f5d1df82570) at
> partcache.c:379
> #7 0x000055779d2cf468 in get_partition_qual_relid (relid=32771) at
> partcache.c:308
> #8 0x000055779d2592bf in pg_get_partition_constraintdef
> (fcinfo=0x55779e44ee50) at ruleutils.c:2019
> #9 0x000055779cec7221 in ExecInterpExpr (state=0x55779e44dfb0,
> econtext=0x55779e407fe8, isnull=0x7ffddf9b109f) at execExprInterp.c:744
> #10 0x000055779cec954f in ExecInterpExprStillValid (state=0x55779e44dfb0,
> econtext=0x55779e407fe8, isNull=0x7ffddf9b109f) at execExprInterp.c:1819
> #11 0x000055779cf1d58a in ExecEvalExprSwitchContext (state=0x55779e44dfb0,
> econtext=0x55779e407fe8, isNull=0x7ffddf9b109f)
> at ../../../src/include/executor/executor.h:338
> #12 0x000055779cf1d602 in ExecProject (projInfo=0x55779e44dfa8) at
> ../../../src/include/executor/executor.h:372
> #13 0x000055779cf1db2f in ExecNestLoop (pstate=0x55779e407ed0) at
> nodeNestloop.c:241
> #14 0x000055779cedf136 in ExecProcNodeFirst (node=0x55779e407ed0) at
> execProcnode.c:462
> #15 0x000055779ced3053 in ExecProcNode (node=0x55779e407ed0) at
> ../../../src/include/executor/executor.h:257
> #16 0x000055779ced5a87 in ExecutePlan (estate=0x55779e407c80,
> planstate=0x55779e407ed0, use_parallel_mode=false, operation=CMD_SELECT,
> sendTuples=true, numberTuples=0,
> direction=ForwardScanDirection, dest=0x55779e425a88,
> execute_once=true) at execMain.c:1551
> #17 0x000055779ced372d in standard_ExecutorRun (queryDesc=0x55779e453520,
> direction=ForwardScanDirection, count=0, execute_once=true) at
> execMain.c:361
> #18 0x000055779ced353c in ExecutorRun (queryDesc=0x55779e453520,
> direction=ForwardScanDirection, count=0, execute_once=true) at
> execMain.c:305
> #19 0x000055779d13d287 in PortalRunSelect (portal=0x55779e398800,
> forward=true, count=0, dest=0x55779e425a88) at pquery.c:912
> #20 0x000055779d13cec0 in PortalRun (portal=0x55779e398800,
> count=9223372036854775807, isTopLevel=true, run_once=true,
> dest=0x55779e425a88, altdest=0x55779e425a88,
> qc=0x7ffddf9b14f0) at pquery.c:756
> #21 0x000055779d1361ce in exec_simple_query (
> query_string=0x55779e3367a0 "SELECT inhparent::pg_catalog.regclass,\n
> pg_catalog.pg_get_expr(c.relpartbound, c.oid),\n inhdetachpending,\n
> pg_catalog.pg_get_partition_constraintdef(c.oid)\nFROM pg_catalog.pg_class
> c JOIN pg_catalo"...) at postgres.c:1214
> #22 0x000055779d13ad8b in PostgresMain (argc=1, argv=0x7ffddf9b1710,
> dbname=0x55779e3626f8 "postgres", username=0x55779e3626d8 "hadoop") at
> postgres.c:4476
> #23 0x000055779d0674d3 in BackendRun (port=0x55779e358380) at
> postmaster.c:4488
> #24 0x000055779d066d8c in BackendStartup (port=0x55779e358380) at
> postmaster.c:4210
> #25 0x000055779d062f9b in ServerLoop () at postmaster.c:1742
> #26 0x000055779d062734 in PostmasterMain (argc=3, argv=0x55779e3308b0) at
> postmaster.c:1414
> #27 0x000055779cf5805f in main (argc=3, argv=0x55779e3308b0) at main.c:209
>
> Regards,
> Jeevan Ladhe
>
> On Thu, May 6, 2021 at 7:33 PM Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
> wrote:
>
>> Hi,
>>
>> While reviewing one of the 'Table partitioning' related patches, I found
>> that Postgres does not support multiple column based LIST partitioning.
>> Based on this understanding, I have started working on this feature. I also
>> feel that 'Multi-Column List Partitioning' can be benefited to the Postgres
>> users in future.
>>
>> I am attaching the WIP patch for this feature here. It supports
>> 'Multi-Column List Partitioning', however some tasks are still pending. I
>> would like to know your thoughts about this, So that I can continue the
>> work with improvising the current patch.
>>
>> Following things are handled in the patch.
>> 1. Syntax
>>
>> CREATE TABLE table_name (attrs) PARTITION BY LIST(list_of_columns);
>>
>> Earlier there was no provision to mention multiple columns as part of the
>> 'list_of_columns' clause. Now we can mention the list of columns separated
>> by comma.
>>
>> CREATE TABLE table_name_p1 PARTITION OF table_name FOR VALUES IN
>> list_of_values.
>>
>> Whereas list_of_columns can be
>> a. (value [,...])
>> b. (value [,...]) [,...]
>>
>> I would like to list a few examples here for better understanding.
>> Ex-1:
>> CREATE TABLE t1(a int) PARTITION BY LIST(a);
>> CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN (1, 2, 10, 5, 7);
>>
>> Ex-2:
>> CREATE TABLE t2(a int, b int) PARTITION BY LIST(a,b);
>> CREATE TABLE t2_1 PARTITION OF t2 FOR VALUES IN (1, 2), (1, 5), (2,
>> 2),(2, 10);
>>
>> Please share if any changes are required in the above syntax.
>>
>> 2. Modified transformation logic to support above syntax.
>>
>> 3. Modified the data structures to store the information caused by above
>> syntax. Also modified the searching logic to route the tuple to the
>> appropriate partition.
>>
>> 4. Done a few basic testing and verified CREATE TABLE, INSERT INTO and
>> SELECT are working fine.
>>
>>
>> Following items are pending and I am working on it.
>>
>> 1. Handling of 'NULL' values.
>>
>> 2. Support multi column case in partition pruning.
>>
>> 3. Add test cases to the regression test suite.
>>
>> Please share your thoughts.
>>
>>
>> Thanks & Regards,
>> Nitin Jadhav
>>
>>
>>
>>
>>
>>
>>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-05-08 08:46:00 Re: Simplify backend terminate and wait logic in postgres_fdw test
Previous Message Michael Paquier 2021-05-08 08:37:58 Re: Small issues with CREATE TABLE COMPRESSION