Re: [HACKERS] Parallel Append implementation

From: amul sul <sulamul(at)gmail(dot)com>
To: Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Parallel Append implementation
Date: 2017-11-27 16:51:26
Message-ID: CAAJ_b94AnyjJDbqdcpqko1erNrZ0MO_F6jUCVuLUbfZqo-=QoQ@mail.gmail.com
Lists: pgsql-hackers

Thanks a lot, Rajkumar, for this test. I am able to reproduce this crash by
enabling partition-wise join.

The reason for this crash is the same as the previous one[1], i.e. the
node->as_whichplan value. This time the append->first_partial_plan value
looks suspicious. With the following change to the v21 patch, I am able to
reproduce this crash as an assertion failure when
enable_partition_wise_join = ON; otherwise it works fine.

diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index e3b17cf0e2..4b337ac633 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -458,6 +458,7 @@ choose_next_subplan_for_worker(AppendState *node)

/* Backward scan is not supported by parallel-aware plans */
Assert(ScanDirectionIsForward(node->ps.state->es_direction));
+ Assert(append->first_partial_plan < node->as_nplans);

LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);
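To make the failure mode concrete, here is a minimal, hypothetical C model of a worker advancing through the subplan list. It is not code from the v21 patch; next_plan_index and NO_MORE_PLANS are made-up names. It assumes that once a worker runs past the last subplan it wraps around to append->first_partial_plan. If that value is not below the number of subplans, the wrapped index points past the end of the subplan array, which is the kind of out-of-range as_whichplan value the assertion above is meant to catch. The sketch bounds-checks the index instead of using it.

```c
#include <assert.h>

/* Hypothetical marker meaning "no subplan left to run" (illustrative name). */
#define NO_MORE_PLANS (-1)

/*
 * Hypothetical, simplified model: advance to the next subplan and, past
 * the end of the list, wrap around to the first partial plan, so the
 * non-partial plans before it are not revisited. The result is meant to
 * be used as an index into a subplans[] array, so it must lie in
 * [0, nplans); otherwise the caller would read memory past the array.
 */
static int next_plan_index(int current, int nplans, int first_partial_plan)
{
    int next;

    assert(nplans > 0);

    next = current + 1;
    if (next >= nplans)
        next = first_partial_plan;  /* wrap to the partial plans */

    /* Bounds check before the index is ever dereferenced. */
    if (next < 0 || next >= nplans)
        return NO_MORE_PLANS;
    return next;
}
```

With three subplans and first_partial_plan = 1, advancing from the last subplan wraps back to index 1; with a bogus first_partial_plan of 5, the model reports NO_MORE_PLANS instead of handing back index 5.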

I will look into this more tomorrow.

1. http://postgr.es/m/CAAJ_b97kLNW8Z9nvc_JUUG5wVQUXvG=f37WsX8ALF0A=KAHh3w(at)mail(dot)gmail(dot)com

Regards,
Amul

On Fri, Nov 24, 2017 at 5:00 PM, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com> wrote:
> On Thu, Nov 23, 2017 at 2:22 PM, amul sul <sulamul(at)gmail(dot)com> wrote:
>> It looks like it is the same crash that v20 claimed to fix; indeed, I
>> missed adding fix[1] to the v20 patch, sorry about that. The attached
>> updated patch includes the aforementioned fix.
>
> Hi,
>
> I have applied the latest v21 patch, and it crashed when partition-wise
> join was enabled. The same query works fine with and without
> partition-wise join enabled on PG-head.
> Please take a look.
>
> SET enable_partition_wise_join TO true;
>
> CREATE TABLE pt1 (a int, b int, c text, d int) PARTITION BY LIST(c);
> CREATE TABLE pt1_p1 PARTITION OF pt1 FOR VALUES IN ('0000', '0001',
> '0002', '0003');
> CREATE TABLE pt1_p2 PARTITION OF pt1 FOR VALUES IN ('0004', '0005',
> '0006', '0007');
> CREATE TABLE pt1_p3 PARTITION OF pt1 FOR VALUES IN ('0008', '0009',
> '0010', '0011');
> INSERT INTO pt1 SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30
> FROM generate_series(0, 99999) i;
> ANALYZE pt1;
>
> CREATE TABLE pt2 (a int, b int, c text, d int) PARTITION BY LIST(c);
> CREATE TABLE pt2_p1 PARTITION OF pt2 FOR VALUES IN ('0000', '0001',
> '0002', '0003');
> CREATE TABLE pt2_p2 PARTITION OF pt2 FOR VALUES IN ('0004', '0005',
> '0006', '0007');
> CREATE TABLE pt2_p3 PARTITION OF pt2 FOR VALUES IN ('0008', '0009',
> '0010', '0011');
> INSERT INTO pt2 SELECT i % 20, i % 30, to_char(i % 12, 'FM0000'), i % 30
> FROM generate_series(0, 99999) i;
> ANALYZE pt2;
>
> EXPLAIN ANALYZE
> SELECT t1.c, sum(t2.a), COUNT(*) FROM pt1 t1 FULL JOIN pt2 t2
> ON t1.c = t2.c GROUP BY t1.c ORDER BY 1, 2, 3;
> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
> stack-trace is given below.
>
> Core was generated by `postgres: parallel worker for PID 73935'.
> Program terminated with signal 11, Segmentation fault.
> #0 0x00000000006dc4b3 in ExecProcNode (node=0x7f7f7f7f7f7f7f7e) at
> ../../../src/include/executor/executor.h:238
> 238 if (node->chgParam != NULL) /* something changed? */
> Missing separate debuginfos, use: debuginfo-install
> keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64
> libcom_err-1.41.12-23.el6.x86_64 libselinux-2.0.94-7.el6.x86_64
> openssl-1.0.1e-57.el6.x86_64 zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0 0x00000000006dc4b3 in ExecProcNode (node=0x7f7f7f7f7f7f7f7e) at
> ../../../src/include/executor/executor.h:238
> #1 0x00000000006dc72e in ExecAppend (pstate=0x26cd6e0) at
> nodeAppend.c:207
> #2 0x00000000006d1e7c in ExecProcNodeInstr (node=0x26cd6e0) at
> execProcnode.c:446
> #3 0x00000000006dcee5 in ExecProcNode (node=0x26cd6e0) at
> ../../../src/include/executor/executor.h:241
> #4 0x00000000006dd38c in fetch_input_tuple (aggstate=0x26cd7f8) at
> nodeAgg.c:699
> #5 0x00000000006e02eb in agg_fill_hash_table (aggstate=0x26cd7f8) at
> nodeAgg.c:2536
> #6 0x00000000006dfb2b in ExecAgg (pstate=0x26cd7f8) at nodeAgg.c:2148
> #7 0x00000000006d1e7c in ExecProcNodeInstr (node=0x26cd7f8) at
> execProcnode.c:446
> #8 0x00000000006d1e4d in ExecProcNodeFirst (node=0x26cd7f8) at
> execProcnode.c:430
> #9 0x00000000006c9439 in ExecProcNode (node=0x26cd7f8) at
> ../../../src/include/executor/executor.h:241
> #10 0x00000000006cbd73 in ExecutePlan (estate=0x26ccda0,
> planstate=0x26cd7f8, use_parallel_mode=0 '\000', operation=CMD_SELECT,
> sendTuples=1 '\001', numberTuples=0,
> direction=ForwardScanDirection, dest=0x26b2ce0, execute_once=1
> '\001') at execMain.c:1718
> #11 0x00000000006c9a12 in standard_ExecutorRun (queryDesc=0x26d7fa0,
> direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
> execMain.c:361
> #12 0x00000000006c982e in ExecutorRun (queryDesc=0x26d7fa0,
> direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
> execMain.c:304
> #13 0x00000000006d096c in ParallelQueryMain (seg=0x26322a8,
> toc=0x7fda24d46000) at execParallel.c:1271
> #14 0x000000000053272d in ParallelWorkerMain (main_arg=1203628635) at
> parallel.c:1149
> #15 0x00000000007e8c99 in StartBackgroundWorker () at bgworker.c:841
> #16 0x00000000007fc029 in do_start_bgworker (rw=0x2656d00) at
> postmaster.c:5741
> #17 0x00000000007fc36b in maybe_start_bgworkers () at postmaster.c:5945
> #18 0x00000000007fb3fa in sigusr1_handler (postgres_signal_arg=10) at
> postmaster.c:5134
> #19 <signal handler called>
> #20 0x0000003dd26e1603 in __select_nocancel () at
> ../sysdeps/unix/syscall-template.S:82
> #21 0x00000000007f6bee in ServerLoop () at postmaster.c:1721
> #22 0x00000000007f63dd in PostmasterMain (argc=3, argv=0x2630180) at
> postmaster.c:1365
> #23 0x000000000072cb40 in main (argc=3, argv=0x2630180) at main.c:228
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation
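As a quick sanity check on the test data in the quoted script, here is a small C sketch (rows_per_partition is a hypothetical helper, not part of any patch) that counts how the 100000 generated rows land in the three list partitions. It assumes, as the FOR VALUES IN clauses state, that each partition holds four consecutive values of i % 12. Every partition is populated: pt1_p1 and pt2_p1 get 33336 rows each, and the other partitions get 33332 rows each.

```c
#include <assert.h>

/*
 * Hypothetical helper: count rows per partition for
 *   INSERT ... SELECT ..., to_char(i % 12, 'FM0000'), ...
 *   FROM generate_series(0, nrows - 1) i;
 * Partition k (0-based) holds the values 4k .. 4k+3 of i % 12.
 */
static void rows_per_partition(long nrows, long counts[3])
{
    assert(nrows >= 0);
    counts[0] = counts[1] = counts[2] = 0;
    for (long i = 0; i < nrows; i++)
        counts[(i % 12) / 4]++;  /* 0-3 -> _p1, 4-7 -> _p2, 8-11 -> _p3 */
}
```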
