Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-03-07 13:59:53
Message-ID: CANEvxPoWtCgrKQHjbkb-QmkDV2gOQWv241Y7-gqaoxT+g4-fPA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 7, 2018 at 7:16 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Mar 7, 2018 at 8:13 AM, Prabhat Sahu <
> prabhat(dot)sahu(at)enterprisedb(dot)com> wrote:
>
>> Hi all,
>>
>> While testing this feature, I found a crash on PG head with parallel
>> CREATE INDEX using pgbench tables.
>>
>> -- GUCs set in postgresql.conf
>> max_parallel_maintenance_workers = 16
>> max_parallel_workers = 16
>> max_parallel_workers_per_gather = 8
>> maintenance_work_mem = 8GB
>> max_wal_size = 4GB
>>
>> ./pgbench -i -s 500 -d postgres
>>
>> postgres=# create index pgb_acc_idx3 on pgbench_accounts(aid,
>> abalance,filler);
>> WARNING: terminating connection because of crash of another server
>> process
>> DETAIL: The postmaster has commanded this server process to roll back
>> the current transaction and exit, because another server process exited
>> abnormally and possibly corrupted shared memory.
>> HINT: In a moment you should be able to reconnect to the database and
>> repeat your command.
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>>
>
> That makes it look like perhaps one of the worker backends crashed. Did
> you get a message in the logfile that might indicate the nature of the
> crash? Something with PANIC or TRAP, perhaps?
>

I don't see any PANIC or TRAP in the log file.
Here are its contents:

[edb(at)localhost bin]$ cat logsnew
2018-03-07 19:21:20.922 IST [54400] LOG: listening on IPv6 address "::1", port 5432
2018-03-07 19:21:20.922 IST [54400] LOG: listening on IPv4 address "127.0.0.1", port 5432
2018-03-07 19:21:20.925 IST [54400] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2018-03-07 19:21:20.936 IST [54401] LOG: database system was shut down at 2018-03-07 19:21:20 IST
2018-03-07 19:21:20.939 IST [54400] LOG: database system is ready to accept connections
2018-03-07 19:24:44.263 IST [54400] LOG: background worker "parallel worker" (PID 54482) was terminated by signal 9: Killed
2018-03-07 19:24:44.286 IST [54400] LOG: terminating any other active server processes
2018-03-07 19:24:44.297 IST [54405] WARNING: terminating connection because of crash of another server process
2018-03-07 19:24:44.297 IST [54405] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-07 19:24:44.297 IST [54405] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-07 19:24:44.301 IST [54478] WARNING: terminating connection because of crash of another server process
2018-03-07 19:24:44.301 IST [54478] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-03-07 19:24:44.301 IST [54478] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-03-07 19:24:44.494 IST [54504] FATAL: the database system is in recovery mode
2018-03-07 19:24:44.496 IST [54400] LOG: all server processes terminated; reinitializing
2018-03-07 19:24:44.513 IST [54505] LOG: database system was interrupted; last known up at 2018-03-07 19:22:54 IST
2018-03-07 19:24:44.552 IST [54505] LOG: database system was not properly shut down; automatic recovery in progress
2018-03-07 19:24:44.554 IST [54505] LOG: redo starts at 0/AB401A38
2018-03-07 19:25:14.712 IST [54505] LOG: invalid record length at 1/818B8D80: wanted 24, got 0
2018-03-07 19:25:14.714 IST [54505] LOG: redo done at 1/818B8D48
2018-03-07 19:25:14.714 IST [54505] LOG: last completed transaction was at log time 2018-03-07 19:24:05.322402+05:30
2018-03-07 19:25:16.887 IST [54400] LOG: database system is ready to accept connections
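A process killed by signal 9 (SIGKILL) cannot catch the signal, so it has no chance to write a PANIC or TRAP entry itself; the only trace is the postmaster's "was terminated by signal" line. As a minimal sketch (not part of the original report, and assuming each log entry occupies one line in the server's log file, the wrapping above being mail line-wrapping), such entries could be pulled out of a log like this. The function name `killed_workers` is hypothetical.

```python
import re
import signal

# Matches the postmaster message seen in the log above, e.g.
#   background worker "parallel worker" (PID 54482) was terminated by signal 9: Killed
# Assumption: one log entry per line, as written by the server itself.
WORKER_KILLED = re.compile(
    r'background worker "(?P<name>[^"]+)" \(PID (?P<pid>\d+)\) '
    r'was terminated by signal (?P<signo>\d+)'
)


def killed_workers(lines):
    """Yield (worker name, PID, signal name) for each worker killed by a signal."""
    for line in lines:
        m = WORKER_KILLED.search(line)
        if m:
            signo = int(m.group("signo"))
            try:
                # Translate the number into its symbolic name, e.g. 9 -> "SIGKILL".
                signame = signal.Signals(signo).name
            except ValueError:
                # Signal number unknown on this platform: keep the raw number.
                signame = f"signal {signo}"
            yield m.group("name"), int(m.group("pid")), signame


if __name__ == "__main__":
    sample = [
        '2018-03-07 19:24:44.263 IST [54400] LOG: background worker "parallel '
        'worker" (PID 54482) was terminated by signal 9: Killed',
        '2018-03-07 19:24:44.286 IST [54400] LOG: terminating any other active '
        'server processes',
    ]
    for name, pid, signame in killed_workers(sample):
        print(name, pid, signame)
```

On a POSIX system, running this against the two sample lines prints one entry for PID 54482 with the signal name SIGKILL; the second line is ignored.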

--

With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Corporation

The Postgres Database Company
