Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date: 2021-05-24 05:21:44
Message-ID: CAJcOf-cNLhA7iaUYAQqZ44tz3oHJoPxGRm1+tNE27iJXTXObzQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 24, 2021 at 2:50 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > Keep cfbot happy, use the PG14 patch as latest.
>
> This stuff is usually very tricky.

Agreed. That's why I was looking for experts in this snapshot-handling
code, to look closer at this issue, check my proposed fix, come up
with a better solution etc.

> Do we have a way to reliably
> reproduce the report discussed here?

I couldn't reproduce it in my environment (though I could understand
what was going wrong, based on the description provided).
houzj (houzj(dot)fnst(at)fujitsu(dot)com) was able to reproduce it in his
environment and kindly provided me with the following information.
(He said that he followed most of the steps described by the original
problem reporter, Pengcheng, but that steps 2 and 7 may differ slightly
from Pengcheng's. See the emails higher in the thread for the
two scripts "init_test.sql" and "sub_120.sql".)

===

1, Change NUM_SUBTRANS_BUFFERS from 32 to 128 in the file
"src/include/access/subtrans.h" (line 15).
2, Configure with assertions enabled and build it ( ./configure
--enable-cassert --prefix=/home/pgsql ).
3, Initialize a new database cluster.
4, Modify postgresql.conf and add the parameters below. Since the
coredump comes from a parallel scan, the parallel settings are adjusted
to make it easier to reproduce.

max_connections = 2000

parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=8
max_parallel_workers = 32

5, Start the database cluster.
6, Use the script init_test.sql in the attachment to create the tables.
7, Use pgbench with the script sub_120.sql in the attachment to test it.
Try it a few times; you should get a coredump file.
pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 12000
(If you cannot reproduce it, try running two such pgbench commands
at the same time.)

In my environment (CentOS 8.2, 128G RAM, 40 processors, SAS disk,
Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz),
I can sometimes reproduce it in about 5 minutes, but sometimes it takes
about half an hour.

Best regards,
houzj

===
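
For anyone repeating the steps above, step 4 could be scripted roughly as
follows. This is only a sketch: the helper name append_repro_settings and
the use of $PGDATA are my own assumptions and are not part of houzj's
report; the setting values are the ones listed in step 4.

```shell
# Hypothetical helper (not from the original report): append the settings
# listed in step 4 to the postgresql.conf file given as the first argument.
append_repro_settings() {
    conf_file="$1"
    cat >> "$conf_file" <<'EOF'
max_connections = 2000
parallel_setup_cost = 0
parallel_tuple_cost = 0
min_parallel_table_scan_size = 0
max_parallel_workers_per_gather = 8
max_parallel_workers = 32
EOF
}

# Assumed order of use (the data directory path is illustrative):
#   initdb -D "$PGDATA"                                # step 3
#   append_repro_settings "$PGDATA/postgresql.conf"    # step 4
#   pg_ctl -D "$PGDATA" start                          # step 5
# The pgbench command in step 7 connects to port 33550, so the server
# presumably listens on that port; that setting is not listed in step 4.
```

Appending rather than editing in place relies on the last occurrence of a
setting in postgresql.conf taking effect.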

Regards,
Greg Nancarrow
Fujitsu Australia
