From: | "Pengchengliu" <pengchengliu(at)tju(dot)edu(dot)cn> |
---|---|
To: | <rhaas(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "'Andres Freund'" <andres(at)anarazel(dot)de>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>, "'Greg Nancarrow'" <gregn4422(at)gmail(dot)com> |
Subject: | RE: Parallel scan with SubTransGetTopmostTransaction assert coredump |
Date: | 2021-05-17 11:18:05 |
Message-ID: | 001a01d74b0e$4ef03770$ecd0a650$@tju.edu.cn |
Lists: | pgsql-hackers |
Hi Tom & Robert,
Could you review the Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) in SubTransGetTopmostTransaction?
I think this assertion is not appropriate for parallel worker processes.
We discussed it previously at:
https://www.postgresql-archive.org/Parallel-scan-with-SubTransGetTopmostTransaction-assert-coredump-td6197408.html
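For readers unfamiliar with the check, here is a minimal Python model of what the assertion tests. It assumes the modulo-2^32 comparison that PostgreSQL uses for normal transaction IDs; the Python function names and the constant are illustrative, not PostgreSQL APIs.

```python
# Illustrative model only; these names are not PostgreSQL APIs.
FIRST_NORMAL_XID = 3  # XIDs below this are special (invalid, bootstrap, frozen)


def xid_follows_or_equals(id1: int, id2: int) -> bool:
    """Return True if id1 is logically >= id2 in 32-bit circular XID space."""
    if id1 < FIRST_NORMAL_XID or id2 < FIRST_NORMAL_XID:
        # Special XIDs are compared as plain unsigned integers.
        return id1 >= id2
    # Equivalent to the C idiom (int32) (id1 - id2) >= 0.
    diff = (id1 - id2) & 0xFFFFFFFF
    return diff < 0x80000000


def assert_would_hold(xid: int, transaction_xmin: int) -> bool:
    """Model of Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin))."""
    return xid_follows_or_equals(xid, transaction_xmin)
```

In this model the assertion fails whenever the xid being looked up logically precedes TransactionXmin, which is the situation reported for the parallel worker.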
Thanks
Pengcheng
-----Original Message-----
From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Sent: May 15, 2021 0:44
To: Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>
Cc: Andres Freund <andres(at)anarazel(dot)de>; PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
On Fri, May 14, 2021 at 8:36 PM Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn> wrote:
> Did you use pgbench with the script sub_120.sql, which I provided as an attachment?
yes
>
> Did you increase PGPROC_MAX_CACHED_SUBXIDS? Please don't change any code; for now we are just using the original PG13.2 source.
>
No, I have made no source code changes at all.
That was my suggestion for you to try, because if the problem is avoided by increasing PGPROC_MAX_CACHED_SUBXIDS (to, say, 128), then it probably indicates that the overflow condition is affecting the xmin/xmax of the two snapshots such that it invalidates the condition that is asserted.
I think one problem is that in your settings, you haven't set "max_worker_processes", yet have set "max_parallel_workers = 128".
I'm finding no more than 8 parallel workers are actually active at any one time.
On top of this, you've got pgbench running with 200 concurrent clients.
So many queries are actually executing parallel plans without using parallel workers, as the workers can't actually be launched (and this is probably why I'm finding it hard to reproduce the issue, if the problem involves snapshot suboverflow and parallel workers).
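The arithmetic behind this can be sketched as follows. This is a rough model, not PostgreSQL code: it assumes the documented default of max_worker_processes = 8, that parallel workers are drawn from the max_worker_processes pool and further capped by max_parallel_workers, and (as a labeled assumption, since the original setting is not shown) max_parallel_workers_per_gather = 2. The function names are hypothetical.

```python
def parallel_worker_pool(max_worker_processes: int = 8,
                         max_parallel_workers: int = 8) -> int:
    """Upper bound on concurrently running parallel workers.

    Parallel workers come out of the max_worker_processes pool, so a
    max_parallel_workers value above it has no effect. Other background
    workers may use some of the pool, so the real figure can be lower.
    """
    return min(max_worker_processes, max_parallel_workers)


def requested_workers(clients: int, workers_per_gather: int) -> int:
    """Worst-case demand if every client runs one Gather node at once."""
    return clients * workers_per_gather


# Original test setup: max_worker_processes left unset (default 8),
# max_parallel_workers = 128, pgbench with 200 clients.
pool = parallel_worker_pool(max_worker_processes=8, max_parallel_workers=128)
# workers_per_gather=2 is an assumption (the PostgreSQL default).
demand = requested_workers(clients=200, workers_per_gather=2)
print(f"pool={pool}, worst-case demand={demand}")
```

Under these assumptions the pool is far smaller than the worst-case demand, so most Gather nodes would run with no workers launched.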
I find that the following settings improve the parallelism per query and the whole test runs very much faster:
max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128
and adjust the pgbench command-line: pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 25 -j 25 -T 1800
Problem is, I still get no coredump when using this.
Can you try these settings and let me know if the crash still happens?
I also tried:
max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=2
max_parallel_workers = 280
max_worker_processes = 300
and the pgbench command-line: pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 140 -j 140 -T 1800
but I still get no coredump.
Regards,
Greg Nancarrow
Fujitsu Australia