Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date: 2021-05-14 08:47:26
Message-ID: CAJcOf-ebgfWeAqDFQfwRN=h49VzkjPunR0TvkY8nax9c1e2szw@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 14, 2021 at 12:25 PM Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn> wrote:
>
> Hi Greg,
>
> Thanks for your reply and test.
>
> When the main process gets the transaction snapshot in InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot xmin very likely follows the active snapshot xmin.
>
> Using gdb, it is easy to verify. Create the environment as below:
>
> 1. Use PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), init a cluster database.
> 2. Append to postgresql.conf as below:
>
>    max_connections = 2000
>    parallel_setup_cost=0
>    parallel_tuple_cost=0
>    min_parallel_table_scan_size=0
>    max_parallel_workers_per_gather=8
>    max_parallel_workers = 128
>
> 3. Start the cluster database. Use the init_test.sql script in the attachment to create some test tables.
> 4. Use the sub_120.sql script in the attachment with pgbench to test it:
>
>    pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 1800
>
> Then you can log in to the database and use gdb to verify it.
>
> 1. First use explain to make sure forced parallelism is OK.
>
> postgres=# explain (verbose,analyze) select count(*) from contend1;
>                                                               QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)
>    Output: count(*)
>    ->  Gather  (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)
>          Output: (PARTIAL count(*))
>          Workers Planned: 8
>          Workers Launched: 0
>          ->  Partial Aggregate  (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)
>                Output: PARTIAL count(*)
>                ->  Parallel Seq Scan on public.contend1  (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1010200 loops=1)
>                      Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time
>  Planning Time: 0.123 ms
>  Execution Time: 1075.588 ms
>
> postgres=# select pg_backend_pid();
>  pg_backend_pid
> ----------------
>         2182678
>
> 2. Use gdb to debug our backend process. Add a breakpoint at parallel.c:219 and continue.
>
> gdb -q -p 2182678
> ...
> (gdb) b parallel.c:219
> Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.
> (gdb) c
> Continuing.
>
> 3. In the psql client, execute the explain command from step 1 again.
> After we hit the breakpoint in gdb, wait a moment, then execute next.
> Using gdb to check active_snapshot and transaction_snapshot: active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.
> When I tested with gdb, sometimes active_snapshot was the same as transaction_snapshot. If so, try it multiple times, and before executing next, wait a longer time.
>
> Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)
>     at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:219
> 219             Snapshot        transaction_snapshot = GetTransactionSnapshot();
> (gdb) n
> 220             Snapshot        active_snapshot = GetActiveSnapshot();
> (gdb)
> 223             oldcontext = MemoryContextSwitchTo(TopTransactionContext);
> (gdb) p *transaction_snapshot
> $1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,
>   subxcnt = 0, suboverflowed = true, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0,
>   active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
> (gdb) p *active_snapshot
> $2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,
>   suboverflowed = true, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, active_count = 1,
>   regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}
> (gdb)

Hi Pengcheng,

I followed all your steps, but I get somewhat different behavior in my
environment. 99% of the time, the xmin and xmax of the active_snapshot
and transaction_snapshot are the same (regardless of how long I wait
at different points after the breakpoint is hit). I've had one or two
instances where the xmax values differ. I managed to catch just one
case where the snapshots had different xmin and xmax values, but this
occurred just prior to the pgbench client completing and terminating,
and when I continued in the debugger, there was no crash/coredump.

However, I think I've spotted something potentially important to this issue:
for me, it is almost always "suboverflowed = false" in the snapshots
(except in that one case, right at the end of the pgbench run), yet in
your gdb example it is "suboverflowed = true" in both of the snapshots
(i.e. the snapshot's subxip array has overflowed). I'm guessing that
this may be related to the coredump issue, but I'm not exactly sure
how it has happened, or why it seemingly isn't being handled correctly
and causes that Assert to fire in your case.
Can you try to find out how the snapshot's suboverflowed flag is being
set in your case, since you are seeing it so readily in your examined
snapshots? I think there are only a few places where it can be set to
"true" (e.g. procarray.c:1641).
Also, does increasing PGPROC_MAX_CACHED_SUBXIDS avoid, or delay, the
problem for you? It's currently defined as 64.
I notice that there have been some changes related to snapshot data
handling and subxid overflow since 13.2, so I'm wondering whether your
coredump issue can be reproduced with the latest code?

Regards,
Greg Nancarrow
Fujitsu Australia
