RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

From: "Pengchengliu" <pengchengliu(at)tju(dot)edu(dot)cn>
To: "'Greg Nancarrow'" <gregn4422(at)gmail(dot)com>
Cc: "'Andres Freund'" <andres(at)anarazel(dot)de>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date: 2021-05-14 10:35:53
Message-ID: 002f01d748ac$eaa781a0$bff684e0$@tju.edu.cn
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Greg,

When you get the different xmin between active snapshot and transaction snapshot, maybe there is no coredump.

As maybe there is not tupe(xmin) between ActiveSnapshot->xmin and TransactionSnapshot->xmin which needs to be scaned in parallel process.

There is no doubt, it is very likely that ActiveSnapshot->xmin precedes TransactionSnapshot->xmin.

For this coredump, we must make sure parallel and snapshot overflow. If snapshot is not overflow, you cannot get the coredump.

As coredump is from parallel scan in MVCC when snapshot is overflow.

Did you use pgbench with the script sub_120.sql which I provide in attachment?

As the default PGPROC_MAX_CACHED_SUBXIDS is 64. In script sub_120.sql, for one transaction, it will use 120 subtransactions which is much larger than 64.

While getting the snapshot, it must be overflow. I really don't know why your snapshot is not overflow.

Did you increase the number PGPROC_MAX_CACHED_SUBXIDS? Please don't change any codes, now we just use the origin codes in PG13.2.

I have checked the codes in master branch, there is no change about this mechanism. This issue should still exist.

Thanks

Pengcheng

-----Original Message-----
From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Sent: 2021年5月14日 16:47
To: Pengchengliu <pengchengliu(at)tju(dot)edu(dot)cn>
Cc: Andres Freund <andres(at)anarazel(dot)de>; PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Fri, May 14, 2021 at 12:25 PM Pengchengliu < <mailto:pengchengliu(at)tju(dot)edu(dot)cn> pengchengliu(at)tju(dot)edu(dot)cn> wrote:

>

> Hi Greg,

>

> Thanks for you replay and test.

>

>

>

> When main process gets the transaction snapshot in InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot xmin is very likely follows active snapshot xmin.

>

>

>

> Use gdb it is easy to verify it.

>

> Create env as blow:

>

>

>

> 1, Use PG13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), init a cluster database.

>

> 2, Append the postgres.conf as below:

>

>

>

> max_connections = 2000

>

> parallel_setup_cost=0

>

> parallel_tuple_cost=0

>

> min_parallel_table_scan_size=0

>

> max_parallel_workers_per_gather=8

>

> max_parallel_workers = 128

>

>

>

> 3, Start the cluster database. Use the init_test.sql script in attachment to create some test tables.

>

> 4, Use the sub_120.sql script in attachment with pgbench to test it.

>

>

>

> pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 1800

>

>

>

>

>

>

>

> Then you can login the database, and use GDB to verify it.

>

> 1, First use explain, make sure force Parallel is OK.

>

>

>

> postgres=# explain (verbose,analyze) select count(*) from contend1;

>

>

> QUERY PLAN

>

>

>

> ----------------------------------------------------------------------

> ---------------------------------------------------------------

>

> ----------------

>

> Finalize Aggregate (cost=12006.11..12006.12 rows=1 width=8) (actual

> time=1075.214..1075.449 rows=1 loops=1)

>

> Output: count(*)

>

> -> Gather (cost=12006.08..12006.09 rows=8 width=8) (actual

> time=1075.198..1075.433 rows=1 loops=1)

>

> Output: (PARTIAL count(*))

>

> Workers Planned: 8

>

> Workers Launched: 0

>

> -> Partial Aggregate (cost=12006.08..12006.09 rows=1

> width=8) (actual time=1074.674..1074.676 rows=1 loops=1)

>

> Output: PARTIAL count(*)

>

> -> Parallel Seq Scan on public.contend1

> (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454

> rows=1

>

> 010200 loops=1)

>

> Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9,

> c10, crt_time

>

> Planning Time: 0.123 ms

>

> Execution Time: 1075.588 ms

>

> postgres=# select pg_backend_pid();

>

> pg_backend_pid

>

> ----------------

>

> 2182678

>

>

>

>

>

> 2, use gdb to debug our backend process. Add the breakpoint in parallel.c:219 and continue.

>

>

>

> gdb -q -p 2182678

>

> ...

>

> (gdb) b parallel.c:219

>

> Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.

>

> (gdb) c

>

> Continuing.

>

>

>

> 3, In the psql clinet, we can execute the explain command in step1 again.

>

> After we get the breakpoint in gdb, we wait a moment. Then we execute next.

>

> Use gdb check active_snapshot and transaction_snapshot, active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.

>

> When I use gdb test it, sometimes active_snapshot is the same as transaction_snapshot. Then you can try it multiple times, and before execute next, try wait longer time.

>

>

>

> Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)

>

> at

> /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/ac

> cess/transam/parallel.c:219

>

> 219 Snapshot transaction_snapshot = GetTransactionSnapshot();

>

> (gdb) n

>

> 220 Snapshot active_snapshot = GetActiveSnapshot();

>

> (gdb)

>

> 223 oldcontext = MemoryContextSwitchTo(TopTransactionContext);

>

> (gdb) p *transaction_snapshot

>

> $1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip

> = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,

>

> subxcnt = 0, suboverflowed = true, takenDuringRecovery = false,

> copied = false, curcid = 0, speculativeToken = 0,

>

> active_count = 0, regd_count = 0, ph_node = {first_child = 0x0,

> next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}

>

> (gdb) p *active_snapshot

>

> $2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip

> = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,

>

> suboverflowed = true, takenDuringRecovery = false, copied = true,

> curcid = 0, speculativeToken = 0, active_count = 1,

>

> regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0,

> prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}

>

> (gdb)

>

>

Hi Pengcheng,

I followed all your steps.

However, I perhaps get different behavior in my environment.

99% of the time, the xmin and xmax of the active_snapshot and transaction_snapshot are the same (regardless of how long I wait at different points after the breakpoint is hit). I've had one or two instances where the xmax values differ. I managed to catch just one case where there were different xmin and xmax values in the snapshots, but this occurred just prior to the pgbench client completing and terminating, and when I continued in the debugger, there was no crash/coredump.

However, I think I've spotted something potentially important to this issue:

For me, almost always "suboverflowed = false" in the snapshots (except in that one case, right at the end of the pgbench run), yet in your gdb example "suboverflowed = true" in both of the snapshots (i.e. the snapshot subxip array has overflowed). I'm guessing that this may be related to the coredump issue, but I'm not exactly sure how it has happened, and why it seemingly isn't being handled correctly and causes that Assert to fire in your case.

Can you try and find out how the snapshot suboverflow is being set in your case? (since you are getting this readily in your examined

snapshots???) I think there's only several places where it can be set to "true" (e.g. procarray.c:1641).

Also, does increasing PGPROC_MAX_CACHED_SUBXIDS avoid, or delay, the problem for you? It's currently defined as 64.

I notice that there's been some changes related to snapshot data handling and subxid overflow since 13.2, so I'm wondering whether your coredump issue can be reproduced with the latest code?

Regards,

Greg Nancarrow

Fujitsu Australia

Attachment Content-Type Size
test_script.tar application/x-tar 30.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2021-05-14 11:11:29 Re: Support for VACUUMing Foreign Tables
Previous Message Noah Misch 2021-05-14 08:50:30 Re: dump cannot be restored if schema permissions revoked