Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE
Date: 2024-02-17 22:31:09
Message-ID: efcb213c-c87c-41f0-85b7-7fa8079df807@enterprisedb.com
Lists: pgsql-hackers

Hi David,

Do you plan to continue working on this patch? I did take a look,
and on the whole it looks reasonable - it modifies the right places etc.

I think there are a few things that may need improvement:

1) Storing variable-length data in ParallelBitmapHeapState

I agree with Robert that the snapshot_and_stats name is not great. I see
Dmitry mentioned phs_snapshot_off as used by ParallelTableScanDescData -
the reasons are somewhat different (phs_snapshot_off exists because we
don't know which exact struct will be allocated), while here we simply
need to allocate two variable-length pieces of memory. But it seems like
it would work nicely for this. That is, we could try adding an offset
for each of those pieces of memory:

- snapshot_off
- stats_off
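
For illustration, a minimal sketch of what I mean (the field and
helper names here are made up, not taken from the patch):

    /*
     * Fixed-size state followed by a single variable-length area,
     * with one offset per variable-length piece - mirroring what
     * phs_snapshot_off does in ParallelTableScanDescData.
     */
    typedef struct ParallelBitmapHeapState
    {
        /* ... existing fixed-size fields ... */
        Size    snapshot_off;   /* offset of the serialized snapshot */
        Size    stats_off;      /* offset of the per-worker stats array */
        char    data[FLEXIBLE_ARRAY_MEMBER];
    } ParallelBitmapHeapState;

    static inline char *
    pbhs_snapshot(ParallelBitmapHeapState *pstate)
    {
        return pstate->data + pstate->snapshot_off;
    }

    static inline char *
    pbhs_stats(ParallelBitmapHeapState *pstate)
    {
        return pstate->data + pstate->stats_off;
    }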

I don't like the GetSharedSnapshotData name very much; it seems very
close to GetSnapshotData, which is quite confusing, I think.

Dmitry also suggested we might add a separate piece of shared memory. I
don't quite see how that would work for ParallelBitmapHeapState, but I
doubt it'd be simpler than having two offsets. I don't think the extra
complexity (paid by everyone) would be worth it just to make EXPLAIN
ANALYZE work.

2) Leader vs. worker counters

It seems to me this does nothing to add the per-worker "Heap Blocks"
values into the leader's total, which means we get stuff like this:

Heap Blocks: exact=102 lossy=10995
Worker 0: actual time=50.559..209.773 rows=215253 loops=1
Heap Blocks: exact=207 lossy=19354
Worker 1: actual time=50.543..211.387 rows=162934 loops=1
Heap Blocks: exact=161 lossy=14636

I think this is wrong / confusing, and inconsistent with what we do for
other nodes. It's also inconsistent with how we deal with e.g. BUFFERS,
where we *do* add the values to the leader:

Heap Blocks: exact=125 lossy=10789
Buffers: shared hit=11 read=45420
Worker 0: actual time=51.419..221.904 rows=150437 loops=1
Heap Blocks: exact=136 lossy=13541
Buffers: shared hit=4 read=13541
Worker 1: actual time=56.610..222.469 rows=229738 loops=1
Heap Blocks: exact=209 lossy=20655
Buffers: shared hit=4 read=20655

Here it's not entirely obvious, because the leader participates in the
execution, but once we disable leader participation, it's clearer:

Buffers: shared hit=7 read=45421
Worker 0: actual time=28.540..247.683 rows=309112 loops=1
Heap Blocks: exact=282 lossy=27806
Buffers: shared hit=4 read=28241
Worker 1: actual time=24.290..251.993 rows=190815 loops=1
Heap Blocks: exact=188 lossy=17179
Buffers: shared hit=3 read=17180

Not only is "Buffers" clearly a sum of per-worker stats, but the "Heap
Blocks" simply disappeared because the leader does nothing and we don't
print zeros.
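
If we want the node-level line to be the total, the leader could simply
fold the worker counters into its own when it retrieves the shared
instrumentation - the same way BufferUsage gets summed. A rough sketch
(the struct/function names are illustrative, not the patch's):

    /* Illustrative counters - the patch's actual names may differ. */
    typedef struct BitmapHeapScanStats
    {
        uint64  exact_pages;    /* heap blocks with exact TID lists */
        uint64  lossy_pages;    /* heap blocks requiring a recheck */
    } BitmapHeapScanStats;

    /*
     * Fold the per-worker counters into the leader's, so that the
     * node-level "Heap Blocks" reports the sum for the whole node.
     */
    static void
    accumulate_bitmap_stats(BitmapHeapScanStats *leader,
                            const BitmapHeapScanStats *workers,
                            int nworkers)
    {
        for (int i = 0; i < nworkers; i++)
        {
            leader->exact_pages += workers[i].exact_pages;
            leader->lossy_pages += workers[i].lossy_pages;
        }
    }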

3) I'm not sure the handling of the various EXPLAIN flags is entirely
correct. Consider this:

EXPLAIN (ANALYZE):

-> Parallel Bitmap Heap Scan on t (...)
Recheck Cond: (a < 5000)
Rows Removed by Index Recheck: 246882
Worker 0: Heap Blocks: exact=168 lossy=15648
Worker 1: Heap Blocks: exact=302 lossy=29337

EXPLAIN (ANALYZE, VERBOSE):

-> Parallel Bitmap Heap Scan on public.t (...)
Recheck Cond: (t.a < 5000)
Rows Removed by Index Recheck: 246882
Worker 0: actual time=35.067..300.882 rows=282108 loops=1
Heap Blocks: exact=257 lossy=25358
Worker 1: actual time=32.827..302.224 rows=217819 loops=1
Heap Blocks: exact=213 lossy=19627

EXPLAIN (ANALYZE, BUFFERS):

-> Parallel Bitmap Heap Scan on t (...)
Recheck Cond: (a < 5000)
Rows Removed by Index Recheck: 246882
Buffers: shared hit=7 read=45421
Worker 0: Heap Blocks: exact=236 lossy=21870
Worker 1: Heap Blocks: exact=234 lossy=23115

EXPLAIN (ANALYZE, VERBOSE, BUFFERS):

-> Parallel Bitmap Heap Scan on public.t (...)
Recheck Cond: (t.a < 5000)
Rows Removed by Index Recheck: 246882
Buffers: shared hit=7 read=45421
Worker 0: actual time=28.265..260.381 rows=261264 loops=1
Heap Blocks: exact=260 lossy=23477
Buffers: shared hit=3 read=23478
Worker 1: actual time=28.224..261.627 rows=238663 loops=1
Heap Blocks: exact=210 lossy=21508
Buffers: shared hit=4 read=21943

Why should the per-worker buffer info only be shown when BUFFERS is
combined with the VERBOSE flag, and not with BUFFERS alone, when the
patch prints the per-worker "Heap Blocks" info unconditionally?
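
For reference, explain.c already gates the per-worker buffer usage on
VERBOSE, roughly like this (paraphrased, not the exact source):

    /* per-worker buffer/WAL usage is only shown with VERBOSE on top */
    if (es->workers_state && (es->buffers || es->wal) && es->verbose)
    {
        /* ... copy each worker's BufferUsage into its output ... */
    }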

4) Now that I think about this, isn't the *main* problem really that we
don't display the sum of the per-worker stats (which I think is wrong)?
I mean, we can already get the worker details with VERBOSE, right? So
the only reason to display them by default seems to be that the values
in "Heap Blocks" are from the leader only.

BTW doesn't this also suggest some of the code added to explain.c may
not be quite necessary? Wouldn't it be enough to just "extend" the
existing code printing per-worker stats? (I haven't tried, so maybe I'm
wrong and we need the new code.)
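
For instance, something like this might be enough, reusing the existing
ExplainOpenWorker()/ExplainCloseWorker() machinery in explain.c (an
untested sketch - the shared_info/stats layout is made up):

    /* Print per-worker "Heap Blocks", skipping all-zero workers. */
    for (int n = 0; n < shared_info->num_workers; n++)
    {
        BitmapHeapScanStats *si = &shared_info->stats[n];

        if (si->exact_pages == 0 && si->lossy_pages == 0)
            continue;           /* we don't print zeros */

        ExplainOpenWorker(n, es);
        if (es->format == EXPLAIN_FORMAT_TEXT)
        {
            ExplainIndentText(es);
            appendStringInfo(es->str,
                             "Heap Blocks: exact=" UINT64_FORMAT
                             " lossy=" UINT64_FORMAT "\n",
                             si->exact_pages, si->lossy_pages);
        }
        else
        {
            ExplainPropertyUInteger("Exact Heap Blocks", NULL,
                                    si->exact_pages, es);
            ExplainPropertyUInteger("Lossy Heap Blocks", NULL,
                                    si->lossy_pages, es);
        }
        ExplainCloseWorker(n, es);
    }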

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
