Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE
Date: 2023-09-20 13:42:43
Message-ID: 290b9d46-5ba3-613f-a51f-4a7886878242@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Dmitry,

Thanks for looking at the patch and sorry for the long line.

On 3/17/23 21:14, Dmitry Dolgov wrote:
> The idea sounds reasonable to me, but I have to admit snapshot_and_stats
> implementation looks awkward. Maybe it would be better to have a
> separate structure field for both stats and snapshot, which will be set
> to point to a corresponding place in the shared FAM e.g. when the worker
> is getting initialized? shm_toc_allocate mentions BUFFERALIGN to handle
> possibility of some atomic operations needing it, so I guess that would
> have to be an alignment in this case as well.
>
> Probably another option would be to allocate two separate pieces of
> shared memory, which resolves questions like proper alignment, but
> annoyingly will require an extra lookup and a new key.

I considered the other options and it seems to me none of them is
particularly superior. All of them have pros and cons with the cons
mostly outweighing the pros. Let me quickly elaborate:

1. Use multiple shm_toc entries: Shared state is split into multiple
pieces. Extra pointers in BitmapHeapScanState needed to point at the
split out data. BitmapHeapScanState has already a shared_info member,
which is not a pointer to the shared memory but a pointer to the leader
local data allocated used to store the instrumentation data from the
workers. This is confusing but at least consistent with how its done in
other places (e.g. execSort.c, nodeHash.c, nodeIncrementalSort.c).
Having another pointer there which points to the shared memory makes it
even more confusing. If we go this way we would have e.g.
shared_info_copy and shared_info members in BitmapHeapScanState.

2. Store two extra pointers to the shared FAM entries in
BitmapHeapScanState: IMHO, that is the better alternative of (1) as it
doesn't need an extra TOC entry but comes with the same confusion of
multiple pointers to SharedBitmapHeapScanInfo in BitmapHeapScanState.
But maybe that's not too bad?

3. Solution in initial patch (use two functions to obtain pointers
where/when needed): Avoids the need for another pointer in
BitmapHeapScanState at the cost / ugliness of having to call the helper
functions.

Another, not yet discussed, option I can see work is:

4. Allocate a fixed amount of memory for the instrumentation stats based
on MAX_PARALLEL_WORKER_LIMIT: MAX_PARALLEL_WORKER_LIMIT is 1024 and used
as the limit of the max_parallel_workers GUC. This way
MAX_PARALLEL_WORKER_LIMIT * sizeof(BitmapHeapScanInstrumentation) = 1024
* 8 = 8192 bytes would be allocated. To cut this down in half we could
additionally change the type of lossy_pages and exact_pages from long to
uint32. Only possibly needed memory would have to get initialized, the
remaining unused memory would remain untouched to not waste cycles.

My first preference is the new option (4). My second preference is
option (1). What's your take?

--
David Geier
(ServiceNow)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2023-09-20 14:00:25 Re: Infinite Interval
Previous Message Ashutosh Bapat 2023-09-20 13:29:52 Re: Infinite Interval