Re: A qsort template

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A qsort template
Date: 2021-06-29 00:16:16
Message-ID: CA+hUKG+-Nx4CfutUsgG-x7qfzmV-VHtcdD_kyRksE4t7CWYN3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi John,

On Tue, Jun 29, 2021 at 7:13 AM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
> I plan to do some performance testing with VACUUM, ANALYZE etc soon, to see if I can detect any significant differences.

Thanks!

> I did a quick check of the MacOS/clang binary size (no debug symbols):
>
> master: 8108408
> 0001-0009: 8125224

Not too bad.

> Later, I'll drill down into the individual patches and see if anything stands out.
>
> There were already some comments for v2 upthread about formatting and an overflow hazard, but I did find a few more things to ask about:

Right, here's an update with fixes discussed earlier with Zhihong and Tom:

* COMPARE_TYPE -> COMPARE_RET_TYPE
* quit fighting with pgindent (I will try to fix this problem generally later)
* fix overflow hazard

> - For my curiosity, there are a lot of calls to qsort/qunique in the tree -- without having looked exhaustively, do these patches focus on cases where there are bespoke comparator functions and/or hot code paths?

Patches 0006-0009 are highly specialised for local usage by a single
module, and require some kind of evidence that they're worth their
bytes, and the onus is on me there of course -- but any ideas and
feedback are welcome. There are other opportunities like these, maybe
better ones. That reminds me: I recently had a perf report from
Andres that showed the qsort in compute_scalar_stats() as quite hot.
That's probably a good candidate, and is not yet done in the current
patch set.

The lower numbered patches are all things that are reused in many
places, and in my humble opinion improve the notation and type safety
and code deduplication generally when working with common types
ItemPtr, BlockNumber, Oid, aside from any performance arguments. At
least the ItemPtr stuff *might* also speed something useful up.

I tried to measure a speedup in vacuum, but so far I have not. I did
learn some things though: While doing that with an uncorrelated index
and a lot of deleted tuples, I found that adding more
maintenance_work_mem doesn't help beyond a few MB, because then cache
misses dominate to the point where it's not better than doing multiple
passes (and this is familiar to me from work on hash joins). If I
turned on huge pages on Linux and set min_dynamic_shared_memory so
that the parallel DSM used by vacuum lives in huge pages, then
parallel vacuum with a large maintenance_work_mem starts to do much
better than non-parallel vacuum by improving the TLB misses (as with
hash joins). I thought that was quite interesting! Perhaps
bsearch_itemptr might help with correlated indexes with a lot of
deleted indexes (so not dominated by cache misses), though?

(I wouldn't be suprised if someone comes up with a much better idea
than bsearch for that anyway... a few ideas have been suggested.)

> - Aside from the qsort{_arg} precedence, is there a practical reason for keeping the new global functions in their own files?

Better idea for layout welcome. One thing I wondered while trying to
figure out where to put functions that operate on itemptr: why is
itemptr_encode() in src/include/catalog/index.h?!

> - 0002 / 0004
>
> +/* Search and unique functions inline in header. */
>
> The functions are pretty small, but is there some advantage for inlining these?

Glibc's bsearch definition is already in a header for inlining (as is
our qunique), so I thought I should preserve that characteristic on
principle. I don't have any evidence though. Other libcs I looked at
didn't have bsearch in a header. So by doing this we make the
generated code the same across platforms (all other relevant things
being equal). I don't know if it really makes much difference,
especially since in this case the comparator and size would still be
inlined if we defined it in the .c (unlike standard bsearch)...
Probably only lazy_tid_reaped() calls it enough to potentially show
any difference in a non-microbenchmark workload, if anything does.

> - 0003
>
> #include "lib/qunique.h" is not needed anymore.

Fixed.

> This isn't quite relevant for the current patch perhaps, but I'm wondering why we don't already call bsearch for RelationHasSysCache() and RelationSupportsSysCache().

Right, I missed that. Done. Nice to delete some more code.

> - 0008
>
> +#define ST_COMPARE(a, b, cxt) \
> + DatumGetInt32(FunctionCall2Coll(&cxt->flinfo, cxt->collation, *a, *b))
>
> This seems like a pretty heavyweight comparison, so I'm not sure inlining buys us much, but it seems also there are fewer branches this way. I'll come up with a test and see what happens.

I will be very interested to see the results. Thanks!

Attachment Content-Type Size
more-sort-search-specializations-v3.tgz application/x-compressed-tar 11.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-06-29 01:01:40 Re: PG 14 release notes, first draft
Previous Message Justin Pryzby 2021-06-28 23:26:56 Re: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...