Re: compute_query_id and pg_stat_statements

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Christoph Berg <myon(at)debian(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: compute_query_id and pg_stat_statements
Date: 2021-05-14 04:26:23
Message-ID: 1850356.1620966383@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> Maybe we should revert this thing pending somebody doing the work to
> make a version of queryid labeling that actually is negligibly cheap.
> It certainly seems like that could be done; one more traversal of the
> parse tree can't be that expensive in itself. I suspect that the
> performance problem is with the particular hashing mechanism that
> was used, which looks mighty ad-hoc anyway.

To put a little bit of meat on that idea, I experimented with jacking
up the "jumble" calculation and driving some other implementations
underneath.

I thought that Julien's "worst case" scenario was pretty far from
worst case, since it involved a join, which a lot of simple queries
don't have. I tested this scenario instead:

$ cat naive.sql
SELECT * FROM pg_class c ORDER BY oid DESC LIMIT 1;
$ pgbench -n -f naive.sql -T 60 postgres

which is still complicated enough that there's work for the
query fingerprinter to do, but not so much for planning and
execution.

I confirm that on HEAD, there's a noticeable TPS penalty from
turning on compute_query_id: about 3.2% on my machine.
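
(For anyone wanting to reproduce this: one way to flip the GUC between
runs without restarting the server, assuming a superuser connection, is

$ psql -c "ALTER SYSTEM SET compute_query_id = on" postgres
$ psql -c "SELECT pg_reload_conf()" postgres
$ pgbench -n -f naive.sql -T 60 postgres

and then the same with "= off" for the baseline run.)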

The first patch attached replaces the "jumble" calculation
with two CRC32s (two so that we still get 64 bits out at
the end). I see a 2.7% penalty with this version. Now,
I'm using an Intel machine with
#define USE_SSE42_CRC32C_WITH_RUNTIME_CHECK 1
so on machines without any hardware CRC support, this'd
likely be a loss. But it still proves the point that the
existing implementation is just not very speedy.
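
To illustrate the shape of that idea (a minimal sketch only, not the
attached patch; the struct and function names here are made up, but
INIT_CRC32C/COMP_CRC32C/FIN_CRC32C are the stock macros from
src/include/port/pg_crc32c.h):

#include "c.h"
#include "port/pg_crc32c.h"

/* Two CRC32C streams, seeded differently, combined into 64 bits. */
typedef struct CrcJumbleState
{
	pg_crc32c	crc_lo;		/* normally-seeded stream */
	pg_crc32c	crc_hi;		/* perturbed stream, so the halves differ */
} CrcJumbleState;

static void
crc_jumble_init(CrcJumbleState *jstate)
{
	INIT_CRC32C(jstate->crc_lo);
	INIT_CRC32C(jstate->crc_hi);
	/* push one extra byte through the second stream to de-correlate it */
	COMP_CRC32C(jstate->crc_hi, "q", 1);
}

static void
crc_jumble_bytes(CrcJumbleState *jstate, const void *ptr, Size len)
{
	COMP_CRC32C(jstate->crc_lo, ptr, len);
	COMP_CRC32C(jstate->crc_hi, ptr, len);
}

static uint64
crc_jumble_result(CrcJumbleState *jstate)
{
	FIN_CRC32C(jstate->crc_lo);
	FIN_CRC32C(jstate->crc_hi);
	return ((uint64) jstate->crc_hi << 32) | jstate->crc_lo;
}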

I then tried a really dumb xor'ing implementation, and
that gives me a slowdown of 2.2%. This could undoubtedly
be improved on further, say by unrolling the loop or
processing multiple bytes at once. One problem with it
is that I suspect it will tend to concentrate the entropy
into the third/fourth and seventh/eighth bytes of the
accumulator, since so many of the fields being jumbled
are 4-byte or 8-byte fields with most of the entropy in
their low-order bits. Probably that could be improved
with a bit more thought -- say, an extra bump of the
nextbyte pointer after each field.
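
Roughly like this, extra bump included (again just an illustrative
sketch, not the attached patch; assume the state is zeroed before use):

#include <string.h>
#include "c.h"

/* Dumb jumble: xor each input byte into a rotating slot of an
 * 8-byte accumulator. */
typedef struct XorJumbleState
{
	unsigned char acc[8];
	int			nextbyte;
} XorJumbleState;

static void
xor_jumble_bytes(XorJumbleState *jstate, const unsigned char *ptr, Size len)
{
	for (Size i = 0; i < len; i++)
	{
		jstate->acc[jstate->nextbyte] ^= ptr[i];
		jstate->nextbyte = (jstate->nextbyte + 1) % 8;
	}

	/*
	 * Extra bump after each field, so that consecutive 4-byte/8-byte
	 * fields don't keep dropping their low-order entropy into the same
	 * accumulator slots.
	 */
	jstate->nextbyte = (jstate->nextbyte + 1) % 8;
}

static uint64
xor_jumble_result(const XorJumbleState *jstate)
{
	uint64		result;

	memcpy(&result, jstate->acc, sizeof(result));
	return result;
}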

Anyway, I think that what we have here is quite an inferior
implementation, and we can do better with some more thought.

regards, tom lane

Attachment Content-Type Size
query-id-with-crc.patch text/x-diff 4.5 KB
query-id-with-xor.patch text/x-diff 3.2 KB
