Re: Make query ID more portable

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Jim Finnerty <jfinnert(at)amazon(dot)com>
Subject: Re: Make query ID more portable
Date: 2021-10-12 08:35:39
Message-ID: CAOBaU_YkCuyPdqGr6OJNM87cMsoCm5vOwDMhYikkp9Ovd_ErSg@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Tue, Oct 12, 2021 at 4:12 PM Andrey V. Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> QueryID is good tool for query analysis. I want to improve core jumbling
> machinery in two ways:
> 1. QueryID value should survive dump/restore of a database (use fully
> qualified name of table instead of relid).
> 2. QueryID could represent more general class of queries: for example,
> it can be independent from permutation of tables in a FROM clause.
>
> See the patch in attachment as an POC. Main idea here is to break
> JumbleState down to a 'clocations' part that can be really interested in
> a post parse hook and a 'context data', that needed to build query or
> subquery signature (hash) and, I guess, isn't really needed in any
> extensions.

There have been quite a lot of threads about this in the past, and
almost every time people wanted to change how the hash is computed.
So it seems to me that extensions would actually be quite interested
in that part. This is even more true now that an extension can
replace only the queryid calculation, leaving the rest of the
extensions that rely on it as is.

> I think, it adds not much complexity and overhead.

I think the biggest change in your patch is:

         case RTE_RELATION:
-            APP_JUMB(rte->relid);
-            JumbleExpr(jstate, (Node *) rte->tablesample);
+            {
+                char *relname = regclassout_ext(rte->relid, true);
+
+                APP_JUMB_STRING(relname);
+                JumbleExpr(jstate, (Node *) rte->tablesample, ctx);
                 APP_JUMB(rte->inh);
                 break;

Have you run any benchmarks on an OLTP workload? Adding catalog access
there is likely to add significant overhead.
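
To make the trade-off concrete, here is a minimal standalone sketch. It
is not PostgreSQL code: ToyRelation, jumble_by_relid, jumble_by_name and
the FNV-1a hash are hypothetical stand-ins chosen for illustration. It
only shows that hashing the relid is a fixed 4-byte operation whose
result changes when the OID changes, while hashing the qualified name
stays stable. The toy struct already carries the names; in the patch
they have to be looked up from the relid, which is where the catalog
access comes from.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a range table entry (not RangeTblEntry). */
typedef struct ToyRelation
{
    uint32_t    relid;      /* OID-like id, may differ after dump/restore */
    const char *nspname;    /* schema name */
    const char *relname;    /* table name */
} ToyRelation;

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

/* FNV-1a, used here only as a simple illustrative 64-bit hash. */
static uint64_t
fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Cheap: hashes the 4-byte id, no lookup needed. */
static uint64_t
jumble_by_relid(const ToyRelation *rel)
{
    return fnv1a(FNV_OFFSET, &rel->relid, sizeof(rel->relid));
}

/* Stable across id changes: hashes "schema\0table\0". */
static uint64_t
jumble_by_name(const ToyRelation *rel)
{
    uint64_t    h = FNV_OFFSET;

    /* The trailing NUL is included as a separator between the parts. */
    h = fnv1a(h, rel->nspname, strlen(rel->nspname) + 1);
    h = fnv1a(h, rel->relname, strlen(rel->relname) + 1);
    return h;
}
```

With two ToyRelation values that share "public"."orders" but have
different relid values, jumble_by_name returns the same hash for both
and jumble_by_relid does not.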

Also, why use the fully qualified name only for relations to get stable
hashes? At least operators and functions should be treated the same
way. If you do that, you will probably have way too much overhead
to be usable in a busy production environment. Why not use the new
ability to have a third-party extension compute the queryid in a way
that exactly suits your needs?
