Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, Atsushi Torikoshi <atorik(at)gmail(dot)com>, Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Evgeny Efimkin <efimkin(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Date: 2020-10-12 08:20:05
Message-ID: CAOBaU_bt3-CwkAQLu7kUp-Yy7dyjhTntNG7Q30CssJHjaPYKEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 7, 2020 at 9:53 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> On Wed, Oct 7, 2020 at 10:42:49AM +0900, Michael Paquier wrote:
> > On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:
> > > I propose moving the pg_stat_statements queryid hash code into the
> > > server (with a version number), and also adding a postgresql.conf
> > > variable that lets you control how detailed the queryid hash is
> > > computed. This addresses the problem of people wanting different hash
> > > methods.
> >
> > In terms of making this part expendable in the future, there could be
> > a point in having an enum here, but are we sure that we will have a
> > need for that in the future? What I get from this discussion is that
> > we want a unique source of truth that users can consume, and that the
> > only source of truth proposed is the PGSS hashing. We may change the
> > way we compute the query ID in the future, for example if it gets
> > expanded to some utility statements, etc. But that would be
> > controlled by the version number in the hash, not the GUC itself.
>
> Oh, if that is true, then I agree let's just go with the version number.

But there are many people that aren't happy with the current hashing
approach. If we're going to move the computation in core, shouldn't
we listen to their complaints and let them pay some probably quite
high overhead to base the hash on name and/or fully qualified name
rather than OID?
For instance people using logical replication to upgrade to a newer
version may want to easily compare query performance on the new
version, or people with multi-tenant databases may want to ignore the
schema name to keep a low number of different queryid.

It would probably still be possible to have a custom queryid hashing
by disabling the core one and computing a new one in a custom
extension, but that seems a bit hackish.

Jumping back on Tom's point that there are judgment calls on what is
examined or not, after a quick look I see at least two possible
problems of ignored clauses:
- WITH TIES clause
- OVERRIDING clause

I personally think that they shouldn't be ignored, but I don't know if
they were only forgotten or ignored on purpose.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2020-10-12 08:36:48 Re: Wired if-statement in gen_partprune_steps_internal
Previous Message tsunakawa.takay@fujitsu.com 2020-10-12 08:19:05 RE: Transactions involving multiple postgres foreign servers, take 2