Quick Links

Re: compute_query_id and pg_stat_statements

From:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To:	rjuju123(at)gmail(dot)com
Cc:	magnus(at)hagander(dot)net, masao(dot)fujii(at)oss(dot)nttdata(dot)com, michael(at)paquier(dot)xyz, andres(at)anarazel(dot)de, tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)alvh(dot)no-ip(dot)org, sfrost(at)snowman(dot)net, bruce(at)momjian(dot)us, myon(at)debian(dot)org, peter(dot)eisentraut(at)enterprisedb(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: compute_query_id and pg_stat_statements
Date:	2021-05-12 09:37:24
Message-ID:	20210512.183724.859740020819002744.horikyota.ntt@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

At Wed, 12 May 2021 14:05:16 +0800, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote in
> On Wed, May 12, 2021 at 02:33:35PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote in
> > >
> > > I don't think that this approach would cope well for people who want a queryid
> > > without pg_stat_statements or such. Since the queryid can now be found in
> > > pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable
> > > to allow users to benefit from that even if they don't install additional
> > > module.
> >
> > Ah, I missed that case. And we are wanting to use pg_stat_statements
> > with (almost) zero-config? How about the following behavior?
> >
> > Setting query_id_provider to 'none' means we don't calculate query-id
> > by default. However, if queryIdWante() is called, the default provider
> > is set up and starts calculating query id.
>
> Having "none" meant "not unless someone asks for it" looks like a POLA
> violation.

Sorry for confusion. A different behavior for "none" is proposed later
in the mail. It is just an intermediate discussion.

> > Setting query_id_provider to something else means the user wants
> > query-id calcualted using the provider. Setting 'default' is
> > equivalent to setting compute_query_id to 'on'.
> >
> > There might be a case where a user sets query_id_provider to
> > non-'none' but don't want have query-id calculated, but it can be said
> > a kind of mis-configuration?
>
> So if I'm understanding correctly, you're arguing for an approach different to
> what Michael stated as the general consensus in [1]. I'm not saying that I
> think it's a bad idea (and I actually suggested it before), but we have to
> chose an approach and stick with it.

I'm not sure how much room for change of the direction is left. So it
was just a proposal. So if the majority still thinks that it is the
way to stick to controling on/off/(auto) the in-core implement and
separately allow another module to be hooked, I don't further object
to that decision.

> > > I think this would be a mistake to do that, as it would mean that we don't
> > > officially support alternative queryid provider.
> >
> > Ok, if we want to support alternative providers from the first, we
> > need to actually write the loader code for query-id providers. It
> > would not be so hard?, but it might not be suitable to this stage so I
> > proposed that to get rid of needing such complexity for now.
>
> I did write a POC extension [2] to demonstrate that moving pg_stat_statement's
> queryid calculation in core doesn't mean that we're imposing it to everyone.
> And yes this is critical and a must have in the initial implementation.

Ok, understood.

> > (Anyway I prefer to load query-id provider as a dynamically loadable
> > module rather than hook-function.)
>
> I agree that having a specific API (I'm fine with a hook or a dynamically
> loaded function) for that would be better, but it doesn't appear to be the
> opinion of the majority.

Ugg. Ok.

> > > The GUC itself may not change, but third-party queryid provider would probably
> > > need changes as the new entry point will be dedicated to compute a queryid
> > > only, while third-party plugins may do more than that in their
> > > post_parse_analyze_hook. And also users will have to change their
> >
> > I don't think it is not that a problem.
>
> Did you mean "I don't think that it's a problem"? Otherwise I don't get it.

Yes, you're right. Sorry for the typo.

> > Even if any third-party
> > extension is having query-id generator by itself, in most cases it
> > would be a copy of JumbleQuery in case of pg_stat_statement is not
> > loaded and now it is moved in-core as 'default' provider. What the
> > exntension needs to be done is just ripping out the copied generator
> > code. I guess...
>
> I don't fully understand, but it seems that you're arguing that the only use
> case is to have something similar to pg_stat_statements (say e.g.
> pg_store_plans), that always have the same queryid implementation as
> pg_stat_statements. That's not the case, as there already are "clones" of
> pg_stat_statements, and the main difference is an alternative queryid
> implementation. So in that case what author would do is to drop everything
> *except* the queryid implementation.
>
> And if I'm not mistaken, pg_store_plans also wants a different queryid
> implementation, but has to handle a secondary queryid on top of it
> (https://github.com/ossc-db/pg_store_plans/blob/master/pg_store_plans.c#L843-L855)

Yeah, the extension intended to be used joining with the
pg_stat_statements view. And the reason for the second query-id dates
back to the era when query id was not available in the
pg_stat_statements view. Now it is mere a fall-back query id when
pg_stat_statments is not active. Now that the in-core query-id is
available, I think there's no reason to keep that implement.

> So here again what the extension want is to get rid of pg_stat_statements (and
> now core) queryid implementation.

So the extension might be a good reason for the discussion^^;

> > > configuration to use that new interface, and additionally the module may now
> > > have to be removed from shared_preload_libraries. Overall, it doesn't seem to
> > > me that it would make users' life easier.
> >
> > Why the third-party module need to be removed from
> > shared_preload_libraries? The module can stay as a preloaded shared
> > library but just no longer need to have its own query-id provider
> > since it is provided in-core. If the extension required a specific
> > provider, the developer need to make it a loadable module and users
> > need to specify the provider module explicitly.
>
> It's the same misunderstanding here. Basically people want to benefit from the
> whole ecosystem based on a queryid (pg_stat_statements, now
> pg_stat_activity.query_id and such) but with another definition of what a
> queryid is. So those people will now only need to implement something like
> [2], rather than forking every single extension they want to use.

Hmm. I'm not sure the [2] gives sufficient reason for leaving the
current interface. But will follow if it is sitll the consensus. (And
it seems like true.)

> [1]: https://www.postgresql.org/message-id/YJoeXcrwe1EDmqKT@paquier.xyz
> [2]: https://github.com/rjuju/pg_queryid

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Re: compute_query_id and pg_stat_statements at 2021-05-12 06:05:16 from Julien Rouhaud

Responses

Re: compute_query_id and pg_stat_statements at 2021-05-12 10:09:30 from Julien Rouhaud

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kyotaro Horiguchi	2021-05-12 09:39:27	Re: compute_query_id and pg_stat_statements
Previous Message	Peter Smith	2021-05-12 09:35:19	Re: GetSubscriptionRelations declares too many scan keys