|From:||Julien Rouhaud <rjuju123(at)gmail(dot)com>|
|To:||Magnus Hagander <magnus(at)hagander(dot)net>|
|Cc:||Sergei Kornilov <sk(at)zsrv(dot)org>, legrand legrand <legrand_legrand(at)hotmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: pg_stat_statements oddity with track = all|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On Mon, Mar 08, 2021 at 06:03:59PM +0100, Magnus Hagander wrote:
> On Sun, Mar 7, 2021 at 8:39 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> > Yes I was a bit worried about the possible extra text entry. I kept things
> > simple until now as the general opinion was that entries existing for both top
> > level and nested level should be very rare so adding extra code and overhead to
> > spare a few query texts wouldn't be worth it.
> > I think that using a flag might be a bit expensive, as we would have to make
> > sure that at least one of the possible two entries has it. So if there are 2
> > entries and the one with the flag is evicted, we would have to transfer the
> > flag to the other one, and check the existence of the flag when allocatin a new
> > entry. And all of that has to be done holding an exclusive lock on pgss->lock.
> Yeah, we'd certainly want to minimize things. But what if they both
> have the flag at that point? Then we wouldn't have to care on
> eviction? But yes, for new allications we'd have to look up if the
> query existed with the other value of the flag, and copy it over in
> that case.
I think that we might be able to handle that without a flag. The only thing
that would need to be done is when creating an entry, look for an existing
entry with the opposite flag, and if there's simply use the same
(query_offset, query_len) info. This doesn't sound that expensive.
The real pain point will be that the garbage collection phase
will become way more expensive as it will now have to somehow maintain that
knowledge, which will require additional lookups for each entry. I'm a bit
concerned about that, especially with the current heuristic to schedule garbage
collection. For now, need_qc_qtext says that we have to do it if the extent is
more than 512 (B) * pgss_max. This probably doesn't work well for people using
ORM as they tend to generate gigantic SQL queries.
If we implement query text deduplication, should we add another GUC for that
"512" magic value so that people can minimize the gc overhead if they know they
have gigantic queries, or simply don't mind bigger qtext file?
> > Maybe having a new hash table (without the toplevel flag) for those query text
> > might be better, or maybe pgss performance is already so terrible when you have
> > to regularly evict entries that it wouldn't make any real difference.
> > Should I try to add some extra code to make sure that we only store the query
> > text once, or should I document that there might be duplicate, but we expect
> > that to be very rare?
> If we expect it to be rare, I think it might be reasonable to just
> document that. But do we really have a strong argument for it being
I don't that think that anyone really had a strong argument, mostly gut
feeling. Note that pg_stat_kcache already implemented that toplevel flags, so
if people are using that extension in a recent version they might have some
figures to show. I'll ping some people that I know are using it.
One good argument would be that gigantic queries generated by ORM should always
be executed as top level statements.
I previously tried with the postgres regression tests, which clearly isn't a
representative workload, and as far as I can see the vast majority of queries
executed bost as top level and nested level are DDL implying recursion (e.g. a
CREATE TABLE with underlying index creation).
|Next Message||Amit Langote||2021-03-09 02:53:59||Re: TRUNCATE on foreign table|
|Previous Message||Michael Paquier||2021-03-09 02:19:51||Re: Disallow SSL compression?|