Re: limiting collected query text length in pg_stat_statements

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: "Godfrin, Philippe E" <Philippe(dot)Godfrin(at)nov(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: limiting collected query text length in pg_stat_statements
Date: 2022-05-02 13:57:45
Message-ID: 20220502135745.ms2omopxqgs5p56p@jrouhaud
Lists: pgsql-hackers

Hi,

On Mon, May 02, 2022 at 12:45:28PM +0000, Godfrin, Philippe E wrote:
> Greetings,
>
> I want to limit the query text that gets captured in pg_stat_statements. We
> have sql statements with thousands of values clauses (upwards of 10,000) that
> run at a 1 second interval. When just a handful are running plus 2 or 3 loads
> using the same technique (10,000 entry values clauses) querying the
> pg_stat_statements view gets bogged down (see below). With
> pg_stat_statements.max set to 1000 statements, just querying the stats
> view seems to impact the running statements! I have temporarily staved
> off the issue by reducing the max to 250 statements, and I have made
> recommendations to the development team to cut down the number of values
> clauses. However, it seems to me that the ability to truncate the captured
> query would be a useful feature.

The stored queries are normalized, so the values themselves won't be stored, only
a "?" per value. And as long as all the queries have the same number of values
there should be a single entry stored per role and database, so all in all this
should limit the size of the stored query texts.
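For illustration, a sketch of what the normalized entries look like (column
names from the pg_stat_statements view; the placeholder style, "?" vs "$1",
depends on the PostgreSQL version):

```sql
-- INSERTs that differ only in their constants collapse into one normalized
-- entry; an INSERT with a different number of VALUES rows gets its own
-- queryid, since the normalized text differs.
SELECT queryid, calls, left(query, 60) AS normalized_query
FROM pg_stat_statements
WHERE query LIKE 'INSERT%'
ORDER BY calls DESC;
```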

On the other hand, with such a low pg_stat_statements.max, you may have a lot
of entry evictions, which tends to bloat the external query file
($PGDATA/pg_stat_tmp/pgss_query_texts.stat). Did you check how big it is and,
if so, how fast it grows? I once saw that file grow to more than 1GB for no
apparent reason, which was obviously slowing everything down. A simple call to
pg_stat_statements_reset() fixed the problem, at least as far as I know, as I
never had access to the server and heard nothing further.
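A quick way to check, sketched here with pg_stat_file() (requires suitable
privileges; the path is relative to $PGDATA and may vary across versions and
settings):

```sql
-- Size in bytes of the external query text file:
SELECT size
FROM pg_stat_file('pg_stat_tmp/pgss_query_texts.stat');

-- If it has grown abnormally, resetting discards all entries and
-- truncates the file:
SELECT pg_stat_statements_reset();
```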

> I've peeked at the source code and I don't see the track_activity_query_size
> used (pg_stat_activity.query) which would be one mechanism. I don't really
> know what would be the right way to do this or even if it is a good idea,
> i.e. if limiting that would have a larger impact to the statistics
> ecosystem...

pg_stat_statements used to truncate the query text to
track_activity_query_size, but that limitation was removed when the query texts
were moved to the external query file. It's quite convenient to have the full
normalized query text available, especially when the application is using an
ORM, as they tend to make SQL even more verbose than it already is. Having a
very high number of values (and I'm assuming queries with different numbers of
values) seems like a corner case, but truncating the query text would only fix
part of the problem: it would lead to a very high number of distinct queryids,
which is also problematic, as frequent entry evictions have an insanely high
overhead, and you can't store an infinite number of entries.

> Thoughts or suggestions?

You didn't explain how you're using pg_stat_statements. Do you really need to
query pg_stat_statements with the query text each time? If you only need
some performance metrics, you could adapt your system to retrieve the query
text only for the wanted queryid(s) once you find some problematic pattern,
and/or cache the query texts in a table or some other place. But with a very
low pg_stat_statements.max (especially if you can have a varying number of
values, from 1 to 10k) it might be hard to do.
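For instance (a sketch; calling the function form with showtext := false skips
reading the external query text file, so the query column comes back NULL):

```sql
-- Collect metrics without touching the query text file:
SELECT queryid, calls, mean_exec_time
FROM pg_stat_statements(showtext := false)
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Then fetch the text only for a queryid that looks problematic:
SELECT query
FROM pg_stat_statements
WHERE queryid = 1234567890;  -- hypothetical queryid
```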
