Re: More efficient truncation of pg_stat_activity query strings

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: More efficient truncation of pg_stat_activity query strings
Date: 2017-09-15 12:13:35
Message-ID: CAGz5QCKxZLWbs-14-FTdRno=mK44_R938CAZnN_5MZjN07Nt_g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 14, 2017 at 11:33 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-09-12 00:19:48 -0700, Andres Freund wrote:
>> Hi,
>>
>> I've recently seen a benchmark in which pg_mbcliplen() showed up
>> prominently, as it will in basically any benchmark with longer query
>> strings but fast queries. That's not that uncommon.
>>
>> I wonder if we could avoid the cost of pg_mbcliplen() from within
>> pgstat_report_activity(), by moving some of the cost to the read
>> side. pgstat values are obviously read far less frequently in nearly all
>> cases that are performance relevant.
>>
>> Therefore I wonder if we couldn't just store a querystring that's
>> essentially just a memcpy()ed prefix, and do a pg_mbcliplen() on the
>> read side. I think that should work because all *server side* encodings
>> store character lengths in the *first* byte of a multibyte character
>> (at least one clientside encoding, gb18030, doesn't behave that way).
>>
>> That'd necessitate an added memory copy in pg_stat_get_activity(), but
>> that seems fairly harmless.
>>
>> Faults in my thinking?
>
> Here's a patch that implements that idea. Seems to work well. I'm a
> bit loath to add proper regression tests for this, as they seem awfully
> dependent on specific track_activity_query_size settings. I did confirm
> using gdb that I see incomplete characters before
> pgstat_clip_activity(), but not after.
>
> I've renamed st_activity to st_activity_raw to increase the likelihood
> that potential external users of st_activity notice and adapt. Increases
> the noise, but imo to a very bearable amount. Don't feel strongly
> though.
>
Hello,

The patch looks good to me. I've done some regression testing with a
custom script on my local system. The script contains the following
statement:
SELECT 'aaa..<repeated 600 times>' as col;

Test 1
-----------------------------------
duration: 300 seconds
clients/threads: 1

On HEAD TPS: 13181
+ 9.30% 0.27% postgres postgres [.] pgstat_report_activity
+ 8.88% 0.02% postgres postgres [.] pg_mbcliplen
+ 7.76% 4.77% postgres postgres [.] pg_encoding_mbcliplen
+ 4.06% 4.06% postgres postgres [.] pg_utf_mblen

With the patch TPS: 13628 (+3.39%)
+ 0.36% 0.21% postgres postgres [.] pgstat_report_activity

Test 2
-----------------------------------
duration: 300 seconds
clients/threads: 8

On HEAD TPS: 53084
+ 12.17% 0.20% postgres postgres [.] pgstat_report_activity
+ 11.83% 0.02% postgres postgres [.] pg_mbcliplen
+ 11.19% 8.03% postgres postgres [.] pg_encoding_mbcliplen
+ 3.74% 3.73% postgres postgres [.] pg_utf_mblen

With the patch TPS: 63949 (+20.4%)
+ 0.40% 0.25% postgres postgres [.] pgstat_report_activity

This shows a significant improvement in pgstat_report_activity: its
share of the profile drops from 9.30% to 0.36% with one client and from
12.17% to 0.40% with eight clients. Are there any other tests I should
run?
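
For anyone following the thread, here is a rough sketch of the read-side
clipping idea described in the quoted message: the write side stores a
plain byte prefix, and the reader trims any incomplete trailing character.
This is not the code from the patch. It assumes UTF-8 only, and the names
utf8_char_len() and clip_utf8() are made up for illustration. It relies on
the property mentioned above, that the length of a character can be derived
from its first byte.

```c
#include <stddef.h>

/*
 * Hypothetical helper: byte length of a UTF-8 character, derived from
 * its first byte only. Invalid lead bytes are treated as one byte.
 */
static int
utf8_char_len(unsigned char first)
{
    if ((first & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((first & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((first & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((first & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 1;                   /* stray continuation or invalid byte */
}

/*
 * Hypothetical helper: given a buffer that was filled by a raw byte copy
 * of at most len bytes, return the largest length <= len that ends on a
 * character boundary. Only the reader pays for this scan.
 */
static size_t
clip_utf8(const char *buf, size_t len)
{
    size_t      pos = 0;

    while (pos < len)
    {
        int         l = utf8_char_len((unsigned char) buf[pos]);

        if (pos + (size_t) l > len)
            break;              /* last character is incomplete */
        pos += (size_t) l;
    }
    return pos;
}
```

With this split, the hot path is just a bounded byte copy, and the
per-character scan only runs when someone actually reads the activity
string.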

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
