Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
Date: 2022-07-14 06:51:24
Message-ID: 1960.1657781484@antos.home
Lists: pgsql-hackers

Shouldn't the patch status be set to "Waiting on Author"?

(I was curious if this is a patch that I can review.)

Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:

> On Wed, Jun 22, 2022 at 11:05:54PM +0000, Imseih (AWS), Sami wrote:
> > > Can you describe how it's kept in sync, and how it makes sure that the property
> > > is maintained over restart / gc? I don't see any change in the code for the
> > > 2nd part so I don't see how it could work (and after a quick test it indeed
> > > doesn't).
> >
> > There is a routine called qtext_hash_sync which removes all entries from
> > the qtext_hash and reloads it with all the query ids from pgss_hash.
> > [...]
> > At all the points where it's called, an exclusive lock is held. This allows for
> > syncing all the present queryids in pgss_hash with qtext_hash.
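
For context, a rough sketch of what a sync routine like that presumably does
(the patch itself isn't quoted in this thread, so the entry layout and names
below are assumptions): drop the whole queryid -> text-offset mapping and
rebuild it from whatever is currently in pgss_hash, with pgss->lock held
exclusively by the caller.

    /* assumed layout of a mapping entry; the hash key is just the query id */
    typedef struct pgssQueryTextEntry
    {
        uint64      queryid;        /* hash key */
        Size        query_offset;   /* offset of the text in the external file */
        int         query_len;      /* length of the text, excluding '\0' */
    } pgssQueryTextEntry;

    static void
    qtext_hash_sync(void)
    {
        HASH_SEQ_STATUS seq;
        pgssQueryTextEntry *qt;
        pgssEntry  *entry;

        /* remove every existing mapping */
        hash_seq_init(&seq, qtext_hash);
        while ((qt = hash_seq_search(&seq)) != NULL)
            hash_search(qtext_hash, &qt->queryid, HASH_REMOVE, NULL);

        /* rebuild the mapping from the entries currently in pgss_hash */
        hash_seq_init(&seq, pgss_hash);
        while ((entry = hash_seq_search(&seq)) != NULL)
        {
            bool        found;

            qt = hash_search(qtext_hash, &entry->key.queryid, HASH_ENTER, &found);
            if (!found)
            {
                qt->query_offset = entry->query_offset;
                qt->query_len = entry->query_len;
            }
        }
    }
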
>
> So your approach is to let the current gc / file loading behavior happen as
> before and construct your mapping hash using the resulting query text / offset
> info. That can't work.
>
> > > 2nd part so I don't see how it could work (and after a quick test it indeed
> > > doesn't).
> >
> > Can you tell me what test you used to determine it is not in sync?
>
> What test did you use to determine it is in sync? Have you checked how the gc/
> file loading actually work?
>
> In my case I just checked the size of the query text file after running some
> script that makes sure that there are the same few queries ran by multiple
> different roles, then:
>
> Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 559B
> pg_ctl restart
> Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 2383B
>
> > > Can you share more details on the benchmarks you did? Did you also run
> > > benchmark on workloads that induce entry eviction, with and without need for
> > > gc? Eviction in pgss is already very expensive, and making things worse just
> > > to save a bit of disk space doesn't seem like a good compromise.
> >
> > Sorry, this was poorly explained by me. I went back and did some benchmarks. Attached are
> > the script and results, but here is a summary.
> > On an EC2 r5.2xlarge, the benchmark I performed is:
> > 1. create 10k tables
> > 2. create 5 users
> > 3. run a pgbench script that, per transaction, performs a select on
> > a randomly chosen table for each of the 5 users
> > 4. execute 2 variants of the test: one with the default pg_stat_statements.max = 5000
> > and one with a larger pg_stat_statements.max = 10000.
>
> But you wrote:
>
> > ##################################
> > ## pg_stat_statements.max = 15000
> > ##################################
>
> So which one is it?
>
> >
> > So 10-15% is not accurate. I originally tested on a less powerful machine. For this
> > benchmark I see a 6% increase in TPS (732k vs 683k) when a larger
> > pg_stat_statements.max is used and there are fewer gc/deallocations.
> > Both tests show a drop in gc/deallocations and a net increase
> > in TPS. Less GC makes sense since the external file has fewer duplicate SQLs.
>
> On the other hand you're rebuilding the new query_offset hashtable every time
> there's an entry eviction, which seems quite expensive.
>
> Also, as I mentioned, you didn't change the heuristic in need_gc_qtexts().
> So if the query texts are indeed deduplicated, doesn't it mean that gc will
> artificially be called less often? The wanted target of "50% bloat" will
> become "50% bloat assuming no deduplication is done", and the average query
> text file size will stay the same whether the query texts are deduplicated
> or not.
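
For reference, the existing heuristic in need_gc_qtexts() looks roughly like
this (paraphrased from pg_stat_statements.c), which is why the "50% bloat"
target implicitly assumes non-deduplicated texts:

    /*
     * Rough paraphrase of need_gc_qtexts(): gc the external query text file
     * once it grows past about twice the space the current entries ought to
     * need, i.e. roughly 50% bloat.  A smaller (deduplicated) file therefore
     * reaches the threshold less often, even if its bloat ratio is the same.
     */
    static bool
    need_gc_qtexts(void)
    {
        Size        extent;

        /* read the current end-of-file offset of the query text file */
        {
            volatile pgssSharedState *s = (volatile pgssSharedState *) pgss;

            SpinLockAcquire(&s->mutex);
            extent = s->extent;
            SpinLockRelease(&s->mutex);
        }

        /* don't bother below 512 bytes per possible entry */
        if (extent < 512 * pgss_max)
            return false;

        /* don't bother below ~50% bloat relative to the mean query length */
        if (extent < pgss->mean_query_len * pgss_max * 2)
            return false;

        return true;
    }
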
>
> I'm wondering whether the improvements you see are due to the patch or simply
> due to artificially calling gc less often. What are the results if, instead of
> using vanilla pg_stat_statements, you patch it to perform roughly the same
> number of gc runs as your version does?
>
> Also, your benchmark workload is very friendly to your feature; what are the
> results with other workloads? For instance, having the results with query
> texts that aren't artificially long would be interesting, after fixing the
> problems mentioned previously.
>
> Also, you said that if you run that benchmark with a single user you don't see
> any regression. I don't see how rebuilding an extra hashtable in
> entry_dealloc(), i.e. while holding an exclusive lwlock, without saving any
> other work elsewhere, could be free.
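
Schematically (a sketch only, not the patch's actual call sites), the concern
is that the rebuild happens while every other backend is locked out:

    LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
    entry_dealloc();        /* sorts entries, evicts the least used ones, and
                             * (per the patch as described) rebuilds the whole
                             * queryid mapping before the lock is released */
    LWLockRelease(pgss->lock);
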
>
> Looking at the script, you have:
> echo "log_min_messages=debug1" >> $PGDATA/postgresql.conf; \
>
> Is that really necessary? Couldn't you raise the gc message to a higher
> level for your benchmarking needs, or maybe expose some new counter in
> pg_stat_statements_info? Did you run the benchmark with a debug build
> or a normal build?
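
On the counter idea, a minimal sketch, assuming a new gc_count field in the
shared stats struct (the matching change to the pg_stat_statements_info view
is not shown), could mirror how entry_dealloc() records deallocations:

    /*
     * Hypothetical helper, called once per gc_qtexts() run, so a benchmark
     * can count gc cycles without scraping DEBUG1 logs.  "gc_count" is an
     * assumed new field in pgssGlobalStats.
     */
    static void
    record_gc(void)
    {
        volatile pgssSharedState *s = (volatile pgssSharedState *) pgss;

        SpinLockAcquire(&s->mutex);
        s->stats.gc_count++;
        SpinLockRelease(&s->mutex);
    }
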
>
>

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
