Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity

From: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
Date: 2022-06-22 23:05:54
Message-ID: 545ABFC1-D5E9-4048-9737-2A7AB2C29174@amazon.com
Lists: pgsql-hackers

> Can you describe how it's kept in sync, and how it makes sure that the property
> is maintained over restart / gc? I don't see any change in the code for the
> 2nd part so I don't see how it could work (and after a quick test it indeed
> doesn't).

There is a routine called qtext_hash_sync which removes all entries from
the qtext_hash and reloads it with all the query ids from pgss_hash.

This routine is called during:

1. gc_qtexts()
2. entry_reset()
3. entry_dealloc() (although this call could instead be moved to the end of entry_alloc())
4. pgss_shmem_startup()

An exclusive lock is held at each of these call sites, which allows all of the
query ids currently present in pgss_hash to be synced into qtext_hash.
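The clear-and-rebuild idea above can be modeled with a minimal, self-contained
sketch (Python here only for illustration; the real routine walks PostgreSQL
dynahash tables while the exclusive lock is held, and every name except
qtext_hash_sync is a stand-in, not the patch's actual code):

```python
from collections import namedtuple

# Hypothetical stand-in for the pgss hash key (userid, dbid, queryid).
PgssKey = namedtuple("PgssKey", ["userid", "dbid", "queryid"])

def qtext_hash_sync(pgss_hash, qtext_hash):
    """Rebuild qtext_hash so it holds exactly the query ids present in
    pgss_hash; in the patch this runs only while an exclusive lock is held."""
    qtext_hash.clear()                 # drop every existing entry
    for key in pgss_hash:              # walk all pgss_hash entries
        qtext_hash.add(key.queryid)    # re-insert each query id

# Example: two pgss entries from different users sharing one query id.
pgss_hash = {
    PgssKey(10, 1, 111): "SELECT * FROM t1",
    PgssKey(11, 1, 111): "SELECT * FROM t1",
    PgssKey(10, 1, 222): "SELECT * FROM t2",
}
qtext = {999}              # stale id left behind by an evicted query
qtext_hash_sync(pgss_hash, qtext)
print(sorted(qtext))       # [111, 222]
```

Because the rebuild happens at every point that can change pgss_hash under the
lock, a stale query id can never survive in qtext_hash.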

> 2nd part so I don't see how it could work (and after a quick test it indeed
> doesn't).

Can you tell me what test you used to determine it is not in sync?

> Can you share more details on the benchmarks you did? Did you also run
> benchmark on workloads that induce entry eviction, with and without need for
> gc? Eviction in pgss is already very expensive, and making things worse just
> to save a bit of disk space doesn't seem like a good compromise.

Sorry, this was poorly explained by me. I went back and did some benchmarks;
the script and results are attached. Here is a summary.

The benchmark was run on an EC2 r5.2xlarge instance:
1. create 10k tables
2. create 5 users
3. run a pgbench script that, per transaction, performs a select on
a randomly chosen table for each of the 5 users
4. execute 2 variants of the test: one with the default pg_stat_statements.max = 5000
and one with a larger pg_stat_statements.max = 15000
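A hypothetical generator for the per-transaction workload described in step 3
might look like the following (table names t1..t10000 and user names u1..u5
are illustrative assumptions; the actual script is in the attached
benchmark.sh):

```python
import random

NUM_TABLES = 10_000  # tables created in step 1
NUM_USERS = 5        # users created in step 2

def make_transaction(rng):
    """Return the SQL for one transaction: for each of the 5 users,
    a SELECT against a randomly chosen table."""
    stmts = []
    for user in range(1, NUM_USERS + 1):
        table = rng.randint(1, NUM_TABLES)
        stmts.append(f"SET ROLE u{user};")
        stmts.append(f"SELECT count(*) FROM t{table};")
    return stmts

stmts = make_transaction(random.Random(42))
print(len(stmts))   # 10 statements: one SET ROLE + one SELECT per user
```

The point of randomizing across 10k tables with only 5000 (or 15000) pgss
entries is to force continuous eviction, which is the expensive path being
measured.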

So the 10-15% figure I gave earlier is not accurate; I originally tested on a
less powerful machine. For this benchmark I see a ~7% increase in TPS with the
patch (2035 vs 1898, i.e. 732k vs 683k transactions) when the larger
pg_stat_statements.max is used. Both tests show a drop in gc/deallocations and
a net increase in TPS. Less GC makes sense, since the external file contains
fewer duplicate SQL texts.
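Working the improvement out directly from the pgbench TPS figures reported
below:

```python
# TPS reported by pgbench for each run (patched vs unpatched).
runs = {
    "max=15000": {"patched": 2035.051541, "unpatched": 1898.452025},
    "max=5000":  {"patched": 1869.829843, "unpatched": 1822.678069},
}

for setting, r in runs.items():
    gain = (r["patched"] / r["unpatched"] - 1) * 100
    print(f"{setting}: {gain:.1f}% TPS increase with the patch")
# max=15000: 7.2% TPS increase with the patch
# max=5000: 2.6% TPS increase with the patch
```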

##################################
## pg_stat_statements.max = 15000
##################################

## with patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 732604
number of failed transactions: 0 (0.000%)
latency average = 9.828 ms
initial connection time = 33.349 ms
tps = 2035.051541 (without initial connection time)

42 gc_qtext calls
3473 deallocations

## no patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 683434
number of failed transactions: 0 (0.000%)
latency average = 10.535 ms
initial connection time = 32.788 ms
tps = 1898.452025 (without initial connection time)

154 garbage collections
3239 deallocations

##################################
## pg_stat_statements.max = 5000
##################################

## with patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 673135
number of failed transactions: 0 (0.000%)
latency average = 10.696 ms
initial connection time = 32.908 ms
tps = 1869.829843 (without initial connection time)

400 garbage collections
12501 deallocations

## no patch

transaction type: /tmp/wl.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
maximum number of tries: 1
duration: 360 s
number of transactions actually processed: 656160
number of failed transactions: 0 (0.000%)
latency average = 10.973 ms
initial connection time = 33.275 ms
tps = 1822.678069 (without initial connection time)

580 garbage collections
12180 deallocations

Thanks

Sami
Amazon Web Services

Attachment Content-Type Size
benchmark.sh application/octet-stream 5.4 KB
results.txt text/plain 2.8 KB
