| From: | Sami Imseih <samimseih(at)gmail(dot)com> |
|---|---|
| To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Flush some statistics within running transactions |
| Date: | 2026-01-20 19:27:55 |
| Message-ID: | CAA5RZ0s6FkEHFdgKf8J6vueZGwsH+08LvV0YPBXa4Dw_8QgtTw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Thanks for the updates!
> > > I don't think this feature could add a noticeable performance impact, so the tests
> > > have been that simple. Do you think we should worry more?
> >
> > One observation is there's no coordination between ANYTIME and
> > TXN_BOUNDARY flushes. While PGSTAT_MIN_INTERVAL
> > prevents a backend from flushing more than once per second, a backend can
> > still perform both an ANYTIME flush and a TXN_BOUNDARY flush within
> > the same 1-second window. Not saying this will be a real problem in
> > the real-world,
> > but we definitely took measures in the current implementation to avoid
> > this scenario.
>
> Right. I think that the PGSTAT_MIN_INTERVAL throttling was put in place to prevent
> flushing too frequently when the backend has a high commit rate. But here, while
> it's true that we don't follow that rule (means a backend could flush more than one
> time per second), that would be a maximum of 2 times (given that ANYTIME is
> flushing every second). So, I'm not sure that this single extra flush is worth
> worrying about. Plus we'd certainly need an extra GetCurrentTimestamp() call, so
> I'm not sure it's worth it.
Yeah, all PGSTAT_MIN_INTERVAL does is throttle pgstat_flush_pending_entries.
Even in the current state, it does not limit how many kinds are flushed, etc.
I consider the ANYTIME flushes the same as just adding another stats kind.
So, I am not really worried about either.
I have some more comments:
-- v2-0001
#1.
+/* When to call pgstat_report_anytime_stat() again */
+#define PGSTAT_ANYTIME_FLUSH_INTERVAL 1000
+
We should just use PGSTAT_MIN_INTERVAL.
#2.
instead of ".flush_behavior", maybe ".flush_mode"? "mode" in the name is better
for configuration fields.
#3.
+/*
+ * Flush behavior for statistics kinds.
+ */
+typedef enum PgStat_FlushBehavior
+{
+ FLUSH_ANYTIME, /* All fields can be
flushed anytime,
+ *
including within transactions */
+ FLUSH_AT_TXN_BOUNDARY, /* All fields can only be flushed at
+ *
transaction boundary */
+} PgStat_FlushBehavior;
FLUSH_AT_TXN_BOUNDARY should be the first value in PgStat_FlushBehavior.
Otherwise kinds ( built-in or custom ) that do not specify a flush_behavior
will default to FLUSH_ANYTIME. I don't think this is what we want.
FLUSH_AT_TXN_BOUNDARY should be the default.
#4. Can we add a test here? Maybe generate some wal inside a long
running transaction and
make sure the stats are updated after > 1 second
-- v2-0002
No comments for this one. With ANYTIME, indeed those flushes are not needed.
-- v2-0003
#1. Should we maybe make this a bit longer? maybe 2 or 3 seconds?
May make the tests slightly longer, but maybe better for test stability.
```
+step s1_sleep: SELECT pg_sleep(1.5);
+pg_sleep
+--------
```
#2.
+ /*
+ * Check if there are any non-transactional stats to flush. Avoid
+ * unnecessarily locking the entry if nothing accumulated.
+ */
+ if (lstats->counts.numscans > 0 ||
+ lstats->counts.tuples_returned > 0 ||
+ lstats->counts.tuples_fetched > 0 ||
+ lstats->counts.blocks_fetched > 0 ||
+ lstats->counts.blocks_hit > 0)
+ has_nontxn_stats = true;
+
+ if (!has_nontxn_stats)
+ return true;
Can we just do this without a has_nontxn_stats?
This is also the same patter as a regular flush, although
in the case `pg_memory_is_all_zeros` is used.
```
if (lstats->counts.numscans == 0 &&
lstats->counts.tuples_returned == 0 &&
lstats->counts.tuples_fetched == 0 &&
lstats->counts.blocks_fetched == 0 &&
lstats->counts.blocks_hit == 0)
return true;
```
#3.
+ are updated while the transactions are in progress. This means
that we can see
+ those statistics being updated without having to wait until the transaction
+ finishes.
+ </para>
The "This means ...... " line used several times does not add value, IMO.
"are updated while the transactions are in progress." is sufficient.
#4.
+ <note>
+ <para>
+ All the statistics are updated while the transactions are in
progress, except
+ for <structfield>xact_commit</structfield>,
<structfield>xact_rollback</structfield>,
+ <structfield>tup_inserted</structfield>,
<structfield>tup_updated</structfield> and
+ <structfield>tup_deleted</structfield> that are updated only when
the transactions
+ finish.
+ </para>
+ </note>
Only these 5 fields from pgstat_relation_flush_anytime_cb, so only the below are
"All the statistics are updated while the transactions are in progress", right?
numscans
tuples_returned
tuples_fetched
blocks_fetched
blocks_hit
--
Sami Imseih
Amazon Web Services (AWS)
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andres Freund | 2026-01-20 19:31:23 | Re: Speed up transaction completion faster after many relations are accessed in a transaction |
| Previous Message | Tom Lane | 2026-01-20 19:26:58 | Re: Mystery with REVOKE PRIVILEGE |