| From: | Sami Imseih <samimseih(at)gmail(dot)com> |
|---|---|
| To: | Michael Paquier <michael(at)paquier(dot)xyz> |
| Cc: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
| Subject: | Re: Flush some statistics within running transactions |
| Date: | 2026-03-16 23:42:44 |
| Message-ID: | CAA5RZ0ssPraeF9gg8S8G7O25t90t8_Y2PU-jdNkZ42EiNH22CA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
I have spent some time thinking about Michael's feedback and I want to
see if I can propose an alternate design that we can come to a consensus
on.
> I had to work around the assert at the end of pgstat_report_stat(), to
> tell that under an xact the partial flushes were OK. I was wondering
> about keeping the end assertion based on solely "force" intact, making
> the flush routines return a status based on if we are in an xact.
>
> > That looks "simpler" that the previous proposal but who would be responsible to
> > call pg_stat_report()? If that's the client responsibility that kind of look weird
> > to me. If that's the core, how would that be scheduled? I think that the
> > end solution should prevent to find similar issues as 039549d70f6 fixed, without
> > delegating to the client.
>
> TBH, I don't like the requirement of setting SIGALRM in all the places
> where we'd require it for the sake of this proposal, where
> historically we have never done that, copying a mechanism that already
> exists in the tree for procsigs, as the previous patch I posted proves
> we could reuse.
How about we add a timeout when a transaction goes idle?
Currently we have IDLE_STATS_UPDATE_TIMEOUT, which is the
timeout interval used when we have done more than one flush
within a second. This only works when the session is idle
(not in a transaction), and flushes everything.
However, as I mentioned earlier in this thread, we can also schedule
a flush that occurs when a transaction goes idle. This means when
a transaction is started (BEGIN/START TRANSACTION) the proceeding
idle-in-transaction can schedule a timeout of 10 seconds based on
a new #define called PGSTAT_IDLE_TXN_INTERVAL.
The timeout is not a fixed repeating interval. It is set
on-demand each time the transaction enters idle-in-transaction
state: first after the BEGIN, then again after each subsequent
command completes. This is a trade-off: if a single command
runs for a long time, no flush happens until it finishes and
the transaction goes idle again.
So, we still introduce a new timeout in this mechanism, but
it does not need to be registered all over the place.
So with this all the "safe to be flushed" stats will be flushed
mid-transaction at some point, and all stats will be flushed
at the end of a transaction.
With that said, the only way I can conceive getting away from a
a new timeout in this design is to track timestamp whenever
we go into idle-intransaction, but I am not too comfy adding a
GetCurrentTimestamp() there.
-> if no action occurs at this point, we don't set another timeout
> It's also not clear to me what a correct frequency of
> the stat updates should be, and why it would make sense to force that
> in a GUC;
No GUC will be needed in what I am proposing. A 10s flush interval
should be sufficient.
> On top of that, I am not really convinced that there is a good reason
> to remove the existing stat report calls we have already planted in
> the tree for auxiliary processes, diverging from the stable branches
> where these exist.
The design being proposed keeps all the existing in-tact.
So attached is a new proposal with tests and docs. In terms of
test I fell back to the strategy used by Bertrand [0] with the
INJECTION_POINT trick to reduce the flush interval. It does mean
we keep a pg_sleep in the test, but I think we should not try
to do anything different here.
Also, with this design we can call the flush modes FLUSH_IN_TRANSACTION
and FLUSH_AT_TXN_BOUNDARY. As the earlier proposals, the
callbacks will still need to make the decision what stats to flush based on
if it's called in-transaction or at boundary.
[0] https://www.postgresql.org/message-id/aZbDYMrOkeCyIubO%40ip-10-97-1-34.eu-west-3.compute.internal
--
Sami Imseih
Amazon Web Services (AWS)
| Attachment | Content-Type | Size |
|---|---|---|
| v12-0001-Add-periodic-in-transaction-stats-flushing.patch | application/octet-stream | 54.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tomas Vondra | 2026-03-16 23:50:40 | Re: Stack-based tracking of per-node WAL/buffer usage |
| Previous Message | Michael Paquier | 2026-03-16 23:37:39 | Re: [PROPOSAL] Termination of Background Workers for ALTER/DROP DATABASE |