From: | Shayon Mukherjee <shayonj(at)gmail(dot)com> |
---|---|
To: | Sami Imseih <samimseih(at)gmail(dot)com> |
Cc: | Robert Treat <rob(at)xzilla(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Prolonged truncation phase during vacuum on toast table with repeated interruptions by lock waiters and a proposed POC patch |
Date: | 2025-05-08 20:58:35 |
Message-ID: | CANqtF-q7x-aw84OtiTZ-VkLuafmutzgr-Lk3LeeiKs+zFm2Mng@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thank you for engaging into the conversation and sharing your thoughts Sami
and Robert.
On Thu, May 8, 2025 at 2:57 PM Sami Imseih <samimseih(at)gmail(dot)com> wrote:
> > actually determine you are in this situation, or how bad the situation
> > was, in order to know that setting truncate off would help? To that
>
>
That's definitely a sound idea and the way I was able to replicate and
learn myself was by doing a PATCH that would just print the # of attempts
it was doing. That said, just to add more color - I was coming at this from
a point of view where truncation would be on a "best effort" basis and the
trade off here would be less availability risk at the cost of missed
truncations. My first instinct was that, having vacuum_truncate set to ON
by default on busy systems that have a lot of churn can catch a lot of
developers and PostgreSQL by surprise (sure did to me :D), so the best
effort and "fail fast" behavior could help, however at the same time I
agree that it doesn't solve either of the problems really well. Another
idea I had around this section was to expose an attribute/guc/setting like
VACUUM_TRUNCATE_INTERRUPTION_MAX_RETRIES with a default of `1` or
something, that users could optionally configure, perhaps that could be a
middleground, but I am myself not fully sold on the idea either (?).
> Adding counters for this area is not a bad idea in general, as this hits
> customers particularly hard on hot standbys when the truncate does actually
> occur on the primary.
>
> What about adding cumulative counters ( per table and pg_stat_database
> ) such as:
>
> pages_vac_truncated - # of pages truncated by vacuum
> vac_truncate_conflicts - # of time truncate was skipped due to conflict
> vac_truncate_suspended - # of times the truncate was suspended.
>
> The difference between conflict and suspended is conflict causes vacuum
> to skip the truncate phase while suspended causes vacuum to retry the
> phase continually.
>
Would vac_truncate_retries or something similar be more immediately clear?
If so, maybe something like
- vac_truncate_interruptions - # of time truncate was skipped due to
conflict and not retried
- vac_truncate_retries - # of times the truncate was suspended and retried
> The argument against adding these counters is that the views,
> pg_stat_all_tables/
> pg_stat_all_database are becoming super-wide, so maybe we need to think
> about inventing a new view for vacuum related counter metrics. This seems
> like
> a good discussion for v19.
>
>
+1 for this. I'd even be happy to even just start showing the attempts as
part of INFO or VERBOSE on conflicts & suspensions, without keeping track,
but it is useful statistical data.
Thanks
Shayon
From | Date | Subject | |
---|---|---|---|
Previous Message | Sami Imseih | 2025-05-08 20:50:32 | Re: queryId constant squashing does not support prepared statements |