Quick Links

Re: Prolonged truncation phase during vacuum on toast table with repeated interruptions by lock waiters and a proposed POC patch

From:	Shayon Mukherjee <shayonj(at)gmail(dot)com>
To:	Sami Imseih <samimseih(at)gmail(dot)com>
Cc:	Robert Treat <rob(at)xzilla(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Prolonged truncation phase during vacuum on toast table with repeated interruptions by lock waiters and a proposed POC patch
Date:	2025-05-08 20:58:35
Message-ID:	CANqtF-q7x-aw84OtiTZ-VkLuafmutzgr-Lk3LeeiKs+zFm2Mng@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Thank you for engaging into the conversation and sharing your thoughts Sami
and Robert.

On Thu, May 8, 2025 at 2:57 PM Sami Imseih <samimseih(at)gmail(dot)com> wrote:

> > actually determine you are in this situation, or how bad the situation
> > was, in order to know that setting truncate off would help? To that
>
>
That's definitely a sound idea and the way I was able to replicate and
learn myself was by doing a PATCH that would just print the # of attempts
it was doing. That said, just to add more color - I was coming at this from
a point of view where truncation would be on a "best effort" basis and the
trade off here would be less availability risk at the cost of missed
truncations. My first instinct was that, having vacuum_truncate set to ON
by default on busy systems that have a lot of churn can catch a lot of
developers and PostgreSQL by surprise (sure did to me :D), so the best
effort and "fail fast" behavior could help, however at the same time I
agree that it doesn't solve either of the problems really well. Another
idea I had around this section was to expose an attribute/guc/setting like
VACUUM_TRUNCATE_INTERRUPTION_MAX_RETRIES with a default of `1` or
something, that users could optionally configure, perhaps that could be a
middleground, but I am myself not fully sold on the idea either (?).

> Adding counters for this area is not a bad idea in general, as this hits
> customers particularly hard on hot standbys when the truncate does actually
> occur on the primary.
>
> What about adding cumulative counters ( per table and pg_stat_database
> ) such as:
>
> pages_vac_truncated - # of pages truncated by vacuum
> vac_truncate_conflicts - # of time truncate was skipped due to conflict
> vac_truncate_suspended - # of times the truncate was suspended.
>
> The difference between conflict and suspended is conflict causes vacuum
> to skip the truncate phase while suspended causes vacuum to retry the
> phase continually.
>

Would vac_truncate_retries or something similar be more immediately clear?
If so, maybe something like

- vac_truncate_interruptions - # of time truncate was skipped due to
conflict and not retried
- vac_truncate_retries - # of times the truncate was suspended and retried

> The argument against adding these counters is that the views,
> pg_stat_all_tables/
> pg_stat_all_database are becoming super-wide, so maybe we need to think
> about inventing a new view for vacuum related counter metrics. This seems
> like
> a good discussion for v19.
>
>
+1 for this. I'd even be happy to even just start showing the attempts as
part of INFO or VERBOSE on conflicts & suspensions, without keeping track,
but it is useful statistical data.

Thanks
Shayon

In response to

Re: Prolonged truncation phase during vacuum on toast table with repeated interruptions by lock waiters and a proposed POC patch at 2025-05-08 18:56:55 from Sami Imseih

Responses

Re: Prolonged truncation phase during vacuum on toast table with repeated interruptions by lock waiters and a proposed POC patch at 2025-08-09 14:48:56 from Shayon Mukherjee

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jacob Champion	2025-05-08 22:06:34	Re: [PATCH] Fix missing comma in Requires.private with a Make macro
Previous Message	Sami Imseih	2025-05-08 20:50:32	Re: queryId constant squashing does not support prepared statements