Re: Remove_temp_files_after_crash and significant recovery/startup time

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "McCoy, Shawn" <shamccoy(at)amazon(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Remove_temp_files_after_crash and significant recovery/startup time
Date: 2021-09-10 22:10:00
Message-ID: 184df50e-f87f-4427-9ea4-431f4c752b40@www.fastmail.com
Lists: pgsql-hackers

On Fri, Sep 10, 2021, at 5:58 PM, McCoy, Shawn wrote:
> I noticed that the new parameter remove_temp_files_after_crash is currently set to a default value of "true" in the version 14 release. It seems this was discussed in this thread [1], and it doesn't look to me like there's been a lot of stress testing of this feature.
>
> In our fleet there have been cases where we have seen hundreds of thousands of temp files generated. I found a case where we helped a customer that had a little over 2.2 million temp files. Single-threaded cleanup of these takes a significant amount of time and delays recovery. In RDS, we mitigated this by moving the pgsql_tmp directory aside, starting the engine, and then separately removing the old temp files.
2.2 million temporary files? I'm wondering in what circumstances your system is
generating those temporary files. Low work_mem and thousands of connections?
Low work_mem and a huge analytic query? When I designed this feature I thought
about some extreme cases; that's why this behavior is controlled by a GUC. We
can probably add a sentence that explains the recovery delay caused by tens of
thousands of temporary files.
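
For readers who want to picture the workaround quoted above (move pgsql_tmp
aside, start the server, clean up later), here is a minimal Python sketch. It
is only an illustration, not how RDS or the server does it. The function names
and the aside_path argument are hypothetical, it assumes the server is stopped
when the directory is moved, it only handles base/pgsql_tmp (the default
tablespace), and it assumes aside_path is on the same filesystem so that the
rename is cheap.

```python
import os
import shutil


def move_temp_dir_aside(data_dir, aside_path):
    """Rename base/pgsql_tmp to aside_path so startup finds no temp files.

    Hypothetical helper: run it only while the server is stopped.
    Returns aside_path, or None if there was no pgsql_tmp directory.
    """
    tmp_dir = os.path.join(data_dir, "base", "pgsql_tmp")
    if not os.path.isdir(tmp_dir):
        return None
    # On the same filesystem a rename does not touch the individual
    # files, so its cost does not grow with the number of temp files.
    os.rename(tmp_dir, aside_path)
    return aside_path


def remove_aside_dir(aside_path):
    """Delete the moved directory later, outside the startup path."""
    shutil.rmtree(aside_path)
```

The point of splitting it into two steps is that the slow part (unlinking
every file) no longer sits between the crash and the server accepting
connections again.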

>
> After noticing the current plans to default this GUC to "on" in v14, just thought I'd raise the question of whether this should get a little more discussion or testing with higher numbers of temp files?
>
A backend crash is per se a rare condition (at least it should be). A crash
while having millions of temporary files in your PGDATA is an even rarer
condition. I saw several cases related to this issue and none of them generated
millions of temporary files (at most a thousand files). IMO the benefits
outweigh the issues, as I explained in [1]. Service continuity (for the vast
majority of cases) justifies turning it on by default.

If your Postgres instance is generating millions of temporary files, it seems
your setup needs some tuning.

[1] https://www.postgresql.org/message-id/CAH503wDKdYzyq7U-QJqGn%3DGm6XmoK%2B6_6xTJ-Yn5WSvoHLY1Ww%40mail.gmail.com

--
Euler Taveira
EDB https://www.enterprisedb.com/
