Re: [RFC] Should we fix postmaster to avoid slow shutdown?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Date: 2016-11-22 15:59:45
Message-ID: 21221.1479830385@sss.pgh.pa.us
Lists: pgsql-hackers

"Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> writes:
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>> The point I was trying to make is that I think the forced-removal behavior
>> is not desirable, and therefore committing a patch that makes it be graven
>> in stone is not desirable either.

> I totally agree that we should look for a way to avoid the complete loss of stats files. Personally, I would like to combine that with the idea of persistent performance diagnosis information for long-term analysis (IIRC, someone proposed it.) However, I don't think my patch will make everyone forget about the problem of stats file loss during recovery. The problem exists with or without my patch, and my patch doesn't have the power to dilute the importance of the problem. If you are worried that it will be forgotten, we can add an entry for the problem to the TODO list that Bruce-san is maintaining.

> Or, maybe we can just stop removing the stats files during recovery by keeping the previous generation of the files and using it as the current one. I haven't checked how fresh the previous generation is (500ms old?). Slightly stale stats might be better than nothing.

Freshness isn't the issue. The stats file isn't there at all, in the
permanent stats directory, unless the collector takes the time to write
it before exiting. Without that, we have unrecoverable loss of the stats
data. Now, that isn't as bad as loss of the SQL data content, but it's
not good either.

It's already the case that the pgstats code writes the stats data under a
temporary file name and then renames it into place atomically. So the
prospects for corrupt data are not large, and I do not think that the
existing removal behavior was intended to prevent that. Rather, the
concern was that if you do a point-in-time recovery to someplace much
earlier on the WAL timeline, the stats file will be out of sync with
what's now in your database. That's a valid point, but deleting the
stats file during *any* recovery seems like an overreaction.
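
For readers unfamiliar with the write-then-rename pattern mentioned above,
here is a minimal sketch in Python. It is not the pgstats C code: the
function name, the temporary-file prefix, and the fsync call are assumptions
made for illustration only.

```python
import os
import tempfile


def write_stats_atomically(path: str, payload: bytes) -> None:
    """Write payload so readers see either the old file or the new one.

    Hypothetical helper; illustrates temp-file-plus-rename, nothing more.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename never
    # crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".stats-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            # Extra precaution in this sketch; not a claim about pgstats.
            os.fsync(tmp.fileno())
        # Swap the new file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, drop the temp file and leave any existing file alone.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader opening the target path never observes a half-written file, which
is why corruption from an interrupted write is unlikely with this scheme.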

The simplest solution I can think of is to delete the stats file when
doing a PITR operation, but not during simple crash recovery. I've
not looked to see how hard it would be to do that, but it seems like
it should be a fairly minor logic tweak. Maybe decide to do the removal
at the point where we intentionally stop following WAL someplace earlier
than its end.
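
To make the proposed rule concrete, here is a rough sketch in Python rather
than backend C. The RecoveryOutcome type and all of its fields are
hypothetical names invented for this example; they do not correspond to
actual startup-process variables.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOutcome:
    """Hypothetical summary of how WAL replay ended."""
    target_requested: bool  # a recovery target (PITR) was specified
    stopped_at_lsn: int     # WAL position where replay stopped
    end_of_wal_lsn: int     # last WAL position that was available


def should_remove_stats_file(outcome: RecoveryOutcome) -> bool:
    """Remove stats only if replay was deliberately stopped before end of WAL."""
    return outcome.target_requested and outcome.stopped_at_lsn < outcome.end_of_wal_lsn
```

Under this rule, plain crash recovery (no target, replay to the end of WAL)
keeps the stats file, while a PITR that stops early discards it.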

Another angle we might take, independently of that, is to delete the
stats file if the stats collector process itself crashes. This would
provide a recovery avenue if somehow we did have a stats file that
was corrupt enough to crash the collector. And it would not matter
for post-startup crashes of the stats collector, because the file
would not be there anyway.
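
A sketch of that second idea, again in Python and under stated assumptions:
handle_collector_exit is a hypothetical function, and treating any nonzero
exit status as a crash is a simplification made for this example, not the
postmaster's actual child-exit logic.

```python
import os


def handle_collector_exit(exit_status: int, permanent_stats_path: str) -> bool:
    """Remove the permanent stats file after a collector crash.

    Returns True if a file was actually removed.
    """
    if exit_status == 0:
        # Normal exit: the collector had the chance to write its file.
        return False
    try:
        os.unlink(permanent_stats_path)
    except FileNotFoundError:
        # A post-startup crash typically lands here: no file to remove.
        return False
    return True
```

The FileNotFoundError branch corresponds to the post-startup case described
above, where the removal is harmless because the file is absent anyway.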

regards, tom lane
