Re: [HACKERS] Breakage with VACUUM ANALYSE + partitions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Breakage with VACUUM ANALYSE + partitions
Date: 2016-04-11 17:17:42
Message-ID: 20160411171742.n5xebjt7tcdy4sxf@alap3.anarazel.de
Lists: pgsql-bugs pgsql-hackers

On 2016-04-11 13:04:48 -0400, Robert Haas wrote:
> You're right, but I think that's more because I didn't say it
> correctly than because you haven't done something novel.

Could be.

> DROP and
> relation truncation know about shared buffers, and they go clear
> blocks that might be affected from it as part of the truncate
> operation, which means that no other backend will see them after they
> are gone. The lock makes sure that no other references can be added
> while we're busy removing any that are already there. So I think that
> there is currently an invariant that any block we are attempting to
> access should actually still exist.

Note that we're not actually accessing any blocks, we're just opening a
segment to get the associated file descriptor.

> It sounds like these references are sticking around in backend-private
> memory, which means they are neither protected by locks nor able to be
> cleared out on drop or truncate. I think that's a new thing, and a
> bit scary.

True. But how would you otherwise batch flush requests in a sorted
manner without re-opening file descriptors? And that's pretty
essential for performance.

I can think of a number of relatively easy ways to address this:
1) Just zap (or issue?) all pending flush requests when getting an
smgrinval/smgrclosenode
2) Do 1), but filter for the closed relnode
3) Actually handle the case of the last open segment not being
RELSEG_SIZE properly in _mdfd_getseg() - mdnblocks() does so.
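
To make 1) and 2) concrete, here is a minimal standalone sketch of a
backend-private queue of pending flush requests. It is not the actual
writeback code: RelTag, PendingFlush, FlushQueue and all queue_* names
are hypothetical stand-ins made up for illustration, and the real
invalidation path would call something like queue_zap_all() or
queue_forget_rel() from wherever the smgr invalidation is processed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 32          /* hypothetical batch size */

/* Hypothetical stand-in for a relation file identifier. */
typedef struct RelTag
{
    uint32_t    spc;            /* tablespace */
    uint32_t    db;             /* database */
    uint32_t    rel;            /* relation file node */
} RelTag;

/* One remembered "please flush this block" request. */
typedef struct PendingFlush
{
    RelTag      tag;
    uint32_t    blockno;
} PendingFlush;

typedef struct FlushQueue
{
    size_t      count;
    PendingFlush items[MAX_PENDING];
} FlushQueue;

static bool
reltag_equal(RelTag a, RelTag b)
{
    return a.spc == b.spc && a.db == b.db && a.rel == b.rel;
}

/* Remember a request; returns false when the queue is already full. */
static bool
queue_add(FlushQueue *q, RelTag tag, uint32_t blockno)
{
    if (q->count >= MAX_PENDING)
        return false;
    q->items[q->count].tag = tag;
    q->items[q->count].blockno = blockno;
    q->count++;
    return true;
}

/* Option 1: on any invalidation, forget every pending request. */
static void
queue_zap_all(FlushQueue *q)
{
    q->count = 0;
}

/*
 * Option 2: forget only the requests that belong to the closed relation.
 * Compacts the array in place, preserving the order of the survivors,
 * and returns how many requests were dropped.
 */
static size_t
queue_forget_rel(FlushQueue *q, RelTag closed)
{
    size_t      kept = 0;
    size_t      dropped;

    for (size_t i = 0; i < q->count; i++)
    {
        if (!reltag_equal(q->items[i].tag, closed))
            q->items[kept++] = q->items[i];
    }
    dropped = q->count - kept;
    q->count = kept;
    return dropped;
}
```

Dropping requests is safe in this model because they are only hints; the
alternative mentioned in 1), issuing them before forgetting them, would
replace the body of queue_zap_all() with a loop over the items.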

I'm kind of inclined to do both 3) and 1).
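
For 3), the rule I have in mind is the one mdnblocks() already follows:
a segment shorter than RELSEG_SIZE is the last one, so nothing past it
should be opened even if a file happens to exist. A toy model of that
check, not the real _mdfd_getseg() code: SEG_BLOCKS and
segment_reachable() are hypothetical names, and the per-segment lengths
are passed in as an array instead of being read from disk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RELSEG_SIZE (1GB of 8kB blocks). */
#define SEG_BLOCKS 131072u

/*
 * seg_nblocks[i] is the current length, in blocks, of segment i;
 * nsegs is the number of segment files that exist.
 *
 * Returns true only if every segment before 'target' exists and is
 * completely full, and 'target' itself exists.  A short earlier
 * segment means the relation ends there, so later files are ignored.
 */
static bool
segment_reachable(const uint32_t *seg_nblocks, size_t nsegs, size_t target)
{
    for (size_t seg = 0; seg < target; seg++)
    {
        if (seg >= nsegs)
            return false;       /* ran out of segment files */
        if (seg_nblocks[seg] < SEG_BLOCKS)
            return false;       /* short segment is the last one */
    }
    return target < nsegs;
}
```

A caller that only wants a file descriptor for a hint would treat a
false result as "skip this request" rather than raising an error.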

> The possibly-saving grace here, I suppose, is that the references
> we're worried about are just being used to issue hints to the
> operating system.

Indeed.

> So I guess if we sent a hint on a wrong block or
> skip sending a hint altogether because of some failure, no harm done,
> as long as we don't error out.

Which the writeback code is careful not to do; afaics it's just the
"already open segment" issue causing problems here.

- Andres
