Re: WIP: [[Parallel] Shared] Hash

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: WIP: [[Parallel] Shared] Hash
Date: 2017-03-26 00:53:23
Message-ID: CAH2-WzmNBiuRAPdKHJ+C=zxNatRFP9uY3O=Z+zu2okvryPRFeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 23, 2017 at 12:35 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Thanks. I really appreciate your patience with the resource
> management stuff I had failed to think through.

It's a surprisingly difficult problem, that almost requires
prototyping just to explain. No need to apologize. This is the process
by which many hard problems end up being solved.

>> However, I notice that the place that this happens instead,
>> PathNameDelete(), does not repeat the fd.c step of doing a final
>> stat(), and using the stats for a pgstat_report_tempfile(). So, a new
>> pgstat_report_tempfile() call is simply missing. However, the more
>> fundamental issue in my mind is: How can you fix that? Where would it
>> go if you had it?
>
> You're right. I may be missing something here (again), but it does
> seem straightforward to implement because we always delete each file
> that really exists exactly once (and sometimes we also try to delete
> files that don't exist due to imprecise meta-data, but that isn't
> harmful and we know when that turns out to be the case).

ISTM that your patch now shares a quality with parallel tuplesort: You
may now hold files open after an unlink() of the original link/path
that they were opened using. As Robert pointed out when discussing
parallel tuplesort earlier in the week, that comes with the risk,
however small, that the vFD cache will close() the file out from under
us during LRU maintenance, resulting in a subsequent open() (at the
tail-end of the vFD's lifetime) that fails unexpectedly. It's probably
fine to assume that we can sanely close() the file ourselves in fd.c
error paths despite a concurrent unlink(), since we never operate on
the link itself, and there probably isn't much pressure on each
backend's vFD cache. But, is that good enough? I can't say, though I
suspect that this particular risk is one that's best avoided.

I haven't tested out how much of a problem this might be for your
patch, but I do know that resowner.c will call your shared mem segment
callback before closing any backend local vFDs, so I can't imagine how
it could be that this risk doesn't exist.

FWIW, I briefly entertained the idea that we could pin a vFD for just
a moment, ensuring that the real FD could not be close()'d out by
vfdcache LRU maintenance, which would fix this problem for parallel
tuplesort, I suppose. That may not be workable for PHJ, because PHJ
would probably need to hold on to such a "pin" for much longer, owing
to the lack of any explicit "handover" phase.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mithun Cy 2017-03-26 02:10:46 Re: Proposal : For Auto-Prewarm.
Previous Message Joe Conway 2017-03-26 00:33:33 Re: Re: [COMMITTERS] pgsql: Faster expression evaluation and targetlist projection.