Re: [HACKERS] Parallel Hash take II

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: [HACKERS] Parallel Hash take II
Date: 2017-12-07 23:07:04
Message-ID: CAEepm=1fWpa0ZcHDB6JH-gBtH8EqSdAqJOMtb5uao9Rvp3Gp-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On Sat, Dec 2, 2017 at 4:46 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Sat, Dec 2, 2017 at 3:54 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Sat, Dec 2, 2017 at 1:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>>> - As we don't clean temp files after crash-restarts it isn't impossible
>>> to have a bunch of crash-restarts and end up with pids *and* per-pid
>>> shared file set counters reused. Which'd lead to conflicts. Do we care?
>>
>> We don't care. PathNameCreateTemporaryDir() on a path that already
>> exists will return silently. PathNameCreateTemporaryFile() on a path
>> that already exists, as mentioned in a comment and following an
>> existing convention, will open and truncate it. So either it was
>> really an orphan and that is a continuation of our traditional
>> recycling behaviour, or the calling code has a bug and used the same
>> path twice and it's not going to end well.
>
> On further reflection, there are problems with that higher up. (1)
> Even though PathNameCreateTemporaryFile() will happily truncate and
> reuse an orphaned file when BufFileCreateShared() calls it,
> BufFileOpenShared() could get confused by the orphaned files. It
> could believe that XXX.1 is a continuation of XXX.0, when in fact it
> is junk left over from a crash restart. Perhaps BufFileCreateShared()
> needs to delete XXX.{N+1} if it exists, whenever it creates XXX.{N}.
> That will create a gap in the series of existing files that will cause
> BufFileOpenShared()'s search to terminate. (2) BufFileOpenShared()
> thinks that the absence of an XXX.0 file means there is no BufFile by
> this name, when it could mistakenly open pre-crash junk due to a
> colliding name. I use that condition as information, but I think I
> can fix that easily by using SharedTuplestoreParticipant::npage == 0
> to detect that there should be no file, rather than trying to open it,
> and then I can define this problem away by declaring that
> BufFileOpenShared() on a name that you didn't call
> BufFileCreateShared() on invokes undefined behaviour. I will make it
> so.

Here is a patch to deal with that problem. Thoughts?

I suppose if we wanted to make this type of problem go away, but still
keep files for forensic purposes, we could introduce a "restart
number", and stick it into the top level temporary directory's name.
That way you'd never be able to collide with files created before a
crash-restart, and we could add O_EXCL so it'd become an error to try
to create the same filename again.

I'll post a new SharedTuplestore and Parallel Hash patch set shortly.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
0001-Add-defenses-against-pre-crash-files-to-BufFileOpenS.patch application/octet-stream 2.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2017-12-07 23:12:39 Re: How to use set/reset role in contrib_regression test?
Previous Message David Rowley 2017-12-07 23:06:43 Re: [HACKERS] Proposal: Local indexes for partitioned table