Re: [HACKERS] Parallel Hash take II

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: [HACKERS] Parallel Hash take II
Date: 2017-11-17 21:55:54
Message-ID: 20171117215554.vtvfutt37as4nmcg@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2017-11-14 01:30:30 +1300, Thomas Munro wrote:
> New patch attached.

(I've commit some of the preliminary work)

Looking at 0005-Add-infrastructure-for-sharing-temporary-files-betwe.patch:
- The created path/filenames seem really redundant:
base/pgsql_tmp/pgsql_tmp11160.9.sharedfileset.d/pgsql_tmp.o3of8.p0.0

Including pgsql_tmp no less than three times seems a bit absurd.

I'm quite inclined to just remove all but the first.

- There seems to be a moment where could leak temporary file
directories:
File
SharedFileSetCreate(SharedFileSet *fileset, const char *name)
{
char path[MAXPGPATH];
File file;

SharedFilePath(path, fileset, name);
file = PathNameCreateTemporaryFile(path, false);

/* If we failed, see if we need to create the directory on demand. */
if (file <= 0)
{
char tempdirpath[MAXPGPATH];
char filesetpath[MAXPGPATH];
Oid tablespace = ChooseTablespace(fileset, name);

TempTablespacePath(tempdirpath, tablespace);
SharedFileSetPath(filesetpath, fileset, tablespace);
PathNameCreateTemporaryDir(tempdirpath, filesetpath);
file = PathNameCreateTemporaryFile(path, true);
}

return file;
}

The resowner handling is done in PathNameCreateTemporaryFile(). But if
we fail after creating the directory we'll not have a resowner for
that. That's probably not too bad.

- related to the last point, I'm kinda wondering why we need sub-fileset
resowners? Given we're dealing with error paths in resowners I'm not
quite seeing the point - we're not going to want to roll back
sub-parts of of a fileset, no?

- If we want to keep these resowners, shouldn't we unregister them in
PathNameDeleteTemporaryFile?

- PathNameCreateTemporaryFile() and OpenTemporaryFile() now overlap
quite a bit. Can't we rejigger things to base the second on the first?
At the very least the comments need to work out the difference more
closely.

- It's not clear to me why it's correct to have the vfdP->fdstate & FD_TEMPORARY
handling in FileClose() be independent of the file being deleted. At
the very least there needs to be a comment explaining why we chose
that behaviour.

- I think we need to document somehwere that the temp_file_limit in a
shared file set applies independently for each participant that's
writing something. We also should discuss whether that's actually
sane behaviour.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2017-11-17 22:22:23 Re: [HACKERS] Parallel Hash take II
Previous Message Justin Pryzby 2017-11-17 21:43:52 Re: PG10.1 autovac killed building extended stats