Re: [HACKERS] Parallel Hash take II

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: [HACKERS] Parallel Hash take II
Date: 2017-12-02 02:54:29
Message-ID: CAEepm=2O2bRsnS7WH_ee+2UtRk58Gjao=K8TFueEq=ULgEOw_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 2, 2017 at 1:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Pushed 0003-Add-infrastructure-for-sharing-temporary-files-betwe.patch
> after minor adjustments (some including conversing with you).

Thank you!

> Questions:
> - Right now RemovePgTempFilesInDir() will recurse into appropriately
> named directories, and when it recurses it doesn't require the same
> name pattern checks. I think that's good, but I think it'd be prudent
> to be a bit more paranoid and prevent recursing into symlinked
> subdirectories.

That's why it uses lstat(), so that it sees symlinks rather than what
they point to. It only recurses if S_ISDIR(), and it unlinks anything
else. That means that it unlinks any symlinks rather than (say)
following them and deleting stuff outside the temporary directory
tree. Example:

$ mkdir /tmp/foo
$ touch /tmp/foo/canary
$ mkdir -p $PGDATA/base/pgsql_tmp/pgsql_tmpXXX/bar
$ ln -s /tmp/foo $PGDATA/base/pgsql_tmp/pgsql_tmpXXX/foo
$ ls $PGDATA/base/pgsql_tmp/pgsql_tmpXXX
bar foo
$ postgres/bin/postgres -D $PGDATA
... ^C ...
$ ls $PGDATA/base/pgsql_tmp
$ ls /tmp/foo
canary

Make sense?

> - As we don't clean temp files after crash-restarts it isn't impossible
> to have a bunch of crash-restarts and end up with pids *and* per-pid
> shared file set counters reused. Which'd lead to conflicts. Do we care?

We don't care. PathNameCreateTemporaryDir() on a path that already
exists will return silently. PathNameCreateTemporaryFile() on a path
that already exists, as mentioned in a comment and following an
existing convention, will open and truncate it. So either it was
really an orphan and that is a continuation of our traditional
recycling behaviour, or the calling code has a bug and used the same
path twice and it's not going to end well.

Another type of collision we could have in theory is like this: One
backend creates a SharedFileSet, and various other backends attach to
it. The creator detaches and exits. Later another process is created
with the same PID, and creates a new SharedFileSet, and happens to
have the same counter number, but the original SharedFileSet is still
in existence because there are still running backends attached to it.
Then things get ugly. But this seems improbable, at least as long as
the usage pattern is that leader processes are the only ones creating
SharedFileSet objects and outlive their parallel workers in non-crash
outcomes, but if you think that's too flimsy I suppose it could be
fixed by replacing the per-backend counter variable with a
cluster-wide atomic counter. Then the PID component would be
redundant but I'd suggest keeping it anyway for its diagnostic value.
Thoughts?

Just a reminder: a couple of problems have come up recently in the
Parallel Hash Join patch itself, so please don't consider that one
ready for commit quite yet. They are: (1) Handling the case where
there is no DSA area because we're running a parallel-aware plan in
non-parallel mode due to lack of resources; (2) Investigating a rare
assertion failure. For (1), that may depend on another patch that
I'll post shortly to kill "es_query_dsa" and, come to think of it, for
(2) it's possible that the problem is in either one of the remaining
patches -- SharedTuplestore or Parallel Hash Join -- so please hold
off on committing either of those until I've got to the bottom of
that.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-12-02 03:46:41 Re: [HACKERS] Parallel Hash take II
Previous Message Andres Freund 2017-12-02 02:46:47 Re: [HACKERS] Parallel Hash take II