Re: WIP: [[Parallel] Shared] Hash

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: [[Parallel] Shared] Hash
Date: 2017-03-22 02:18:33
Message-ID: CAH2-Wz=+9Rfqh5UdvdW9rGezdhrMGGH-JL1X9FXXVZdeeGeOJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 21, 2017 at 5:07 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> buffile.c should stop pretending to care about anything other than
>> temp files, IMV. 100% of all clients that want temporary files go
>> through buffile.c. 100% of all clients that want non-temp files (files
>> which are not marked FD_TEMPORARY) access fd.c directly, rather than
>> going through buffile.c.
>
> I still need BufFile because I want buffering.
>
> There are 3 separate characteristics enabled by flags with 'temporary'
> in their name. I think we should consider separating the concerns by
> splitting and renaming them:
>
> 1. Segmented BufFile behaviour. I propose renaming BufFile's isTemp
> member to isSegmented, because that is what it really does. I want
> that feature independently without getting confused about lifetimes.
> Tested with small MAX_PHYSICAL_FILESIZE as you suggested.

I would have proposed to get rid of the isTemp field entirely. It is
always true with current usage, any only #ifdef NOT_USED code presumes
that it could be any other way. BufFile is all about temp files, which
ISTM should be formalized. The whole point of BufFile is to segment
fd.c temp file segments. Who would ever want to use BufFile without
that capability anyway?

> 2. The temp_file_limit system. Currently this applies to fd.c files
> opened with FD_TEMPORARY. You're right that we shouldn't be able to
> escape that sanity check on disk space just because we want to manage
> disk file ownership differently. I propose that we create a new flag
> FD_TEMP_FILE_LIMIT that can be set independentlyisTemp of the flags
> controlling disk file lifetime. When working with SharedBufFileSet,
> the limit applies to each backend in respect of files it created,
> while it has them open. This seems a lot simpler than any
> shared-temp-file-limit type scheme and is vaguely similar to the way
> work_mem applies in each backend for parallel query.

I agree that that makes sense as a user-visible behavior of
temp_file_limit. This user-visible behavior is what I actually
implemented for parallel CREATE INDEX.

> 3. Delete-on-close/delete-at-end-of-xact. I don't want to use that
> facility so I propose disconnecting it from the above. We c{ould
> rename those fd.c-internal flags FD_TEMPORARY and FD_XACT_TEMPORARY to
> FD_DELETE_AT_CLOSE and FD_DELETE_AT_EOXACT.

This reliably unlink()s all files, albeit while relying on unlink()
ENOENT as a condition that terminates deletion of one particular
worker's BufFile's segments. However, because you effectively no
longer use resowner.c, ISTM that there is still a resource leak in
error paths. ResourceOwnerReleaseInternal() won't call FileClose() for
temp-ish files (that are not quite temp files in the current sense) in
the absence of no other place managing to do so, such as
BufFileClose(). How can you be sure that you'll actually close() the
FD itself (not vFD) within fd.c in the event of an error? Or Delete(),
which does some LRU maintenance for backend's local VfdCache?

If I follow the new code correctly, then it doesn't matter that you've
unlink()'d to take care of the more obvious resource management chore.
You can still have a reference leak like this, if I'm not mistaken,
because you still have backend local state (local VfdCache) that is
left totally decoupled with the new "shadow resource manager" for
shared BufFiles.

> As shown in 0008-hj-shared-buf-file-v8.patch. Thoughts?

A less serious issue I've also noticed is that you add palloc() calls,
implicitly using the current memory context, within buffile.c.
BufFileOpenTagged() has some, for example. However, there is a note
that we don't need to save the memory context when we open a BufFile
because we always repalloc(). That is no longer the case here.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2017-03-22 02:25:15 Re: Logical decoding on standby
Previous Message Peter Eisentraut 2017-03-22 02:13:18 Re: Allow pg_dumpall to work without pg_authid