Re: on_dsm_detach() callback and parallel tuplesort BufFile resource management

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: on_dsm_detach() callback and parallel tuplesort BufFile resource management
Date: 2017-03-10 02:54:19
Message-ID: CAH2-Wzm=f-g_j1GMvN=nJEtm3Wc=eBt-MNZ=tjbqag-SVX3SCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 9, 2017 at 6:25 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Mar 9, 2017 at 7:29 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Suppressing ENOENT because it's not clear which backend actually owns
>> a file is a bit different from using it to detect that there are no
>> more segments...
>
> +1, emphatically.

I don't feel strongly either way on this, FWIW.

>> Obviously I screwed some things up in the code I
>> posted, but I think the general idea that the DSM segment owns the
>> files on disk is a good one.
>
> +1 to that, too. The DSM has exactly the lifetime that we want here;
> no backend or resowner involved in the transaction does.

I actually agree with you, from a theoretical perspective. But the
consequences of that theoretical point seem pretty extensive.
Following through with that by pushing much more file state into
shared memory has significant complexity and risk, if it's possible at
all without something like "palloc(), but for shared memory". I would
like to see a practical benefit.

>> I figure that it (via the detach hook)
>> should be able to delete the files using only data in shmem, without
>> reference to any backend-local memory, so that file cleanup is
>> entirely disconnected from backend-local resource cleanup (fds and
>> memory).
>
> That's one approach. Another approach is to somehow tie the two
> together; for example, I thought about teaching FileClose that where
> it currently calls unlink() to get rid of the temporary file, it would
> first go check some shared reference count and decrement it, skipping
> the unlink if the result was >0. But that only works if you have a
> separate chunk of shared memory per File, which seems like it won't
> work for what you need.

It won't work for what Thomas needs because that decision is made per
segment. But, the decision to decrement refcount really does need to
be made at the BufFile level, so Thomas is still not wrong to
structure things that way.

What somewhat justifies the idea of an on_dsm_detach() callback using
a pointer located in shared memory to get to local memory in one
backend (for error handling purposes) is the fact that it's already
tacitly assumed that the local memory used for the BufFile is released
after the resowner clean-up of BufFiles. Otherwise, somebody might get
in big trouble if they called BufFileClose() or something in an error
path. Arguably, the reliance on ordering already exists today.

I'm not saying that that's a good plan, or even an acceptable
trade-off. Pick your poison.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2017-03-10 02:56:45 Bug in get_partition_for_tuple
Previous Message Robert Haas 2017-03-10 02:35:50 Re: Speed up Clog Access by increasing CLOG buffers