Re: including PID or backend ID in relpath of temp rels

From: Jim Nasby <decibel(at)decibel(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: including PID or backend ID in relpath of temp rels
Date: 2010-05-17 18:10:00
Message-ID: 8BFFD9DF-0808-4792-94AB-739BE9C22E76@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On May 6, 2010, at 10:24 PM, Robert Haas wrote:
> On Tue, May 4, 2010 at 3:03 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
>>>>> [smgr.c,inval.c] Do we need to call CacheInvalidSmgr for temporary
>>>>> relations? I think the only backend that can have an smgr reference
>>>>> to a temprel other than the owning backend is bgwriter, and AFAICS
>>>>> bgwriter will only have such a reference if it's responding to a
>>>>> request by the owning backend to unlink the associated files, in which
>>>>> case (I think) the owning backend will have no reference.
>
> This turns out to be wrong, I think. It seems that what we do is
> prevent backends other than the opening backend from reading pages
> from non-local temp rels into private or shared buffers, but we don't
> actually prevent them from having smgr references. This allows
> autovacuum to drop them, for example, in an anti-wraparound situation.
> (Thanks to Tom for helping me get my head around this better.)
>
>>>> Hmm, wasn't there a proposal to have the owning backend delete the files
>>>> instead of asking the bgwriter to?
>>>
>>> I did propose that upthread; it may have been proposed previously
>>> also. This might be worth doing independently of the rest of the patch
>>> (which I'm starting to fear is doomed, cue ominous soundtrack) since
>>> it would reduce the chance of orphaning data files and possibly
>>> simplify the logic also.
>>
>> +1 for doing it separately, but hopefully that doesn't mean the rest of
>> this patch is doomed ...
>
> Doom has been averted. Proposed patch attached. It passes regression
> tests and seems to work, but could use additional testing and, of
> course, some code-reading also.
>
> Some notes on this patch:
>
> It seems prett clear that it isn't desirable to simply add backend ID
> to RelFileNode, because there are too many places using RelFileNode
> already for purposes where the backend ID can be inferred from
> context, such as buffer headers and most of xlog. Instead, I
> introduced BackendRelFileNode, which consists of an ordinary
> RelFileNode augmented with a backend ID, and use that only where
> needed. In particular, the smgr layer must use BackendRelFileNode
> throughout, since it operates on both permanent and temporary
> relations. smgr invalidations must also include the backend ID. xlog
> generally happens only for non-temporary relations and can thus
> continue to use an ordinary RelFileNode; however, commit/abort records
> must use BackendRelFileNode as there may be physical storage
> associated with temporary relations that must be unlinked.
> Communication with the bgwriter must use BackendRelFileNode for
> similar reasons. The relcache now stores rd_backend rather than
> rd_islocaltemp so that it remains straightforward to call smgropen()
> based on a relcache entry. Some smgr functions no longer require an
> isTemp argument, because they can infer the necessary information from
> their BackendRelFileNode. smgrwrite() and smgrextend() now take a
> skipFsync argument rather than an isTemp argument.
>
> I'm not totally sure whether it makes sense to do what we were talking
> about above, viz, transfer unlink responsibility for temp rels from
> the bgwriter to the owning backend. I haven't done that here. Nor
> have I implemented any kind of improved temporary file cleanup
> strategy, though I hope such a thing is possible.

Any particular reason not to use directories to help organize things? IE:

base/database_oid/pg_temp_rels/backend_pid/relfilenode

Perhaps relfilenode should be something else.

This seems to have several advantages:

1: It's more organized. If you want to see all the files for a single backend you have just one place to look. Finding everything is still easy via filesystem find.
2: Cleanup becomes easier. When a backend exits, it's entire directory goes away. On server start, everything under pg_temp_rels goes away. Unfortunately we still have a race condition with cleaning up if a backend dies and can't run it's own cleanup, though I think that anytime that happens we're going to restart everything anyway.
3: It separates all the temporary stuff away from real files.

The only downside I see is some extra code to create the backend_pid directory.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2010-05-17 18:15:18 Re: SELECT * in a CREATE VIEW statement doesn't update column set automatically
Previous Message Jim Nasby 2010-05-17 17:45:14 Re: Partitioning/inherited tables vs FKs