Re: global temporary tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <decibel(at)decibel(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: global temporary tables
Date: 2010-04-25 01:14:44
Message-ID: y2u603c8f071004241814x10e29bc8pb46cad5edc0e2a5b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 24, 2010 at 8:47 PM, Jim Nasby <decibel(at)decibel(dot)org> wrote:
> On Apr 24, 2010, at 12:31 PM, Tom Lane wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> At least AIUI, the use case for this feature is that you want to avoid
>>> creating "the same" temporary table over and over again.
>>
>> The context that I've seen it come up in is that people don't want to
>> clutter their functions with create-it-if-it-doesn't-exist logic,
>> which you have to have given the current behavior of temp tables.
>> Any performance gain from reduced catalog churn would be gravy.
>>
>> Aside from the DROP problem, I think this implementation proposal
>> has one other big shortcoming: what are you going to do about
>> table statistics?  In many cases, you really *have* to do an ANALYZE
>> once you've populated a temp table, if you want to get decent plans
>> for it.  Where will you put those stats?
>
> One possibility: rename the existing pg_stats to pg_stats_permanent. Create a global temporary table called pg_stats_temporary. pg_stats becomes a union of the two. I know the backend wouldn't be able to use the view, but hopefully access to statistics goes through a limited set of functions so that teaching them about the two different tables isn't hard.

Yeah, I don't think that would be too horrible. Part of me feels like
you'd want to have the ability to store stats for a global temp table
in either one of those tables depending on use-case, but I'm also
reluctant to invent a lot of new syntax for a very limited use case.

> As for cleanup and storage questions; what about having temp objects live in pgsql_tmp? I'm thinking create a directory under pgsql_tmp for a backend PID the first time it creates a temp object (global or local) and create the files in there. That also means that we don't have to come up with different relfilenodes for each backend.

That would impose a couple of implementation restrictions that don't
seem necessary. One, it would imply ignoring reltablespace. Two, it
would prohibit (or at least complicate) allowing a backend to CLUSTER
or REINDEX its own private copy of the rel.

> On the other hand, some layer (presumably smgr) would need to understand whether a relation was temporary or not. If we do that, cleanup is easy: you can remove any directories that no longer have a running PID. For forensics you probably only want to do that automatically when a backend starts and discovers it already has a directory, though we should also provide an administrator function that will clobber all directories that no longer have backends.

Unfortunately, I don't see much alternative to making smgr know
something about the temp-ness of the relation, though I'm hoping to
keep the smgr surgery to an absolute minimum. Maybe what we could do
is incorporate the backend ID or PID into the file name when the
relation is temp. Then we could scan for and nuke such files pretty
easily. Otherwise I can't really think how to make it work.

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-04-25 01:16:50 Re: [RFC] nodeToString format and exporting the SQL parser
Previous Message Michael Tharp 2010-04-25 01:08:48 Re: [RFC] nodeToString format and exporting the SQL parser