Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)
Date: 2016-08-05 16:41:39
Message-ID: CA+Tgmob+AS4vPA-wp+KevL7q+qK4c7yZ5wjGvEv6EqPChE2oYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 4, 2016 at 8:14 AM, Aleksander Alekseev
<a(dot)alekseev(at)postgrespro(dot)ru> wrote:
> Thanks everyone for your remarks and comments!
>
> Here is an improved version of a patch.
>
> Main changes:
> * Patch passes `make installcheck`
> * Code is fully commented, also no more TODO's
>
> I wish I sent this version of a patch last time. Now I realize it was
> really hard to read and understand. Hope I managed to correct this
> flaw. If you believe that some parts of the code are still poorly
> commented or could be improved in any other way please let me know.
>
> And as usual, any other comments, remarks or questions are highly
> appreciated!

Three general comments:

1. It's always seemed to me that a huge problem with anything of this
sort is dependencies. For example, suppose I create a fast temporary
table and then I create a functional index on the fast temporary table
that uses some SQL function defined in pg_proc. Then, another user
drops the function. Then, I try to use the index. With regular
temporary tables, dependencies protect us here, but if there are no
catalog entries, I wonder how this can ever be made safe. Similar
problems exist for triggers, constraints, RLS policies, and attribute
defaults.

2. This inserts additional code in a bunch of really low-level places
like heap_hot_search_buffer, heap_update, heap_delete, etc. I think
what you've basically done here is create a new, in-memory heap AM and
then, because we don't have an API for adding new storage managers,
you've bolted it onto the existing heapam layer. That's certainly a
reasonable approach for creating a PoC, but I think we actually need a
real API here. Otherwise, when the next person comes along and wants
to add a third heap implementation, they've got to modify all of these
same places again. I don't think this code is reasonably maintainable
in this form.

3. Currently, temporary tables are parallel-restricted: a query that
references temporary tables can use parallelism, but access to the
temporary tables is only possible from within the leader. I suspect,
although I'm not entirely sure, that lifting this restriction would be
easier with our current temporary table implementation than with this
one, because the current temporary table implementation mostly relies
on a set of buffers that could be moved from backend-private memory to
DSM. On a quick look, this implementation uses a bunch of new data
structures that are heavy on pointers, so that gets quite a bit more
complicated to make parallel-safe (unless we adopt threads instead of
processes!).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2016-08-05 17:00:38 Notice lock waits
Previous Message Jeff Janes 2016-08-05 16:19:56 Re: [RFC] Change the default of update_process_title to off