From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Global temporary tables
On 01.08.2019 6:10, Craig Ringer wrote:
>> 3. It is not possible to use temporary tables at replica.
>
> For physical replicas, yes.
Yes, definitely logical replicas (for example our PgPro-EE multimaster,
based on logical replication) do not suffer from this problem.
But in the multimaster case we have another problem related to temporary
tables: we have to use 2PC for each transaction, and using temporary
tables in a prepared transaction is currently prohibited.
This was the motivation for the patch proposed by Stas Kelvich which
allows temporary tables to be used in prepared transactions under
certain conditions.
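For illustration, the restriction in stock PostgreSQL looks like this
(the exact error text may vary slightly between versions):

```sql
BEGIN;
CREATE TEMP TABLE tt(x int) ON COMMIT DROP;
PREPARE TRANSACTION 'tx1';
-- ERROR:  cannot PREPARE a transaction that has operated on temporary objects
```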
>> 5. Inefficient memory usage and possible memory overflow: each backend
>> maintains its own local buffers for work with temporary tables.
>
> Is there any reason that would change with global temp tables? We'd
> still be creating a backend-local relfilenode for each backend that
> actually writes to the temp table, and I don't see how it'd be useful
> or practical to keep those in shared_buffers.
Yes, my implementation of global temp tables uses shared buffers.
It was not strictly needed, since the data is local: it is possible to
have shared metadata and private data accessed through local buffers.
But I have done it for three reasons:
1. Make it possible to use parallel plans for temp tables.
2. Eliminate the memory overflow problem.
3. Make it possible to reschedule a session to another backend
(connection pooling).
> Using local buffers has big advantages too. It saves shared_buffers
> space for data where there's actually some possibility of getting
> cache hits, or for where we can benefit from lazy/async writeback and
> write combining. I wouldn't want to keep temp data there if I had the
> choice.
Local buffers definitely have some advantages:
- they do not require synchronization
- they avoid flushing data from shared buffers
But global temp tables do not exclude use of the original (local) temp
tables.
So you will have a choice: either use local temp tables, which can be
easily created on demand and are accessed through local buffers,
or create global temp tables, which eliminate catalog bloat, allow
parallel queries, and whose data is controlled by the same cache
replacement discipline as normal tables...
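A minimal sketch of that choice, assuming the syntax from the proposed
patch (CREATE GLOBAL TEMPORARY TABLE is not in core PostgreSQL):

```sql
-- Classic session-local temp table: created per session, local buffers,
-- per-session catalog entries.
CREATE TEMP TABLE scratch_local(id int, payload text);

-- Proposed global temp table: one shared catalog entry and shared
-- buffers, but each session still sees only its own rows.
CREATE GLOBAL TEMPORARY TABLE scratch_global(id int, payload text);
```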
>> Default size of temporary buffers is 8Mb. It seems to be too small for
>> modern servers having hundreds of gigabytes of RAM, causing extra
>> copying of data between OS cache and local buffers. But if there are
>> thousands of backends, each executing queries with temporary tables,
>> then total amount of memory used for temporary buffers can exceed
>> several tens of gigabytes.
>
> Right. But what solution do you propose for this? Putting that in
> shared_buffers will do nothing except deprive shared_buffers of space
> that can be used for other more useful things. A server-wide temp
> buffer would add IPC and locking overheads and AFAICS little benefit.
> One of the big appeals of temp tables is that we don't need any of that.
I do not think that parallel execution and efficient connection pooling
are "little benefit".
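For reference, the local-buffer budget is the temp_buffers GUC, which
can be raised per session, but only before the session first touches a
temporary table:

```sql
SHOW temp_buffers;            -- 8MB by default
SET temp_buffers = '256MB';   -- allowed only before the first use of
                              -- temporary tables in this session
```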
> If you want to improve server-wide temp buffer memory accounting and
> management that makes sense. I can see it being useful to have things
> like a server-wide DSM/DSA pool of temp buffers that backends borrow
> from and return to based on memory pressure on a LRU-ish basis, maybe.
> But I can also see how that'd be complex and hard to get right. It'd
> also be prone to priority inversion problems where an idle/inactive
> backend must be woken up to release memory or release locks, depriving
> an actively executing backend of runtime. And it'd be as likely to
> create inefficiencies with copying and eviction as solve them since
> backends could easily land up taking turns kicking each other out of
> memory and re-reading their own data.
> I don't think this is something that should be tackled as part of work
> on global temp tables personally.
My assumptions are the following: temporary tables are mostly used in
OLAP queries, and an OLAP workload means that there are few concurrent
queries working with large datasets.
So the temporary tables produced can be quite big. For OLAP it seems
very important to be able to use parallel query execution and to apply
the same cache eviction rule to both persistent and temp tables
(otherwise you cause either swapping or extra copying of data
between the OS and Postgres caches).
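The parallelism limitation is easy to see in stock PostgreSQL (plan
details vary by version and settings):

```sql
CREATE TEMP TABLE big AS
    SELECT g AS x FROM generate_series(1, 10000000) g;
EXPLAIN SELECT count(*) FROM big;
-- No "Parallel Seq Scan" appears in the plan: the table lives in
-- backend-local buffers that parallel workers cannot read.
```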
>> 6. Connection pooler can not reschedule session which has created
>> temporary tables to some other backend because its data is stored
>> in local buffers.
>
> Yeah, if you're using transaction-associative pooling. That's just
> part of a more general problem though, there are piles of related
> issues with temp tables, session GUCs, session advisory locks and more.
> I don't see how global temp tables will do you the slightest bit of
> good here as the data in them will still be backend-local. If it isn't
> then you should just be using unlogged tables.
You cannot use the same unlogged table to save intermediate query
results in two parallel sessions.
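The point can be sketched with a hypothetical two-session experiment
(src_a and src_b are placeholder source tables): an unlogged table is a
single shared relation, so sessions step on each other's intermediate
results.

```sql
CREATE UNLOGGED TABLE stage(v int);
-- Session A: INSERT INTO stage SELECT v FROM src_a;
-- Session B: INSERT INTO stage SELECT v FROM src_b;
-- Both sessions now read a mix of each other's rows; a temp table
-- would have given each session its own private contents.
```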
>> Definition of this table (metadata) is shared by all backends but its
>> data is private to the backend. After session termination data is
>> obviously lost.
>
> +1 that's what a global temp table should be, and it's IIRC pretty
> much how the SQL standard specifies temp tables.
> I suspect I'm overlooking some complexities here, because to me it
> seems like we could implement these fairly simply. A new relkind would
> identify it as a global temp table and the relfilenode would be 0.
> Same for indexes on temp tables. We'd extend the relfilenode mapper to
> support a backend-local non-persistent relfilenode map that's used to
> track temp table and index relfilenodes. If no relfilenode is defined
> for the table, the mapper would allocate one. We already happily
> create missing relfilenodes on write so we don't even have to
> pre-create the actual file. We'd register the relfilenode as a
> tempfile and use existing tempfile cleanup mechanisms, and we'd use
> the temp tablespace to store it.
> I must be missing something important because it doesn't seem hard.
As I already wrote, I tried to kill two birds with one stone: eliminate
catalog bloat and allow access to temp tables from multiple backends
(to be able to perform parallel queries and connection pooling).
This is why I had to use shared buffers for global temp tables.
Maybe it was not such a good idea, but learning what other people think
was one of my primary intentions in publishing this patch.
In PG-Pro some of my colleagues think that the most critical problem is
the inability to use temporary tables at a replica.
Others think that it is not a problem at all if you are using logical
replication.
From my point of view the most critical problem is the inability to use
parallel plans with temporary tables.
But it looks like you don't think so.
I see three different activities related to temporary tables:
1. Shared metadata
2. Shared buffers
3. Alternative concurrency control & reduced tuple header size
(a specialized table access method for temporary tables)
In my proposal I combined 1 and 2, leaving 3 for a next step.
I will be interested to hear other suggestions.
One more thing: 1 and 2 are really independent; you can share metadata
without sharing buffers.
But introducing yet another kind of temporary table seems to be really
excessive:
- local temp tables (private namespace and local buffers)
- tables with shared metadata but local buffers
- tables with shared metadata and shared buffers
>> The drawback of such approach is that it will be necessary to
>> reimplement large bulk of heapam code.
>> But this approach allows to eliminate visibility check for temporary
>> table tuples and decrease size of tuple header.
>
> That sounds potentially cool, but perhaps a "next step" thing? Allow
> the creation of global temp tables to specify reloptions, and you can
> add it as a reloption later. You can't actually eliminate visibility
> checks anyway because they're still MVCC heaps.
I mean elimination of MVCC overhead (visibility checks) for temp tables.
I am not sure that we can fully eliminate it if we support use of temp
tables in prepared transactions and autonomous transactions (yet
another awful feature we have in PgPro-EE).
Also, it looks like we need some analogue of CID to be able to
correctly execute queries like "insert into T (select from T ...)"
where T is a global temp table.
I haven't thought much about it yet, but I am really considering a new
table access method API for reducing per-tuple storage overhead for
temporary tables.
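The self-referencing case I mean is the classic one where a command must
not see its own freshly inserted rows; in regular heap tables the
per-tuple command id (cmin) takes care of this:

```sql
CREATE TEMP TABLE t(x int);
INSERT INTO t VALUES (1);
-- Without CID-style visibility, this scan could see the rows it is
-- itself inserting and keep going; with it, exactly one new row is added.
INSERT INTO t SELECT x + 1 FROM t;
```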
> Savepoints can create invisible tuples even if you're using temp
> tables that are cleared on commit, and of course so can DELETEs or
> UPDATEs. So I'm not sure how much use it'd really be in practice.
Yeah, subtransactions can also be a problem for eliminating xmin/xmax
in temp tables. Thanks for noticing it.
I noticed that I had not patched some extensions - a fixed and rebased
version of the patch is attached.
Also you can find this version in our github repository:
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company