Re: Global temporary tables

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Global temporary tables
Date: 2019-08-01 03:10:40
Message-ID: CAMsr+YGnJg9OBwG8RU6k1wm64QuPwwtyZkG2D0vsME=r8krCAQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, 31 Jul 2019 at 23:05, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
wrote:

> Current Postgres implementation of temporary table causes number of
> problems:
>
> 1. Catalog bloating: if client creates and deletes too many temporary
> tables, then autovacuum get stuck on catalog.
>

This also upsets logical decoding a little - AFAICS it still has to treat
transactions that use temporary tables as catalog-modifying transactions,
tracking them in its historic catalog snapshots and doing extra cache
flushes etc when decoding them.

This will become even more important as we work to support eager/optimistic
output plugin processing of in-progress transactions. We'd have to switch
snapshots more, and that can get quite expensive so using temp tables could
really hurt performance. Or we'd have to serialize on catalog-changing
transactions, in which case using temp tables would negate the benefits of
optimistic streaming of in-progress transactions.
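To make the catalog-bloat and decoding angle concrete, here's a minimal illustration (names are mine, any stock PostgreSQL will do): each temporary table is a full set of ordinary rows in the shared catalogs, so heavy CREATE TEMP / DROP churn leaves dead catalog tuples behind, and decoding has to treat the transaction as catalog-modifying.

```sql
-- Each temp table creates ordinary rows in the shared catalogs:
CREATE TEMP TABLE scratch(id int, note text);
SELECT relnamespace::regnamespace, oid
FROM pg_class WHERE relname = 'scratch';
-- Dropping it turns those pg_class / pg_attribute rows into dead
-- tuples that autovacuum must clean up:
DROP TABLE scratch;
```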

> 3. It is not possible to use temporary tables at replica.

For physical replicas, yes.

> Hot standby
> configuration is frequently used to run OLAP queries on replica
> and results of such queries are used to be saved in temporary tables.
> Right now it is not possible (except "hackers" solution with storing
> results in file_fdw).
>

Right. Because we cannot modify pg_class, pg_attribute etc, even though we
could reasonably enough write to local-only relfilenodes on a replica if we
didn't have to change WAL-logged catalog tables.

I've seen some hacks suggested around this where we have an unlogged fork
of each of the needed catalog tables, allowing replicas to write temp table
info to them. We'd scan both the logged and unlogged forks when doing
relcache management etc. But there are plenty of ugly issues with this.
We'd have to reserve oid ranges for them which is ugly; to make it BC
friendly those reservations would probably have to take the form of some
kind of placeholder entry in the real pg_class. And it gets ickier from
there. It hardly seems worth it when we should probably just implement
global temp tables instead.
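For readers who haven't seen the file_fdw workaround mentioned above, it looks roughly like this (table and path names are illustrative only). Note that the foreign table has to be defined on the primary, since a standby cannot modify the catalogs; only the COPY runs on the standby, writing to the filesystem rather than to any relation.

```sql
-- On the primary (definitions replicate to the standby):
CREATE EXTENSION file_fdw;
CREATE SERVER files FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE olap_result(customer_id int, total numeric)
  SERVER files OPTIONS (filename '/tmp/olap_result.csv', format 'csv');

-- On the standby (writes a file, not catalog or heap data):
COPY (SELECT customer_id, sum(amount) FROM orders GROUP BY customer_id)
  TO '/tmp/olap_result.csv' WITH (FORMAT csv);
SELECT * FROM olap_result;
```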

> 5. Inefficient memory usage and possible memory overflow: each backend
> maintains its own local buffers for work with temporary tables.
>

Is there any reason that would change with global temp tables? We'd still
be creating a backend-local relfilenode for each backend that actually
writes to the temp table, and I don't see how it'd be useful or practical
to keep those in shared_buffers.

Using local buffers has big advantages too. It saves shared_buffers space
for data where there's actually some possibility of getting cache hits, or
for where we can benefit from lazy/async writeback and write combining. I
wouldn't want to keep temp data there if I had the option.

If you're concerned about the memory use of backend local temp buffers, or
about how we account for and limit those, that's worth looking into. But I
don't think it'd be something that should be affected by global-temp vs
backend-local-temp tables.
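For what it's worth, the knob that exists today is purely per-session: temp_buffers (default 8MB) can be raised, but only before the session first touches a temporary table.

```sql
-- Must be set before the first temp-table access in the session;
-- after that the setting is fixed for the session's lifetime:
SET temp_buffers = '256MB';
CREATE TEMP TABLE big_scratch AS
  SELECT g AS id FROM generate_series(1, 1000000) g;
```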

> Default size of temporary buffers is 8Mb. It seems to be too small for
> modern servers having hundreds of gigabytes of RAM, causing extra
> copying of data between OS cache and local buffers. But if there are
> thousands of backends, each executing queries with temporary tables,
> then total amount of memory used for temporary buffers can exceed
> several tens of gigabytes.
>

Right. But what solution do you propose for this? Putting that in
shared_buffers will do nothing except deprive shared_buffers of space that
can be used for other more useful things. A server-wide temp buffer would
add IPC and locking overheads and AFAICS little benefit. One of the big
appeals of temp tables is that we don't need any of that.

If you want to improve server-wide temp buffer memory accounting and
management that makes sense. I can see it being useful to have things like
a server-wide DSM/DSA pool of temp buffers that backends borrow from and
return to based on memory pressure on a LRU-ish basis, maybe. But I can
also see how that'd be complex and hard to get right. It'd also be prone to
priority inversion problems where an idle/inactive backend must be woken up
to release memory or release locks, depriving an actively executing backend
of runtime. And it'd be as likely to create inefficiencies with copying and
eviction as solve them since backends could easily land up taking turns
kicking each other out of memory and re-reading their own data.

I don't think this is something that should be tackled as part of work on
global temp tables personally.

> 6. Connection pooler can not reschedule session which has created
> temporary tables to some other backend because it's data is stored in local
> buffers.
>

Yeah, if you're using transaction-associative pooling. That's just part of
a more general problem, though: there are piles of related issues with
temp tables, session GUCs, session advisory locks and more.

I don't see how global temp tables will do you the slightest bit of good
here as the data in them will still be backend-local. If it isn't then you
should just be using unlogged tables.
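That is, if the data genuinely needs to survive a backend switch under transaction pooling, an unlogged table already gives shared, crash-unsafe storage (sketch; the session-keying column is my illustration):

```sql
-- Shared across backends, skipped by WAL, truncated after a crash:
CREATE UNLOGGED TABLE session_scratch(
    session_id text,   -- pooler-assigned key, since backends are shared
    payload    jsonb
);
```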

> Definition of this table (metadata) is shared by all backends but data
> is private to the backend. After session termination data is obviously
> lost.
>

+1 that's what a global temp table should be, and it's IIRC pretty much how
the SQL standard specifies temp tables.
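For reference, the standard's form is roughly the following (this is SQL-standard syntax, not something PostgreSQL implements today; it currently accepts GLOBAL only as a deprecated noise word):

```sql
-- Definition shared by all sessions; contents private to each session:
CREATE GLOBAL TEMPORARY TABLE work_queue(item_id int)
  ON COMMIT PRESERVE ROWS;
```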

I suspect I'm overlooking some complexities here, because to me it seems
like we could implement these fairly simply. A new relkind would identify
it as a global temp table and the relfilenode would be 0. Same for indexes
on temp tables. We'd extend the relfilenode mapper to support a
backend-local non-persistent relfilenode map that's used to track temp
table and index relfilenodes. If no relfilenode is defined for the table,
the mapper would allocate one. We already happily create missing
relfilenodes on write so we don't even have to pre-create the actual file.
We'd register the relfilenode as a tempfile and use existing tempfile
cleanup mechanisms, and we'd use the temp tablespace to store it.

I must be missing something important because it doesn't seem hard.
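After all, the mapped catalogs already behave much like this: pg_class stores relfilenode 0 for them and the mapper supplies the real value, which is observable from SQL.

```sql
-- pg_class is itself a mapped relation: its catalog row says 0, and
-- pg_relation_filenode() asks the relmapper for the actual file:
SELECT relname, relfilenode, pg_relation_filenode(oid)
FROM pg_class WHERE relname = 'pg_class';
```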

> Global temporary tables are accessed through shared buffers (to solve
> problem 2).
>

I'm far from convinced of the wisdom or necessity of that, but I haven't
spent as much time digging into this problem as you have.

> The drawback of such approach is that it will be necessary to
> reimplement large bulk of heapam code.
> But this approach allows to eliminate visibility check for temporary
> table tuples and decrease size of tuple header.
>

That sounds potentially cool, but perhaps a "next step" thing? Allow the
creation of global temp tables to specify reloptions, and you can add it as
a reloption later. You can't actually eliminate visibility checks anyway
because they're still MVCC heaps. Savepoints can create invisible tuples
even if you're using temp tables that are cleared on commit, and of course
so can DELETEs or UPDATEs. So I'm not sure how much use it'd really be in
practice.
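A concrete case of what I mean about savepoints (table name is illustrative): even with ON COMMIT DELETE ROWS, an aborted subtransaction leaves a dead tuple on the page that visibility checks must filter out.

```sql
CREATE TEMP TABLE t(x int) ON COMMIT DELETE ROWS;
BEGIN;
INSERT INTO t VALUES (1);
SAVEPOINT sp;
INSERT INTO t VALUES (2);
ROLLBACK TO sp;    -- tuple (2) is now dead but still on the page
SELECT * FROM t;   -- only (1) is visible; a visibility check did that
COMMIT;
```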

--
Craig Ringer http://www.2ndQuadrant.com/
2ndQuadrant - PostgreSQL Solutions for the Enterprise
