[Proposal] Global temporary tables

From: 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: 蔡松露(子嘉) <zijia(at)taobao(dot)com>, Cai, Le <le(dot)cai(at)alibaba-inc(dot)com>, 张广舟(明虚) <guangzhou(dot)zgz(at)alibaba-inc(dot)com>, 赵殿奎 <diankui(dot)zdk(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: [Proposal] Global temporary tables
Date: 2019-10-11 12:15:27
Message-ID: 1A1A6EDC-D0EC-47B0-BD21-C2ACBAEA65E4@alibaba-inc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Hackers,

This propose a way to develop global temporary tables in PostgreSQL.

I noticed that there is an "Allow temporary tables to exist as empty by default in all sessions" in the postgresql todolist.
https://wiki.postgresql.org/wiki/Todo <https://wiki.postgresql.org/wiki/Todo>

In recent years, PG community had many discussions about global temp table (GTT) support. Previous discussion covered the following topics:
(1) The main benefit or function: GTT offers features like “persistent schema, ephemeral data”, which avoids catalog bloat and reduces catalog vacuum.
(2) Whether follows ANSI concept of temporary tables
(3) How to deal with statistics, single copy of schema definition, relcache
(4) More can be seen in https://www.postgresql.org/message-id/73954ab7-44d3-b37b-81a3-69bdcbb446f7%40postgrespro.ru
(5) A recent implementation and design from Konstantin Knizhnik covered many functions of GTT: https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch <https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch>

However, as pointed by Konstantin himself, the implementation still needs functions related to CLOG, vacuum, and MVCC visibility.

We developed GTT based on PG 11 and included most needed features, such as how to deal with concurrent DDL and DML operations, how to handle vacuum and too old relfrozenxids, and how to store and access GTT statistics.

This design followed many suggestions from previous discussion in community. Here are some examples:
“have a separate 'relpersistence' setting for global temp tables…by having the backend id in all filename…. From Andres Freund
Use session memory context to store information related to GTT. From Pavel Stehule
“extend the relfilenode mapper to support a backend-local non-persistent relfilenode map that's used to track temp table and index relfilenodes…” from Craig Ringer

Our implementation creates one record in pg_class for GTT’s schema definition. When rows are first inserted into the GTT in a session, a session specific file is created to store the GTT’s data. Those files are removed when the session ends. We maintain the GTT’s statistics in session local memory. DDL operations, such as DROP table or CREATE INDEX, can be executed on a GTT only by one session, while no other sessions insert any data into the GTT before or it is already truncated. This also avoids the concurrency of DML and DDL operations on GTT. We maintain a session level oldest relfrozenxids for GTT. This way, autovacuum or vacuum can truncate CLOG and increase global relfrozenxids based on all tables’ relfrozenxids, including GTT’s.
The follows summarize the main design and implementation:
Syntax: ON COMMIT PRESERVE ROWS and ON COMMIT DELETE ROWS
Data storage and buffering follows the same way as local temp table with a relfilenode including session id.
A hash table(A) in shared memory is used to track sessions and their usage of GTTs and to serialize DDL and DML operations.
Another hash table(B) in session memory is introduced to record storage files for GTTs and their indexes. When a session ends, those files are removed.
The same hash table(B) in session memory is used to record the relfrozenxids of each GTT. The oldest one is stored in myproc so that autovacuum and vacuum may use it to determine global oldest relfrozenxids and truncate clog.
The same hash table(B) in session memory stores GTT’s session level statistics, It is generated during the operations of vacuum and analyze, and used by SQL optimizer to create execution plan.
Some utility functions are added for DBA to manage GTTs.
TRUNCATE command on a GTT behaves differently from that on a normal table. The command deletes the data immediately but keeps relfilenode using lower level table lock, RowExclusiveLock, instead of AccessExclusiveLock.
Main limits of this version or future improvement: need suggestions from community:
1 VACUUM FULL and CLUSTER are not supported; any operations which may change relfilenode are disabled to GTT.
2 Sequence column is not supported in GTT for now.
3 Users defined statistics is not supported.

Details:

Requirement
The features list about global temp table:
1. global temp table (ON COMMIT clause is omitted, SQL specifies that the default behavior is ON COMMIT DELETE ROWS)
2. support with on commit DELETE ROWS
3. support with on commit PRESERVE ROWS
4. not support ON COMMIT DROP

Feature description
Global temp tables are defined just once and automatically exist (starting with empty contents) in every session that needs them.
Global temp table, each session use local buffer, read or write independent data files.
Use on commit DELETE ROWS for a transaction-specific global temp table. This is the default. database will truncate the table (delete all its rows) after each commit.
Use on commit PRESERVE ROWS Specify PRESERVE ROWS for a session-specific global temp table. databse will truncate the table (delete all its rows) when you terminate the session.

design
Global temp tables are designed based on local temp table(buffer and storage files).
Because the catalog of global temp table is shared between sessions but the data is not shared, we need to build some new mechanisms to manage non-shared data and statistics for those data.

1. catalog
1.1 relpersistence
define RELPERSISTENCEGLOBALTEMP 'g'
Mark global temp table in pg_class relpersistence to 'T'. The relpersistence of the index created on the global temp table is also set to ’T'

1.2 on commit clause
In local temp table on commit DELETE ROWS and on commit PRESERVE ROWS not store in catalog, but GTT need.
Store a bool value oncommitdelete_rows to reloptions only for GTT and share with other session.

2. gram.y
Global temp table already has a syntax tree. jush need to remove the warning message "GLOBAL is deprecated in temporary table creation" and mark relpersistence = RELPERSISTENCEGLOBALTEMP

3. STORAGE
3.1. active_gtt_shared_hash
create a hash table in shared memory to trace the GTT files that are initialized in each session.
Each hash entry contains a bitmap that records the backendid of the initialized GTT file.
With this hash table, we know which backend/session are using this GTT.
It will be used in GTT's DDL.

3.2. gtt_storage_local_hash
In each backend, create a local hashtable gtt_storage_local_hash for tracks GTT storage file and statistics.
1). GTT storage file track
When one session inserts data into a GTT for the first time, record to local hash.
2). normal clean GTT files
Use beforeshmemexit to ensure that all files for the session GTT are deleted when the session exits.
3). abnormal situation file cleanup
When a backend exits abnormally (such as oom kill), the startup process started to recovery before accept connect. startup process check and remove all GTT files before redo wal.

4 DDL
4.1 DROP GTT
One GTT table is allowed to be deleted when only the current session USES it. After get the AccessExclusiveLock of the GTT table, use active_gtt_shared_hash to check and make sure that.

4.2 ALTER GTT
Same as drop GTT.

4.3 CREATE INDEX ON GTT, DROP INDEX ON GTT
Same as drop GTT.

4.4 TRUNCATE GTT
The truncate GTT use RowExclusiveLock, not AccessExclusiveLock, Because truncate only cleans up local data file and local buffers in this session.
Also, truncate immediately deletes the data file without changing the relfilenode of the GTT table. btw, I'm not sure the implementation will be acceptable to the community.

4.5 create index on GTT
Same as drop GTT.

4.6 OTHERS
Any table operations about GTT that need to change relfilenode are disabled, such as vacuum full/cluster.

5. The statistics of GTT
1 relpages reltuples relallvisible frozenxid minmulti from pg_class
2 The statistics for each column from pg_statistic
All the above information will be stored to gtt_storage_local_hash.
When vacuum or analyze GTT's statistic will update, and the planner will use them. Of course, statistics only contain data within the current session.

5.1. View global temp table statistics
Provide pggttattstatistic get column statistics for GTT. Provide pggtt_relstats to rel statistics for GTT.
These functions are implemented in a plug-in, without add system view or function.

6. autovacuum
Autovacuum skips all GTT.

7. vacuum(frozenxid push, clog truncate)
The GTT data file contains transaction information. Queries for GTT data rely on transaction information such as clog. That's can not be vacuumed automatically by vacuum.
7.1 The session level gtt oldest frozenxid
When one GTT been create or remove, record the session level oldest frozenxid and put it into MyProc.

7.1 vacuum
When vacuum push the db's frozenxid(vacupdatedatfrozenxid), need to consider the GTT. It needs to calculate the transactions required for the GTT(search all MyPorc), to avoid the clog required by GTT being cleaned.

8. Parallel query
Planner does not produce parallel query plans for SQL related to global temp table.

9. Operability
Provide pggttattachedpid lists all the pids that are using the GTT. Provide pglistgttrelfrozenxids lists the session level oldest frozenxid of using GTT.
These functions are implemented in a plug-in, without add system view or function.
DBA can use the above function and pgterminatebackend to force the cleanup of "too old" GTT tables and sessions.

10. Limitations and todo list
10.1. alter GTT
10.2. pg_statistic_ext
10.3. remove GTT's relfilenode can not change limit.
cluster/vacuum full, optimize truncate gtt.
10.4. SERIAL column type
The GTT from different sessions share a sequence(SERIAL type).
Need each session use the sequence independently.
10.5. Locking optimization for GTT.
10.6 materialized views is not support on GTT.

What do you thinking about this proposal?
Looking forward to your feedback.

Thanks!

regards

--
Zeng Wenjing
Alibaba Group-Database Products Business Unit

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Cramer 2019-10-11 12:20:54 Re: let's make the list of reportable GUCs configurable (was Re: Add %r substitution for psql prompts to show recovery status)
Previous Message Mahendra Singh 2019-10-11 11:17:22 Re: [HACKERS] Block level parallel vacuum