Re: [Proposal] Global temporary tables

From: 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Konstantin Knizhnik" <k(dot)knizhnik(at)postgrespro(dot)ru>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, Cai, Le <le(dot)cai(at)alibaba-inc(dot)com>, 张广舟(明虚) <guangzhou(dot)zgz(at)alibaba-inc(dot)com>, 赵殿奎 <diankui(dot)zdk(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2019-10-15 09:49:56
Message-ID: 88BC9750-CCED-4223-8350-1857FDDD8B35@alibaba-inc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> 2019年10月12日 下午1:16,Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> 写道:
>
>
>
> pá 11. 10. 2019 v 15:50 odesílatel Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru <mailto:k(dot)knizhnik(at)postgrespro(dot)ru>> napsal:
>
>
> On 11.10.2019 15:15, 曾文旌(义从) wrote:
>> Dear Hackers,
>>
>> This propose a way to develop global temporary tables in PostgreSQL.
>>
>> I noticed that there is an "Allow temporary tables to exist as empty by default in all sessions" in the postgresql todolist.
>> https://wiki.postgresql.org/wiki/Todo <https://wiki.postgresql.org/wiki/Todo>
>>
>> In recent years, PG community had many discussions about global temp table (GTT) support. Previous discussion covered the following topics:
>> (1) The main benefit or function: GTT offers features like “persistent schema, ephemeral data”, which avoids catalog bloat and reduces catalog vacuum.
>> (2) Whether follows ANSI concept of temporary tables
>> (3) How to deal with statistics, single copy of schema definition, relcache
>> (4) More can be seen in https://www.postgresql.org/message-id/73954ab7-44d3-b37b-81a3-69bdcbb446f7%40postgrespro.ru <https://www.postgresql.org/message-id/73954ab7-44d3-b37b-81a3-69bdcbb446f7%40postgrespro.ru>
>> (5) A recent implementation and design from Konstantin Knizhnik covered many functions of GTT: https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch <https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch>
>>
>> However, as pointed by Konstantin himself, the implementation still needs functions related to CLOG, vacuum, and MVCC visibility.
>>
>
> Just to clarify.
> I have now proposed several different solutions for GTT:
>
> Shared vs. private buffers for GTT:
> 1. Private buffers. This is least invasive patch, requiring no changes in relfilenodes.
> 2. Shared buffers. Requires changing relfilenode but supports parallel query execution for GTT.
>
> This is important argument for using share buffers. Maybe the best is mix of both - store files in temporal tablespace, but using share buffers. More, it can be accessible for autovacuum.
>
> Access to GTT at replica:
> 1. Access is prohibited (as for original temp tables). No changes at all.
> 2. Tuples of temp tables are marked with forzen XID. Minimal changes, rollbacks are not possible.
> 3. Providing special XIDs for GTT at replica. No changes in CLOG are required, but special MVCC visibility rules are used for GTT. Current limitation: number of transactions accessing GTT at replica is limited by 2^32
> and bitmap of correspondent size has to be maintained (tuples of GTT are not proceeded by vacuum and not frozen, so XID horizon never moved).
>
> So except the limitation mentioned above (which I do not consider as critical) there is only one problem which was not addressed: maintaining statistics for GTT.
> If all of the following conditions are true:
>
> 1) GTT are used in joins
> 2) There are indexes defined for GTT
> 3) Size and histogram of GTT in different backends can significantly vary.
> 4) ANALYZE was explicitly called for GTT
>
> then query execution plan built in one backend will be also used for other backends where it can be inefficient.
> I also do not consider this problem as "show stopper" for adding GTT to Postgres.
>
> The last issue is show stopper in my mind. It really depends on usage. There are situation where shared statistics are ok (and maybe good solution), and other situation, where shared statistics are just unusable.
This proposal calculates and stores independent statistics(relpages reltuples and histogram of GTT) for the gtt data within each session, ensuring optimizer can get accurate statistics.

> Regards
>
> Pavel
>
>
>
> I still do not understand the opinion of community which functionality of GTT is considered to be most important.
> But the patch with local buffers and no replica support is small enough to become good starting point.
>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com <http://www.postgrespro.com/>
> The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2019-10-15 10:45:34 Re: configure fails for perl check on CentOS8
Previous Message Amit Kapila 2019-10-15 09:33:30 Re: [HACKERS] Block level parallel vacuum