Re: [Proposal] Global temporary tables

From: 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, Cai, Le <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2019-11-08 15:06:17
Message-ID: 5767433E-1A77-4682-B3FD-4869911021C8@alibaba-inc.com
Lists: pgsql-hackers

My comments for global_private_temp-4.patch

Good side:
1 Lots of index types on GTT. I think we need to support all kinds of indexes.
2 Serial columns on GTT.
3 INHERITS on GTT.
4 PARTITION on GTT.

I chose not to support these in the first release, but you did.

Other side:
1 Case: create global temp table gtt2(a int primary key, b text) on commit delete rows;
I think the meaning of the ON COMMIT DELETE ROWS clause is lost here.
After the GTT is created, other sessions see it as an ON COMMIT PRESERVE ROWS GTT.

2 TRUNCATE on a GTT: maybe this is a bug in DropRelFileNodeBuffers.
The GTT's local buffers are not released.
Case:
postgres=# insert into gtt2 values(1,'xx');
INSERT 0 1
postgres=# truncate gtt2;
TRUNCATE TABLE
postgres=# insert into gtt2 values(1,'xx');
ERROR: unexpected data beyond EOF in block 0 of relation base/13579/t3_16384
HINT: This has been seen to occur with buggy kernels; consider updating your system.

3 Lock level of TRUNCATE on a GTT.
I don't think it's a good idea to hold a heavyweight lock for truncating a GTT, because it only needs to process the session's private data.

4 GTT DDL: DDL that needs to rewrite data files may need attention.
We discussed this in a previous email; this is why I used a shared hash table to track GTT storage files.

5 There will be problems with DDL that changes the relfilenode, such as CLUSTER or VACUUM FULL on a GTT.
If one session completes vacuum full gtt(a), other sessions will immediately start reading and writing the new storage file, and their existing data is lost.
I disabled these commands in my current version.

6 DROP on a GTT.
I think dropping a GTT should clean up all storage files as well as the definition. What do you think?

7 MVCC visibility and clog cleanup.
GTT data follows the same visibility rules as regular tables, so a GTT also needs clog.
We need to prevent the clog that a GTT still needs from being cleaned up.
At the same time, GTT does not get autovacuumed, so retaining "too old data" will cause wraparound data loss.
I have given a solution for this in my design.

Zeng Wenjing

> On Nov 1, 2019, at 11:15 PM, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
>
>
> On 25.10.2019 20:00, Pavel Stehule wrote:
>>
>> >
>> >> So except the limitation mentioned above (which I do not consider as critical) there is only one problem which was not addressed: maintaining statistics for GTT.
>> >> If all of the following conditions are true:
>> >>
>> >> 1) GTT are used in joins
>> >> 2) There are indexes defined for GTT
>> >> 3) Size and histogram of GTT in different backends can significantly vary.
>> >> 4) ANALYZE was explicitly called for GTT
>> >>
>> >> then query execution plan built in one backend will be also used for other backends where it can be inefficient.
>> >> I also do not consider this problem as "show stopper" for adding GTT to Postgres.
>> > I think that's *definitely* a show stopper.
>> Well, if both you and Pavel think that it is really "show stopper", then
>> this problem really has to be addressed.
>> I slightly confused about this opinion, because Pavel has told me
>> himself that 99% of users never create indexes for temp tables
>> or run "analyze" for them. And without it, this problem is not a problem
>> at all.
>>
>>
>> Users don't run ANALYZE on temp tables in 99% of cases. That's true. But the second fact is that users have a lot of problems because of it. It's very similar to wrong statistics on persistent tables: when the data is small, it is not a problem for users, although from my perspective it's not optimal; when the data is not small, the problem can be brutal. Temporary tables are not an exception. And users and developers are people - we only hear about fatal problems. There are lots of unoptimized queries, but because the problem is not fatal, it doesn't get reported. And a lot of people have no idea how fast databases can be. The knowledge of users and app developers is a sad book.
>>
>> Pavel
>
> It seems to me that I have found a quite elegant solution for per-backend statistics for GTT: I just insert them into the backend's catalog cache, but not into the pg_statistic table itself.
> To do this I had to add InsertSysCache/InsertCatCache functions which insert a pinned entry into the corresponding cache.
> I wonder if there are some pitfalls to such an approach?
>
> New patch for GTT is attached.
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com <http://www.postgrespro.com/>
> The Russian Postgres Company
> <global_private_temp-4.patch>
