Re: [Proposal] Global temporary tables

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, "Cai, Le" <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2019-11-08 12:57:13
Message-ID: 745ef1fa-5a81-d30d-aa33-bce24a022a3a@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08.11.2019 10:50, 曾文旌(义从) wrote:
> In my opinion, it is not a good idea to trigger a btbuild with a select or DML, the cost of which depends on the amount of data in the GTT.
IMHO it is better than returning error.
Also index will be used only if cost of plan with index will be
considered better than cost of plan without index. If you do not have
index, then you have to scan the whole table.
Time of such scan is comparable with time of building index.

Yes, I agree that indexes for GTT are used to be created together with
table itself before it is used by any application.
But if later DBA recognized that efficient execution of queries requires
some more indexes,
it will be strange and dangerous to prevent him from adding such index
until all clients which have accessed this table will drop their
connections.
Also maintaining in shared memory information about attached backends
seems to be overkill.

>>
>> This code initializes B-Tree and load data in it when GTT index is access and is not initialized yet.
>> It looks a little bit hacker but it works.
>>
>> I also wonder why you are keeping information about GTT in shared memory. Looks like the only information we really need to share is table's metadata.
>> But it is already shared though catalog. All other GTT related information is private to backend so I do not see reasons to place it in shared memory.
> The shared hash structure tracks which backend has initialized the GTT storage in order to implement the DDL of the GTT.
Sorry, I do not understand this argument.
DDL is performed on shared metadata present in global catalog.
Standard postgres invalidation mechanism is used to notify all backends
about schema changes.
Why do we need to maintain some extra information in shared memory.
Can you give me example of DLL which does't work without such shared hash?

> As for GTT, there is only one definition(include index on GTT), but each backend may have one data.
> For the implementation of drop GTT, I assume that all data and definitions need to be deleted.

Data of dropped GTT is removed on normal backend termination or cleaned
up at server restart in case of abnormal shutdown (as it is done for
local temp tables).
I have not used any shared control structures for GTT in my
implementation and that is why I wonder why do you need it and what are
the expected problems with my
implementation?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Grigory Smolkin 2019-11-08 13:08:47 Re: [proposal] recovery_target "latest"
Previous Message Alvaro Herrera 2019-11-08 12:50:13 Re: SPI refactoring