Re: [Proposal] Global temporary tables

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, "Cai, Le" <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2020-01-29 08:12:59
Message-ID: 407d6499-950d-e3d9-acbc-a2e86d8b82a3@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 27.01.2020 22:44, Pavel Stehule wrote:
>
> I don't think so compatibility with Oracle is valid point in this
> case. We need GTT, but the mechanism of index building should be
> designed for Postgres, and for users.
>
> Maybe the method proposed by you can be activated by some option like
> CREATE INDEX IMMEDIATELY FOR ALL SESSION. When you use GTT without
> index, then
> it should to work some time more, and if you use short life sessions,
> then index build can be last or almost last operation over table and
> can be suboptimal.
>
> Anyway, this behave can be changed later without bigger complications
> - and now I am have strong opinion to prefer don't allow to any DDL
> (with index creation) on any active GTT in other sessions.
> Probably your proposal - build indexes on other sessions when GTT is
> touched can share code with just modify metadata and wait on session
> reset or GTT reset
>
Well, compatibility with Oracle was never treated as important argument
in this group:)
But I hope that you agree that it real argument against your proposal.
Much more important argument is incompatibility with behavior of regular
table.
If you propose such incompatibility, then you should have some very
strong arguments for such behavior which will definitely confuse users.

But I heard only two arguments:

1. Concurrent building of indexes by all backends may consume much
memory (n_backends * maintenance_work_mem) and consume a lot of disk/CPU
resources.

First of all it is not completely true. Indexes will be created on
demand when GTT will be accessed and chance that all sessions will
become building indexes simultaneously is very small.

But what will happen if we prohibit access to this index for existed
sessions? If we need index for GTT, then most likely it is used for joins.
If there is no index, then optimizer has to choose some other plan to
perform this join. For example use hash join. Hash join also requires
memory,
so if all backends will perform such join simultaneously, then them
consume (n_backends * work_mem) memory.
Yes, work_mem is used to be smaller than maintenance_work_mem. But in
any case DBA has a choice to adjust this parameters to avoid this problem.
And in case of your proposal (prohibit access to this index) you give
him no choice to optimize query execution in existed sessions.

Also if all sessions will simultaneously perform sequential scan of GTT
instead of building index for it, then them will read the same amount of
data and consume comparable CPU time.
So prohibiting access to the indexes will not save us from high
resources consumption if all existed sessions are really actively
working with this GTT.

2. GTT in one session can contains large amount of data and we need
index for it, but small amount of data in another session and we do not
need index for it.

Such situation definitely can happen. But it contradicts to the main
assumption of GTT use case (that it is accessed in the same way by all
sessions).
Also I may be agree with this argument if you propose to create indexes
locally for each sessions.
But your proposal is to prohibit access to the index to the sessions
which already have populated GTT with data but allow it for sessions
which have not accessed this GTT yet.
So if some session stores some data in GTT after index was created, then
it will build index for it, doesn't matter whether size of table is
small or large.
Why do we make an exception for sessions which already have data in GTT
in this case?

So from my point of view both arguments are doubtful and can not explain
why rules of index usability for GTT should be different from regular
tables.

> Usually it is not hard problem to refresh sessions, and what I know
> when you update plpgsql code, it is best practice to refresh session
> early.
>

I know may systems where session is established once client is connected
to the system and not closed until client is disconnected.
And any attempt to force termination of the session will cause
application errors which are not expected by the client.

Sorry, I think that it is principle point in discussion concerning GTT
design.
Implementation of GTT can be changed in future, but it is bad if
behavior of GTT will be changed.
It is not clear for me why from the very beginning we should provide
inconsistent behavior which is even more difficult to implement than
behavior compatible with regular tables.
And say that in the future it can be changed...

Sorry, but I do not consider proposals to create indexes locally for
each session (i.e. global tables but private indexes) or use some
special complicated SQL syntax constructions like
CREATE INDEX IMMEDIATELY FOR ALL SESSION as some real alternatives which
have to be discussed.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Luis Carril 2020-01-29 08:30:16 Re: Option to dump foreign data in pg_dump
Previous Message Kyotaro Horiguchi 2020-01-29 08:10:20 Re: Physical replication slot advance is not persistent