Re: [Proposal] Global temporary tables

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, "Cai, Le" <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2020-01-29 15:30:44
Message-ID: 6cbc12a0-bf0e-0205-959e-d871eb7e7c3e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29.01.2020 17:47, Robert Haas wrote:
> On Wed, Jan 29, 2020 at 3:13 AM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> But I heard only two arguments:
>>
>> 1. Concurrent building of indexes by all backends may consume much memory (n_backends * maintenance_work_mem) and consume a lot of disk/CPU resources.
>> 2. GTT in one session can contains large amount of data and we need index for it, but small amount of data in another session and we do not need index for it.
> You seem to be ignoring the fact that two committers told you this
> probably wasn't safe.
>
> Perhaps your view is that those people made no argument, and therefore
> you don't have to respond to it. But the onus is not on somebody else
> to tell you why a completely novel idea is not safe. The onus is on
> you to analyze it in detail and prove that it is safe. What you need
> to show is that there is no code anywhere in the system which will be
> confused by an index springing into existence at whatever time you're
> creating it.
>
> One problem is that there are various backend-local data structures in
> the relcache, the planner, and the executor that remember information
> about indexes, and that may not respond well to having more indexes
> show up unexpectedly. On the one hand, they might crash; on the other
> hand, they might ignore the new index when they shouldn't. Another
> problem is that the code which creates indexes might fail or misbehave
> when run in an environment different from the one in which it
> currently runs. I haven't really studied your code, so I don't know
> exactly what it does, but for example it would be really bad to try to
> build an index while holding a buffer lock, both because it might
> cause (low-probability) undetected deadlocks and also because it might
> block another process that wants that buffer lock in a
> non-interruptible wait state for a long time.
>
> Now, maybe you can make an argument that you only create indexes at
> points in the query that are "safe." But I am skeptical, because of
> this example:
>
> rhaas=# create table foo (a int primary key, b text, c text, d text);
> CREATE TABLE
> rhaas=# create function blump() returns trigger as $$begin create
> index on foo (b); return new; end$$ language plpgsql;
> CREATE FUNCTION
> rhaas=# create trigger thud before insert on foo execute function blump();
> CREATE TRIGGER
> rhaas=# insert into foo (a) select generate_series(1,10);
> ERROR: cannot CREATE INDEX "foo" because it is being used by active
> queries in this session
> CONTEXT: SQL statement "create index on foo (b)"
> PL/pgSQL function blump() line 1 at SQL statement
>
> That prohibition is there for some reason. Someone did not just decide
> to arbitrarily prohibit it. A CREATE INDEX command run in that context
> won't run afoul of many of the things that might be problems in other
> places -- e.g. there won't be a buffer lock held. Yet, despite the
> fact that a trigger context is safe for executing a wide variety of
> user-defined code, this particular operation is not allowed here. That
> is the sort of thing that should worry you.
>
> At any rate, even if this somehow were or could be made safe,
> on-the-fly index creation is a feature that cannot and should not be
> combined with a patch to implement global temporary tables. Surely, it
> will require a lot of study and work to get the details right. And so
> will GTT. As I said in the other email I wrote, this feature is hard
> enough without adding this kind of thing to it. There's a reason why I
> never got around to implementing this ten years ago when I did
> unlogged tables; I was intending that to be a precursor to the GTT
> work. I found that it was too hard and I gave up. I'm glad to see
> people trying again, but the idea that we can afford to add in extra
> features, or frankly that either of the dueling patches on this thread
> are close to committable, is just plain wrong.
>

Sorry, I really didn't consider statements containing word "probably" as
arguments.
But I agree with you: it is task of developer of new feature to prove
that proposed approach is safe rather than of reviewers to demonstrate
that it is unsafe.
Can I provide such proof now? I afraid that not.
But please consider two arguments:

1. Index for GTT in any case has to be initialized on demand. In case of
regular tables index is initialized at the moment of its creation. In
case of GTT it doesn't work.
So we should somehow detect that accessed index is not initialized and
perform lazy initialization of the index.
The only difference with the approach proposed by Pavel  (allow index
for empty GTT but prohibit it for GTT filled with data) is whether we
also need to populate index with data or not.
I can imagine that implicit initialization of index in read-only query
(select) can be unsafe and cause some problems. I have not encountered
such problems yet after performing many tests with GTTs, but certainly I
have not covered all possible scenarios (not sure that it is possible at
all).
But I do not understand how populating  index with data can add some
extra unsafety.

So I can not prove that building index for GTT on demand is safe, but it
is not more unsafe than initialization of index on demand which is
required in any case.

2. Actually I do not propose some completely new approach. I try to
provide behavior with is compatible with regular tables.
If you create index for regular table, then it can be used in all
sessions, right?
And all "various backend-local data structures in the relcache, the
planner, and the executor that remember information about indexes"
have to be properly updated.  It is done using invalidation mechanism.
The same mechanism is used in case of DDL operations with GTT, because
we change system catalog.

So my point here is that creation index of GTT is almost the same as
creation of index for regular tables and the same mechanism will be used
to provide correctness of this operation.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2020-01-29 15:40:34 Re: doc: vacuum full, fillfactor, and "extra space"
Previous Message Robert Haas 2020-01-29 15:27:08 Re: making the backend's json parser work in frontend code