Re: [Proposal] Global temporary tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, "Cai, Le" <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2020-01-29 14:47:52
Message-ID: CA+TgmobN=o7d4zTJywJzsvkCeL4-_3ttWK_dhzpt-P3yCJ=77w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 29, 2020 at 3:13 AM Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> But I heard only two arguments:
>
> 1. Concurrent building of indexes by all backends may consume much memory (n_backends * maintenance_work_mem) and consume a lot of disk/CPU resources.
> 2. GTT in one session can contains large amount of data and we need index for it, but small amount of data in another session and we do not need index for it.

You seem to be ignoring the fact that two committers told you this
probably wasn't safe.

Perhaps your view is that those people made no argument, and therefore
you don't have to respond to it. But the onus is not on somebody else
to tell you why a completely novel idea is not safe. The onus is on
you to analyze it in detail and prove that it is safe. What you need
to show is that there is no code anywhere in the system which will be
confused by an index springing into existence at whatever time you're
creating it.

One problem is that there are various backend-local data structures in
the relcache, the planner, and the executor that remember information
about indexes, and that may not respond well to having more indexes
show up unexpectedly. On the one hand, they might crash; on the other
hand, they might ignore the new index when they shouldn't. Another
problem is that the code which creates indexes might fail or misbehave
when run in an environment different from the one in which it
currently runs. I haven't really studied your code, so I don't know
exactly what it does, but for example it would be really bad to try to
build an index while holding a buffer lock, both because it might
cause (low-probability) undetected deadlocks and also because it might
block another process that wants that buffer lock in a
non-interruptible wait state for a long time.

Now, maybe you can make an argument that you only create indexes at
points in the query that are "safe." But I am skeptical, because of
this example:

rhaas=# create table foo (a int primary key, b text, c text, d text);
CREATE TABLE
rhaas=# create function blump() returns trigger as $$begin create
index on foo (b); return new; end$$ language plpgsql;
CREATE FUNCTION
rhaas=# create trigger thud before insert on foo execute function blump();
CREATE TRIGGER
rhaas=# insert into foo (a) select generate_series(1,10);
ERROR: cannot CREATE INDEX "foo" because it is being used by active
queries in this session
CONTEXT: SQL statement "create index on foo (b)"
PL/pgSQL function blump() line 1 at SQL statement

That prohibition is there for some reason. Someone did not just decide
to arbitrarily prohibit it. A CREATE INDEX command run in that context
won't run afoul of many of the things that might be problems in other
places -- e.g. there won't be a buffer lock held. Yet, despite the
fact that a trigger context is safe for executing a wide variety of
user-defined code, this particular operation is not allowed here. That
is the sort of thing that should worry you.

At any rate, even if this somehow were or could be made safe,
on-the-fly index creation is a feature that cannot and should not be
combined with a patch to implement global temporary tables. Surely, it
will require a lot of study and work to get the details right. And so
will GTT. As I said in the other email I wrote, this feature is hard
enough without adding this kind of thing to it. There's a reason why I
never got around to implementing this ten years ago when I did
unlogged tables; I was intending that to be a precursor to the GTT
work. I found that it was too hard and I gave up. I'm glad to see
people trying again, but the idea that we can afford to add in extra
features, or frankly that either of the dueling patches on this thread
are close to committable, is just plain wrong.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-01-29 14:56:39 Re: Enabling B-Tree deduplication by default
Previous Message amul sul 2020-01-29 14:47:49 Re: closesocket behavior in different platforms