Re: [Proposal] Global temporary tables

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, "Cai, Le" <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2020-02-04 21:38:09
Message-ID: CA+TgmoY=aP4p_AR=z4O_kfMFHoSEg19Ed+f=U+6_0W1HKfJ2-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 3, 2020 at 3:08 AM Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> Thank you for explanation.
> You convinced me that building indexes from _bt_getbuf is not good idea.
> What do you think about idea to check and build indexes for GTT prior to
> query execution?
>
> In this case we do not need to patch code of all indexes - it can be
> done just in one place.
> We can use build function of access method to initialize index and
> populate it with data.
>
> So right now when building query execution plan, optimizer checks if
> index is valid.
> If index belongs to GTT, it an check that first page of the index is
> initialized and if not - call build method for this index.
>
> If building index during building query plan is not desirable, we can
> just construct list of indexes which should be checked and
> perform check itself and building indexes somewhere after building plan
> but for execution of the query.
>
> Do you seem some problems with such approach?

My guess it that the right time to do this work is just after we
acquire locks, at the end of parse analysis. I think trying to do it
during execution is too late, since the planner looks at indexes, and
trying to do it in the planner instead of before we start planning
seems more likely to cause bugs and has no real advantages. It's just
better to do complicated things (like creating indexes) separately
rather than in the middle of some other complicated thing (like
planning). I could tie my shoelaces the first time they get tangled up
with my break pedal but it's better to do it before I get in the car.

And I'm still inclined to do it by flat-copying files rather than
calling ambuild. It will be slightly faster, but also importantly, it
will guarantee that (1) every backend gets exactly the same initial
state and (2) it has fewer ways to fail because it doesn't involve
calling any user-defined code. Those seem like fairly compelling
advantages, and I don't see what the disadvantages are. I think
calling ambuild() at the point in time proposed in the preceding
paragraph would be fairly safe and would probably work OK most of the
time, but I can't think of any reason it would be better.

Incidentally, what I'd be inclined to do is - if the session is
running a query that does only read-only operations, let it continue
to point to the "master" copy of the GTT and its indexes, which is
stored in the relfilenodes indicated for those relations in pg_class.
If it's going to acquire a lock heavier than AccessShareLock, then
give it is own copies of the table and indexes, stored in a temporary
relfilenode (tXXX_YYY) and redirect all future access to that GTT by
this backend to there. Maybe there's some reason this won't work, but
it seems nice to avoid saying that we've "attached" to the GTT if all
we did is read the empty table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-02-04 21:52:55 Re: Complete data erasure
Previous Message Tomas Vondra 2020-02-04 21:29:00 Re: Complete data erasure