Re: [Proposal] Global temporary tables

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, "Cai, Le" <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2020-01-30 09:33:22
Message-ID: 3bfa5b5d-6b3b-743d-573c-b85f933ba540@postgrespro.ru
Lists: pgsql-hackers

On 29.01.2020 21:16, Robert Haas wrote:
> On Wed, Jan 29, 2020 at 10:30 AM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>
> I think that the idea of calling ambuild() on the fly is not going to
> work, because, again, I don't think that calling that from random
> places in the code is safe.

It is not a random place in the code.
Actually it is just one place: _bt_getbuf.
Why would it be unsafe if it affects only the backend's private data?
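
To make this concrete, what I have in mind is roughly the following
(a sketch only, not actual patch code; the GTT checks and helper names
here are illustrative assumptions):

    /* Inside _bt_getbuf(), before touching the metapage: if this
     * backend has not yet initialized its private copy of the GTT
     * index, build it with the regular access-method build path. */
    if (RelationIsSessionTemp(rel) &&          /* assumed GTT check */
        !GTT_IndexInitializedForBackend(rel))  /* assumed helper */
    {
        Relation    heap = table_open(rel->rd_index->indrelid, NoLock);
        IndexInfo  *indexInfo = BuildIndexInfo(rel);

        /* Same code path as CREATE INDEX, but it writes only to this
         * backend's private storage. */
        index_build(heap, rel, indexInfo, false, false);
        table_close(heap, NoLock);
    }
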

> What I expect we're going to need to do
> here is model this on the approach used for unlogged tables. For an
> unlogged table, each table and index has an init fork which contains
> the correct initial contents for that relation - which is nothing at
> all for a heap table, and a couple of boilerplate pages for an index.
> In the case of an unlogged table, the init forks get copied over the
> main forks after crash recovery, and then we have a brand-new, empty
> relation with brand-new empty indexes which everyone can use. In the
> case of global temporary tables, I think that we should do the same
> kind of copying, but at the time when the session first tries to
> access the table. There is some fuzziness in my mind about what
> exactly constitutes accessing the table - it probably shouldn't be
> when the relcache entry is built, because that seems too early, but
> I'm not sure what is exactly right. In any event, it's easier to find
> a context where copying some files on disk (that are certain not to be
> changing) is safe than it is to find a context where index builds are
> safe.
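
As I understand it, the approach you describe would look roughly like
this (again only a sketch; copy_relation_fork() is hypothetical and the
surrounding names are assumptions):

    /* On first access to a GTT in a session: copy the boilerplate
     * init-fork pages (created once by ambuildempty() and heap
     * initialization) into this backend's private main fork, much as
     * unlogged relations are reset after crash recovery. */
    static void
    gtt_copy_init_fork(Relation rel)
    {
        SMgrRelation srel = smgropen(rel->rd_node, rel->rd_backend);

        /* Hypothetical helper; unlogged tables do similar copying in
         * ResetUnloggedRelations() during recovery. */
        copy_relation_fork(srel, INIT_FORKNUM, MAIN_FORKNUM);
    }
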

I do not think that the approach used for unlogged tables is good for GTT.
Unlogged tables have to be reinitialized only after a server restart,
while a GTT should be initialized by each backend on demand.
It seems to me that the init fork is used for unlogged tables because
the recovery process does not have enough context to reinitialize the
table and indexes; it is much safer and simpler for the recovery process
to just copy files.
But the GTT case is different: the heap and indexes can easily be
initialized by the backend using existing functions.

The approach of just calling btbuild is much simpler than what you
propose with creating extra forks and copying data from them.
You say that it is not safe, but you have not explained why it is unsafe.
Yes, I agree that it is my responsibility to prove that it is safe,
and as I already wrote, I cannot provide such a proof now. I would be
pleased if you or anybody else could help show that this approach is
safe, or demonstrate problems with it.

Copying data from an init fork does not help to provide the same
behavior for GTT indexes as for regular indexes.
And from my point of view, compatibility with regular tables is the
most important point in the GTT design.
If for some reason that is not possible, then we should think about
other solutions.
But right now I do not know of such problems. We have two working
prototypes of GTT. Certainly that does not mean the current
implementations are free of problems.
But I would really like to receive more constructive criticism than
"this approach is wrong because it is unsafe".

>> planner, and the executor that remember information about indexes"
>> have to be properly updated. It is done using invalidation mechanism.
>> The same mechanism is used in case of DDL operations with GTT, because
>> we change system catalog.
> I mean, that's not really a valid argument. Invalidations can only
> take effect at certain points in the code, and the whole argument here
> is about which places in the code are safe for which operations, so
> the fact that some things (like accepting invalidations) are safe at
> some points in the code (like the places where we accept them) does
> not prove that other things (like calling ambuild) are safe at other
> points in the code (like wherever you are proposing to call it). In
> particular, if you've got a relation open, there's currently no way
> for another index to show up while you've still got that relation
> open.
The same is true for GTT: right now, building a GTT index also locks
the relation.
That may not be strictly necessary, because the relation's data is
local and cannot be changed by another backend, but I have not added
any special handling for GTT here, mostly because I want to follow the
same path as with regular indexes and avoid the possible problems
which, as you mention, can happen if we somehow change the locking
policy.

> That means that the planner and executor (which keep the
> relevant relations open) don't ever have to worry about updating their
> data structures, because it can never be necessary. It also means that
> any code anywhere in the system that keeps a lock on a relation can
> count on the list of indexes for that relation staying the same until
> it releases the lock. In fact, it can hold on to pointers to data
> allocated by the relcache and count on those pointers being stable for
> as long as it holds the lock, and RelationClearRelation contain
> specific code that aims to make sure that certain objects don't get
> deallocated and reallocated at a different address precisely for that
> reason. That code, however, only works as long as nothing actually
> changes. The upshot is that it's entirely possible for changing
> catalog entries in one backend with an inadequate lock level -- or at
> unexpected point in the code -- to cause a core dump either in that
> backend or in some other backend. This stuff is really subtle, and
> super-easy to screw up.
>
> I am speaking a bit generally here, because I haven't really studied
> *exactly* what might go wrong in the relcache, or elsewhere, as a
> result of creating an index on the fly. However, I'm very sure that a
> general appeal to invalidation messages is not sufficient to make
> something like what you want to do safe. Invalidation messages are a
> complex, ancient, under-documented, fragile system for solving a very
> specific problem that is not the one you are hoping they'll solve
> here. They could justifiably be called magic, but it's not the sort of
> magic where the fairy godmother waves her wand and solves all of your
> problems; it's more like the kind where you go explore the forbidden
> forest and are never seen or heard from again.

Actually the index is not created on the fly.
The index is created in the usual way, by executing a CREATE INDEX
command, so all components of Postgres (planner, executor, ...) treat
GTT indexes in the same way as regular indexes.
The locking and invalidation policies are exactly the same for them.
The only difference is that the content of a GTT index is constructed
on demand, using the backend's private data.
Is it safe or not? We are just reading data from local buffers/files
and writing it back there.
Maybe I missed something, but I do not see anything unsafe here.
There are issues with updating statistics, but those can be solved.

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
