Re: [Proposal] Global temporary tables

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, "Cai, Le" <le(dot)cai(at)alibaba-inc(dot)com>, 张广舟(明虚) <guangzhou(dot)zgz(at)alibaba-inc(dot)com>, 赵殿奎 <diankui(dot)zdk(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2019-10-28 13:37:50
Message-ID: 8dc812ea-af3f-7a9d-ae9a-d89c25ca89ca@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28.10.2019 15:07, Robert Haas wrote:
> On Fri, Oct 25, 2019 at 11:14 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>> Access to GTT at replica:
>>>> 1. Access is prohibited (as for original temp tables). No changes at all.
>>>> 2. Tuples of temp tables are marked with forzen XID. Minimal changes, rollbacks are not possible.
>>>> 3. Providing special XIDs for GTT at replica. No changes in CLOG are required, but special MVCC visibility rules are used for GTT. Current limitation: number of transactions accessing GTT at replica is limited by 2^32
>>>> and bitmap of correspondent size has to be maintained (tuples of GTT are not proceeded by vacuum and not frozen, so XID horizon never moved).
>>> I again vote for #1. A GTT is defined to allow data to be visible only
>>> within one session -- so what does it even mean for the data to be
>>> accessible on a replica?
>> why not? there are lot of sessions on replica servers. One usage of temp tables is fixing estimation errors. You can create temp table with partial query result, run ANALYZE and evaluate other steps. Now this case is not possible on replica servers.
>>
>> One motivation for GTT is decreasing port costs from Oracle. But other motivations, like do more complex calculations on replica are valid and valuable.
> Hmm, I think I was slightly confused when I wrote my previous
> response. I now see that what was under discussion was not making data
> from the master visible on the standbys, which really wouldn't make
> any sense, but rather allowing standby sessions to also use the GTT,
> each with its own local copy of the data. I don't think that's a bad
> feature, but look how invasive the required changes are. Not allowing
> rollbacks seems dead on arrival; an abort would be able to leave the
> table and index mutually inconsistent. A separate XID space would be
> a real solution, perhaps, but it would be *extremely* complicated and
> invasive to implement.

Sorry, but both statements are not true.
As I mentioned before, I have implemented both solutions.

I am not sure how vital is lack of aborts for transactions working with
GTT at replica.
Some people said that there is no sense in aborts of read-only
transactions at replica (despite to the fact that them are saving
intermediate results in GTT).
Some people said something similar with your's "dead on arrival".
But inconsistency is not possible: if such transaction is really
aborted, then backend is terminated and nobody can see this inconsistency.

Concerning second alternative: you can check yourself that it is not
*extremely* complicated and invasive.
I extracted changes which are related with handling transactions at
replica and attached them to this mail.
It is just 500 lines (including diff contexts). Certainly there are some
limitation of this implementation: number of  transactions working with
GTT at replica is limited by 2^32
and since GTT tuples are not frozen, analog of GTT CLOG kept in memory
is never truncated.

>
> One thing that I've learned over and over again as a developer is that
> you get a lot more done if you tackle one problem at a time. GTTs are
> a sufficiently-large problem all by themselves; a major reworking of
> the way XIDs work might be a good project to undertake at some point,
> but it doesn't make any sense to incorporate that into the GTT
> project, which is otherwise about a mostly-separate set of issues.
> Let's not try to solve more problems at once than strictly necessary.
>
I agree with it and think that implementation of GTT with private
buffers and no replica access is good starting point.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
replica-gtt-xact.diff text/x-patch 18.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-10-28 13:45:54 Re: Remove one use of IDENT_USERNAME_MAX
Previous Message Fabien COELHO 2019-10-28 13:36:15 Re: pgbench - extend initialization phase control