Re: [Proposal] Global temporary tables

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, "Cai, Le" <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2020-01-27 09:38:13
Message-ID: a9ab4d2c-0778-1865-0c39-b89679d5f277@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 25.01.2020 18:15, 曾文旌(义从) wrote:
> I wonder why do we need some special check for GTT here.
>> From my point of view cleanup at startup of local storage of temp
>> tables should be performed in the same way for local and global temp
>> tables.
> After oom kill, In autovacuum, the Isolated local temp table will be
> cleaned like orphan temporary tables. The definition of local temp
> table is deleted with the storage file.
> But GTT can not do that. So we have the this implementation in my patch.
> If you have other solutions, please let me know.
>
I wonder if it is possible that autovacuum or some other Postgres
process is killed by OOM and postmaster is not noticing it can doens't
restart Postgres instance?
as far as I know, crash of any process connected to Postgres shared
memory (and autovacuum definitely has such connection) cause Postgres
restart.

> In my design
> 1 Because different sessions have different transaction information, I
> choose to store the transaction information of GTT in MyProc,not catalog.
> 2 About the XID wraparound problem, the reason is the design of the
> temp table storage(local temp table and global temp table) that makes
> it can not to do vacuum by autovacuum.
> It should be completely solve at the storage level.
>

My point of view is that vacuuming of temp tables is common problem for
local and global temp tables.
So it has to be addressed in the common way and so we should not try to
fix this problem only for GTT.

> In fact, The dba can still complete the DDL of the GTT.
> I've provided a set of functions for this case.
> If the dba needs to modify a GTT A(or drop GTT or create index on
> GTT), he needs to do:
> 1 Use the pg_gtt_attached_pids view to list the pids for the session
> that is using the GTT A.
> 2 Use pg_terminate_backend(pid)terminate they except itself.
> 3 Do alter GTT A.
>
IMHO forced terminated of client sessions is not acceptable solution.
And it is not an absolutely necessary requirement.
So from my point of view we should not add such limitations to GTT design.

>>
>> What are the reasons of using RowExclusiveLock for GTT instead of
>> AccessExclusiveLock?
>> Yes, GTT data is access only by one backend so no locking here seems
>> to be needed at all.
>> But I wonder what are the motivations/benefits of using weaker lock
>> level here?
> 1 Truncate GTT deletes only the data in the session, so no need use
> high-level lock.
> 2 I think it still needs to be block by DDL of GTT, which is why I use
> RowExclusiveLock.

Sorry, I do not understand your arguments: we do not need exclusive lock
because we drop only local (private) data
but we need some kind of lock. I agree with 1) and not 2).

>
>> There should be no conflicts in any case...
>>
>> +        /* We allow to create index on global temp table only this
>> session use it */
>> +        if (is_other_backend_use_gtt(heapRelation->rd_node))
>> +            elog(ERROR, "can not create index when have other
>> backend attached this global temp table");
>> +
>>
>> The same argument as in case of dropping GTT: I do not think that
>> prohibiting DLL operations on GTT used by more than one backend is
>> bad idea.
> The idea was to give the GTT almost all the features of a regular
> table with few code changes.
> The current version DBA can still do all DDL for GTT, I've already
> described.

I absolutely agree with you that GTT should be given the same features
as regular tables.
The irony is that this most natural and convenient behavior is most easy
to implement without putting some extra restrictions.
Just let indexes for GTT be constructed on demand. It it can be done
using the same function used for regular index creation.

>
>>
>> +    /* global temp table not support foreign key constraint yet */
>> +    if (RELATION_IS_GLOBAL_TEMP(pkrel))
>> +        ereport(ERROR,
>> +                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>> +                 errmsg("referenced relation \"%s\" is not a global
>> temp table",
>> + RelationGetRelationName(pkrel))));
>> +
>>
>> Why do we need to prohibit foreign key constraint on GTT?
> It may be possible to support FK on GTT in later versions. Before
> that, I need to check some code.

Ok,  may be approach to prohibit everything except minimally required
functionality  is safe and reliable.
But frankly speaking I prefer different approach: if I do not see any
contradictions of new feature with existed operations
and it is passing tests, then we should  not prohibit this operations
for new feature.

> I have already described my point in previous emails.
>
> 1. The core problem is that the data contains transaction information
> (xid), which needs to be vacuum(freeze) regularly to avoid running out
> of xid.
> The autovacuum supports vacuum regular table but local temp does not.
> autovacuum also does not support GTT.
>
> 2. However, the difference between the local temp table and the global
> temp table(GTT) is that
> a) For local temp table: one table hava one piece of data. the
> frozenxid of one local temp table is store in the catalog(pg_class).
> b) For global temp table: each session has a separate copy of data,
> one GTT may contain maxbackend frozenxid.
> and I don't think it's a good idea to keep frozenxid of GTT in the
> catalog(pg_class).
> It becomes a question: how to handle GTT transaction information?
>
> I agree that problem 1 should be completely solved by a some feature,
> such as local transactions. It is definitely not included in the GTT
> patch.
> But, I think we need to ensure the durability of GTT data. For
> example, data in GTT cannot be lost due to the clog being cleaned up.
> It belongs to problem 2.
>
> For problem 2
> If we ignore the frozenxid of GTT, when vacuum truncates the clog that
> GTT need, the GTT data in some sessions is completely lost.
> Perhaps we could consider let aotuvacuum terminate those sessions that
> contain "too old" data,
> But It's not very friendly, so I didn't choose to implement it in the
> first version.
> Maybe you have a better idea.

Sorry, I do not have better idea.
I prefer not to address this problem in first version of the patch at all.
fozen_xid of temp table is never changed unless user explicitly invoke
vacuum on it.
I do not think that anybody is doing it (because it accentually contains
temporary data which is not expected to live long time.
Certainly it is possible to imagine situation when session use GTT to
store some local data which is valid during all session life time (which
can be large enough).
But I am not sure that it is popular scenario.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuro Yamada 2020-01-27 10:16:25 Re: progress report for ANALYZE
Previous Message Peter Eisentraut 2020-01-27 09:19:51 Re: Allow an alias to be attached directly to a JOIN ... USING