Re: [Proposal] Global temporary tables

From: 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Erik Rijkers" <er(at)xs4all(dot)nl>, "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Julien Rouhaud" <rjuju123(at)gmail(dot)com>, "Dean Rasheed" <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, 蔡松露(子嘉) <zijia(at)taobao(dot)com>, Cai, Le <le(dot)cai(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>
Subject: Re: [Proposal] Global temporary tables
Date: 2020-01-25 15:15:48
Message-ID: F37C4EF8-716C-444C-80CD-8048EDDDC887@alibaba-inc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for review patch.

> 2020年1月24日 下午4:20,Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru> 写道:
>
>
>
> On 23.01.2020 19:28, 曾文旌(义从) wrote:
>>
>> I'm trying to improve this part of the implementation in global_temporary_table_v7-pg13.patch
>> Please check my patch and give me feedback.
>>
>>
>> Thanks
>>
>> Wenjing
>>
>>
>
> Below is my short review of the patch:
>
> + /*
> + * For global temp table only
> + * use AccessExclusiveLock for ensure safety
> + */
> + {
> + {
> + "on_commit_delete_rows",
> + "global temp table on commit options",
> + RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
> + ShareUpdateExclusiveLock
> + },
> + true
> + },
>
>
> The comment seems to be confusing: it says about AccessExclusiveLock but actually uses ShareUpdateExclusiveLock.
There is a problem with the comment description, I will fix it.

>
> - Assert(TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
> - Assert(MultiXactIdIsValid(onerel->rd_rel->relminmxid));
> + Assert((RELATION_IS_GLOBAL_TEMP(onerel) && onerel->rd_rel->relfrozenxid == InvalidTransactionId) ||
> + (!RELATION_IS_GLOBAL_TEMP(onerel) && TransactionIdIsNormal(onerel->rd_rel->relfrozenxid)));
> + Assert((RELATION_IS_GLOBAL_TEMP(onerel) && onerel->rd_rel->relminmxid == InvalidMultiXactId) ||
> + (!RELATION_IS_GLOBAL_TEMP(onerel) && MultiXactIdIsValid(onerel->rd_rel->relminmxid)));
>
> It is actually equivalent to:
>
> Assert(RELATION_IS_GLOBAL_TEMP(onerel) ^ TransactionIdIsNormal(onerel->rd_rel->relfrozenxid);
> Assert(RELATION_IS_GLOBAL_TEMP(onerel) ^ MultiXactIdIsValid(onerel->rd_rel->relminmxid));
Yes, Thank you for your points out, It's simpler.

>
> + /* clean temp relation files */
> + if (max_active_gtt > 0)
> + RemovePgTempFiles();
> +
> /*
>
> I wonder why do we need some special check for GTT here.
> From my point of view cleanup at startup of local storage of temp tables should be performed in the same way for local and global temp tables.
After oom kill, In autovacuum, the Isolated local temp table will be cleaned like orphan temporary tables. The definition of local temp table is deleted with the storage file.
But GTT can not do that. So we have the this implementation in my patch.
If you have other solutions, please let me know.

>
>
> - new_rel_reltup->relfrozenxid = relfrozenxid;
> - new_rel_reltup->relminmxid = relminmxid;
> + /* global temp table not remember transaction info in catalog */
> + if (relpersistence == RELPERSISTENCE_GLOBAL_TEMP)
> + {
> + new_rel_reltup->relfrozenxid = InvalidTransactionId;
> + new_rel_reltup->relminmxid = InvalidMultiXactId;
> + }
> + else
> + {
> + new_rel_reltup->relfrozenxid = relfrozenxid;
> + new_rel_reltup->relminmxid = relminmxid;
> + }
> +
>
>
> Why do we need to do it for GTT?
> Did you check that there will be no problems with GTT in case of XID wraparound?
> Right now if you create temp table and keep session open, then it will block XID wraparound.
In my design
1 Because different sessions have different transaction information, I choose to store the transaction information of GTT in MyProc,not catalog.
2 About the XID wraparound problem, the reason is the design of the temp table storage(local temp table and global temp table) that makes it can not to do vacuum by autovacuum.
It should be completely solve at the storage level.

>
> + /* We allow to drop global temp table only this session use it */
> + if (RELATION_IS_GLOBAL_TEMP(rel))
> + {
> + if (is_other_backend_use_gtt(rel->rd_node))
> + elog(ERROR, "can not drop relation when other backend attached this global temp table");
> + }
> +
>
> Here we once again introduce incompatibility with normal (permanent) tables.
> Assume that DBA or programmer need to change format of GTT. But there are some active sessions which have used this GTT sometime in the past.
> We will not be able to drop this GTT until all this sessions are terminated.
> I do not think that it is acceptable behaviour.
In fact, The dba can still complete the DDL of the GTT.
I've provided a set of functions for this case.
If the dba needs to modify a GTT A(or drop GTT or create index on GTT), he needs to do:
1 Use the pg_gtt_attached_pids view to list the pids for the session that is using the GTT A.
2 Use pg_terminate_backend(pid)terminate they except itself.
3 Do alter GTT A.

>
> + LOCKMODE lockmode = AccessExclusiveLock;
> +
> + /* truncate global temp table only need RowExclusiveLock */
> + if (get_rel_persistence(rid) == RELPERSISTENCE_GLOBAL_TEMP)
> + lockmode = RowExclusiveLock;
>
>
> What are the reasons of using RowExclusiveLock for GTT instead of AccessExclusiveLock?
> Yes, GTT data is access only by one backend so no locking here seems to be needed at all.
> But I wonder what are the motivations/benefits of using weaker lock level here?
1 Truncate GTT deletes only the data in the session, so no need use high-level lock.
2 I think it still needs to be block by DDL of GTT, which is why I use RowExclusiveLock.

> There should be no conflicts in any case...
>
> + /* We allow to create index on global temp table only this session use it */
> + if (is_other_backend_use_gtt(heapRelation->rd_node))
> + elog(ERROR, "can not create index when have other backend attached this global temp table");
> +
>
> The same argument as in case of dropping GTT: I do not think that prohibiting DLL operations on GTT used by more than one backend is bad idea.
The idea was to give the GTT almost all the features of a regular table with few code changes.
The current version DBA can still do all DDL for GTT, I've already described.

>
> + /* global temp table not support foreign key constraint yet */
> + if (RELATION_IS_GLOBAL_TEMP(pkrel))
> + ereport(ERROR,
> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> + errmsg("referenced relation \"%s\" is not a global temp table",
> + RelationGetRelationName(pkrel))));
> +
>
> Why do we need to prohibit foreign key constraint on GTT?
It may be possible to support FK on GTT in later versions. Before that, I need to check some code.

>
> + /*
> + * Global temp table get frozenxid from MyProc
> + * to avoid the vacuum truncate clog that gtt need.
> + */
> + if (max_active_gtt > 0)
> + {
> + TransactionId oldest_gtt_frozenxid =
> + list_all_session_gtt_frozenxids(0, NULL, NULL, NULL);
> +
> + if (TransactionIdIsNormal(oldest_gtt_frozenxid) &&
> + TransactionIdPrecedes(oldest_gtt_frozenxid, newFrozenXid))
> + {
> + ereport(WARNING,
> + (errmsg("global temp table oldest FrozenXid is far in the past"),
> + errhint("please truncate them or kill those sessions that use them.")));
> + newFrozenXid = oldest_gtt_frozenxid;
> + }
> + }
> +
>
> As far as I understand, content of GTT will never be processes by autovacuum.
> So who will update frozenxid of GTT?
> I see that up_gtt_relstats is invoked when:
> - index is created on GTT
> - GTT is truncated
> - GTT is vacuumed
> So unless GTT is explicitly vacuumed by user, its GTT is and them will not be taken in account
> when computing new frozen xid value. Autovacumm will produce this warnings (which will ton be visible by end user and only appended to the log).
> And at some moment of time wrap around happen and if there still some old active GTT, we will get incorrect results.
I have already described my point in previous emails.

1. The core problem is that the data contains transaction information (xid), which needs to be vacuum(freeze) regularly to avoid running out of xid.
The autovacuum supports vacuum regular table but local temp does not. autovacuum also does not support GTT.

2. However, the difference between the local temp table and the global temp table(GTT) is that
a) For local temp table: one table hava one piece of data. the frozenxid of one local temp table is store in the catalog(pg_class).
b) For global temp table: each session has a separate copy of data, one GTT may contain maxbackend frozenxid.
and I don't think it's a good idea to keep frozenxid of GTT in the catalog(pg_class).
It becomes a question: how to handle GTT transaction information?

I agree that problem 1 should be completely solved by a some feature, such as local transactions. It is definitely not included in the GTT patch.
But, I think we need to ensure the durability of GTT data. For example, data in GTT cannot be lost due to the clog being cleaned up. It belongs to problem 2.

For problem 2
If we ignore the frozenxid of GTT, when vacuum truncates the clog that GTT need, the GTT data in some sessions is completely lost.
Perhaps we could consider let aotuvacuum terminate those sessions that contain "too old" data,
But It's not very friendly, so I didn't choose to implement it in the first version.
Maybe you have a better idea.

Wenjing

>
>
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com <http://www.postgrespro.com/>
> The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sehrope Sarkuni 2020-01-25 16:35:23 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Previous Message Magnus Hagander 2020-01-25 14:43:41 Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)