Re: New docs chapter on Transaction Management and related changes

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New docs chapter on Transaction Management and related changes
Date: 2022-11-18 19:11:50
Message-ID: Y3fY9nDysfSayXLJ@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 7, 2022 at 11:04:46PM +0100, Laurenz Albe wrote:
> On Sat, 2022-11-05 at 10:08 +0000, Simon Riggs wrote:
> > Agreed; new compilation patch attached, including mine and then
> > Robert's suggested rewordings.
>
> Thanks. There is clearly a lot of usefule information in this.

Sorry again for the long delay in replying to this.

> Some comments:
>
> > --- a/doc/src/sgml/func.sgml
> > +++ b/doc/src/sgml/func.sgml
> > @@ -24673,7 +24673,10 @@ SELECT collation for ('foo' COLLATE "de_DE");
> > <para>
> > Returns the current transaction's ID. It will assign a new one if the
> > current transaction does not have one already (because it has not
> > - performed any database updates).
> > + performed any database updates); see <xref
> > + linkend="transaction-id"/> for details. If executed in a
> > + subtransaction this will return the top-level xid; see <xref
> > + linkend="subxacts"/> for details.
> > </para></entry>
> > </row>
>
> I would use a comma after "subtransaction", and I think it would be better to write
> "transaction ID" instead of "xid".

Agreed.

> > @@ -24690,6 +24693,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
> > ID is assigned yet. (It's best to use this variant if the transaction
> > might otherwise be read-only, to avoid unnecessary consumption of an
> > XID.)
> > + If executed in a subtransaction this will return the top-level xid.
> > </para></entry>
> > </row>
>
> Same as above.

Agreed.

> > @@ -24733,6 +24737,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
> > <para>
> > Returns a current <firstterm>snapshot</firstterm>, a data structure
> > showing which transaction IDs are now in-progress.
> > + Only top-level xids are included in the snapshot; subxids are not
> > + shown; see <xref linkend="subxacts"/> for details.
> > </para></entry>
> > </row>
>
> Again, I would avoid "xid" and "subxid", or at least use "transaction ID (xid)"
> and similar.

Done.

> > --- a/doc/src/sgml/ref/release_savepoint.sgml
> > +++ b/doc/src/sgml/ref/release_savepoint.sgml
> > @@ -34,23 +34,16 @@ RELEASE [ SAVEPOINT ] <replaceable>savepoint_name</replaceable>
> > <title>Description</title>
> >
> > <para>
> > - <command>RELEASE SAVEPOINT</command> destroys a savepoint previously defined
> > - in the current transaction.
> > + <command>RELEASE SAVEPOINT</command> will subcommit the subtransaction
> > + established by the named savepoint, if one exists. This will release
> > + any resources held by the subtransaction. If there were any
> > + subtransactions of the named savepoint, these will also be subcommitted.
> > </para>
> >
> > <para>
>
> "Subtransactions of the named savepoint" is somewhat confusing; how about
> "subtransactions of the subtransaction established by the named savepoint"?
>
> If that is too long and explicit, perhaps "subtransactions of that subtransaction".

This paragraph has been rewritten to:

<command>RELEASE SAVEPOINT</command> releases the named savepoint and
all active savepoints that were created after the named savepoint,
and frees their resources. All changes made since the creation of the
savepoint, excluding rolled back savepoints changes, are merged into
the transaction or savepoint that was active when the named savepoint
was created. Changes made after <command>RELEASE SAVEPOINT</command>
will also be part of this active transaction or savepoint.

> > @@ -78,7 +71,7 @@ RELEASE [ SAVEPOINT ] <replaceable>savepoint_name</replaceable>
> >
> > <para>
> > It is not possible to release a savepoint when the transaction is in
> > - an aborted state.
> > + an aborted state, to do that use <xref linkend="sql-rollback-to"/>.
> > </para>
> >
> > <para>
>
> I think the following is more English:
> "It is not possible ... state; to do that, use <xref .../>."

Changed to:

It is not possible to release a savepoint when the transaction is in
an aborted state; to do that, use <xref linkend="sql-rollback-to"/>.

> > --- a/doc/src/sgml/ref/rollback.sgml
> > +++ b/doc/src/sgml/ref/rollback.sgml
> > @@ -56,11 +56,14 @@ ROLLBACK [ WORK | TRANSACTION ] [ AND [ NO ] CHAIN ]
> > <term><literal>AND CHAIN</literal></term>
> > <listitem>
> > <para>
> > - If <literal>AND CHAIN</literal> is specified, a new transaction is
> > + If <literal>AND CHAIN</literal> is specified, a new unaborted transaction is
> > immediately started with the same transaction characteristics (see <xref
> > linkend="sql-set-transaction"/>) as the just finished one. Otherwise,
> > no new transaction is started.
>
> I don't think that is an improvement. "Unaborted" is an un-word. A new transaction
> is always "unaborted", isn't it?

Agreed.

> > --- a/doc/src/sgml/wal.sgml
> > +++ b/doc/src/sgml/wal.sgml
> > @@ -909,4 +910,36 @@
> > seem to be a problem in practice.
> > </para>
> > </sect1>
> > +
> > + <sect1 id="two-phase">
> > +
> > + <title>Two-Phase Transactions</title>
> > +
> > + <para>
> > + <productname>PostgreSQL</productname> supports a two-phase commit (2PC)
> [...]
> > + <filename>pg_twophase</filename> directory. Currently-prepared
> > + transactions can be inspected using <link
> > + linkend="view-pg-prepared-xacts"><structname>pg_prepared_xacts</structname></link>.
> > + </para>
> > + </sect1>
> > +
> > </chapter>
>
> I don't like "currently-prepared". How about:
> "Transaction that are currently prepared can be inspected..."

Yes, now:

Transactions that are currently prepared can be inspected using <link
linkend="view-pg-prepared-xacts"><structname>pg_prepared_xacts</structname></link>.

> This is clearly interesting information, but I don't think the WAL chapter is the right
> place for this. "pg_twophase" is already mentioned in "storage.sgml", and details about
> when exactly a prepared transaction is persisted may exceed the details level needed by
> the end user.
>
> I'd look for that information in the reference page for PREPARE TRANSACTION; perhaps
> that would be a better place. Or, even better, the new "xact.sgml" chapter.

Agreed, moved to xact.sgml.

> > --- /dev/null
> > +++ b/doc/src/sgml/xact.sgml
>
> + <title>Transaction Management</title>
>
> + The word transaction is often abbreviated as "xact".
>
> Should use <quote> here.

Done.

> > + <title>Transactions and Identifiers</title>
>
> > + <para>
> > + Once a transaction writes to the database, it is assigned a
> > + non-virtual <literal>TransactionId</literal> (or <type>xid</type>),
> > + e.g., <literal>278394</literal>. Xids are assigned sequentially
> > + using a global counter used by all databases within the
> > + <productname>PostgreSQL</productname> cluster. This property is used by
> > + the transaction system to order transactions by their first database
> > + write, i.e., lower-numbered xids started writing before higher-numbered
> > + xids. Of course, transactions might start in a different order.
> > + </para>
>
> "This property"? How about:
> "Because transaction IDs are assigned sequentially, the transaction system can
> use them to order transactions by their first database write"
>
> I would want some additional information here: why does the transaction system have
> to order transactions by their first database write?
>
> "Of course, transactions might start in a different order."
>
> Now that confuses me. Are you saying that BEGIN could be in a different order
> than the first database write? Perhaps like this:
>
> "Note that the order in which transactions perform their first database write
> might be different from the order in which the transactions started."

I rewrote the paragraph to be:

Non-virtual <literal>TransactionId</literal> (or <type>xid</type>),
e.g., <literal>278394</literal>, are assigned sequentially to
transactions from a global counter used by all databases within
the <productname>PostgreSQL</productname> cluster. This assignment
happens when a transaction first writes to the database. This means
lower-numbered xids started writing before higher-numbered xids.
Note that the order in which transactions perform their first database
write might be different from the order in which the transactions
started, particularly if the transaction started with statements that
only performed database reads.

> > + The internal transaction ID type <type>xid</type> is 32-bits wide
>
> There should be no hyphen in "32 bits wide", just as in "3 years old".

Done.

> > + A 32-bit epoch is incremented during each
> > + wrap around.
>
> We usually call this "wraparound" without a space.

Fixed.

> > + There is also a 64-bit type <type>xid8</type> which
> > + includes this epoch and therefore does not wrap around during the
> > + life of an installation and can be converted to xid by casting.
>
> Running "and"s. Better:
>
> "There is also ... and does not wrap ... life of an installation.
> <type>xid8</type> can be converted to <type>xid</type> by casting."

I went with:

There is also a 64-bit type <type>xid8</type> which
includes this epoch and therefore does not wrap around during the
life of an installation; it can be converted to xid by casting.

> > + Xids are used as the
> > + basis for <productname>PostgreSQL</productname>'s <link
> > + linkend="mvcc">MVCC</link> concurrency mechanism, <link
> > + linkend="hot-standby">Hot Standby</link>, and Read Replica servers.
>
> What is the difference between a hot standby and a read replica? I think
> one of these terms is sufficient.

Agreed, I went with:

Xids are used as the
basis for <productname>PostgreSQL</productname>'s <link
linkend="mvcc">MVCC</link> concurrency mechanism and streaming
replication.

> > + In addition to <literal>vxid</literal> and <type>xid</type>,
> > + when a transaction is prepared for two-phase commit it
> > + is also identified by a Global Transaction Identifier
> > + (<acronym>GID</acronym>).
>
> Better:
>
> "In addition to <literal>vxid</literal> and <type>xid</type>,
> prepared transactions also have a Global Transaction Identifier
> (<acronym>GID</acronym>) that is assigned when the transaction is
> prepared for two-phase commit."

I went with:

In addition to <literal>vxid</literal> and <type>xid</type>,
prepared transactions are also assigned Global Transaction
Identifiers (<acronym>GID</acronym>).

> > + <sect1 id="xact-locking">
> > +
> > + <title>Transactions and Locking</title>
> > +
> > + <para>
> > + Currently-executing transactions are shown in <link
> > + linkend="view-pg-locks"><structname>pg_locks</structname></link>
> > + in columns <structfield>virtualxid</structfield> and
> > + <structfield>transactionid</structfield>.
>
> Better:
>
> "The transaction IDs of currently executing transactions are shown in <link
> linkend="view-pg-locks"><structname>pg_locks</structname></link>
> in the columns <structfield>virtualxid</structfield> and
> <structfield>transactionid</structfield>."

Done.

> > + Lock waits on table-level locks are shown waiting for
> > + <structfield>virtualxid</structfield>, while lock waits on row-level
> > + locks are shown waiting for <structfield>transactionid</structfield>.
>
> That's not true. Transactions waiting for table-level locks are shown
> waiting for a "relation" lock in both "pg_stat_activity" and "pg_locks".

I tested and you are right. I went with more generic wording:

Some lock types wait on <structfield>virtualxid</structfield>,
while other types wait on <structfield>transactionid</structfield>.

> > + Row-level read and write locks are recorded directly in locked
> > + rows and can be inspected using the <xref linkend="pgrowlocks"/>
> > + extension. Row-level read locks might also require the assignment
> > + of multixact IDs (<literal>mxid</literal>). Mxids are recorded in
> > + the <filename>pg_multixact</filename> directory.
>
> "are recorded directly in *the* locked rows"

Done.

> I think the mention of multixacts should link to
> <xref linkend="vacuum-for-multixact-wraparound"/>. Again, I would not
> specifically mention the directory, since it is already described in
> "storage.sgml", but I have no strong optinion there.

Done with:

Row-level read locks might also require the assignment
of multixact IDs (<literal>mxid</literal>; see <xref
linkend="vacuum-for-multixact-wraparound"/>).

> > + <sect1 id="subxacts">
> > +
> > + <title>Subtransactions</title>
>
> > + The word subtransaction is often abbreviated as
> > + <literal>subxact</literal>.
>
> I'd use <quote>, not <literal>.

Done.

> > + If a subtransaction is assigned a non-virtual transaction ID,
> > + its transaction ID is referred to as a <literal>subxid</literal>.
>
> Again, I would use <quote>, since we don't <literal> "subxid"
> elsewhere.

Done.

> + Up to
> + 64 open subxids are cached in shared memory for each backend; after
> + that point, the overhead increases significantly since we must look
> + up subxid entries in <filename>pg_subtrans</filename>.
>
> Comma before "since". Perhaps you should mention that this means disk I/O.

I went with:

Up to 64 open subxids are cached in shared memory for each backend; after
that point, the storage I/O overhead increases significantly, since
we must look up subxid entries in <filename>pg_subtrans</filename>.

Updated full patch attached.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

Attachment Content-Type Size
xact.diff text/x-diff 23.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-11-18 19:16:33 Re: Collation version tracking for macOS
Previous Message Alvaro Herrera 2022-11-18 19:08:05 Re: MERGE regress test