Re: [RFC] extended txid docs

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: [RFC] extended txid docs
Date: 2007-10-16 20:24:30
Message-ID: 60ejfulse9.fsf@dba2.int.libertyrms.com
Lists: pgsql-hackers pgsql-patches

markokr(at)gmail(dot)com ("Marko Kreen") writes:
> Even the realistic code may be too much for general docs,
> but considering this is not a functionality covered
> by general SQL textbooks, I think it is worth having.
>
> I also put rendered pages up here:
>
> http://skytools.projects.postgresql.org/txid/datatype-txid-snapshot.html

"The data type txid_snapshot stores info about what transaction ids
are visible in a particular moment of time. Components are described
in..."

I'd suggest instead:

"The data type txid_snapshot stores info about transaction ID
visibility at a particular moment in time. The components are
described in..."

"Smallest txid that may be active. Below it all txids are visible."

I'd suggest instead:

"Earliest transaction ID that is still active. All earlier
transactions will either be committed and visible, or rolled back and
dead."

"Next unassigned txid. Above it all txids are unassigned, thus invisible."

I'd suggest instead:

"Next unassigned txid. This txid and all later ones are unassigned,
and thus invisible."
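
To make the xmin/xmax wording concrete, here is a minimal Python sketch
of the visibility rule as I read it. The names TxidSnapshot and
is_visible are hypothetical, chosen for illustration only; they are not
part of the patch or of any PostgreSQL API. It assumes a snapshot
carries xmin, xmax, and the set of txids in progress between them, and
that xmax itself counts as unassigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxidSnapshot:
    """Toy model of a snapshot (hypothetical, for illustration only)."""
    xmin: int                     # earliest txid that may still be active
    xmax: int                     # next unassigned txid
    xip: frozenset = frozenset()  # txids in progress, xmin <= txid < xmax

    def is_visible(self, txid: int) -> bool:
        if txid < self.xmin:
            # Finished before the snapshot: committed, or rolled back.
            return True
        if txid >= self.xmax:
            # Not yet assigned when the snapshot was taken.
            return False
        # In between: visible unless it was still running.
        return txid not in self.xip


snap = TxidSnapshot(xmin=10, xmax=20, xip=frozenset({10, 14}))
```

With this snapshot, txids 9 and 12 come out visible, while 10 and 14
(still running) and 20 (unassigned) do not.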

> http://skytools.projects.postgresql.org/txid/functions-txid.html

"The main use of the functions comes from the fact that user can query txids that were committed between 2 snapshots. As this is slightly tricky, it is described here in details on the example of simple queue table."

I'd suggest instead:

"The main use of the functions is to determine which transactions were
committed between 2 snapshots. As this is somewhat tricky, a
demonstration of their use with a simple queue table is provided."

"Then let there be table for snapshots, into which a separate process
inserts a row with current snapshot after each 5 seconds (for
example). Lets call it 'ticks' table:"

I'd suggest instead:

"We define a table to store snapshots, called 'ticks', into which a
separate process inserts a row containing the current transaction
snapshot at regular intervals, for example every 5 seconds."

"Now if someone wants to read events from the queue table, then at
first he needs to get 2 rows with snapshots from ticks table, then
query for txids that were committed between those 2 snapshots on
events table.

Because the txids and snapshots are tied to PostgreSQL internal MVCC
mechanism, the reader can be certain that the txid range queried stays
constant."

I'd suggest instead:

"To read event data consistently for a particular period, the user
must first read 2 rows from the 'ticks' table, whose snapshots
together bound the period of interest, and then search the event
table for the txids that were committed between those 2 snapshots.

Since the txid and snapshot values are tied to PostgreSQL's internal
MVCC mechanism, the reader may be certain that the txid range queried
is consistent."
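
As a rough illustration of "committed between 2 snapshots", the
following Python sketch picks the txids that are not visible in the
first snapshot but are visible in the second. Snap, visible and
committed_between are hypothetical names of my own, and the sample
numbers are made up. In this toy model a rolled-back transaction would
also pass the test; the assumption is that such transactions leave no
rows in the event table, so they never show up among the candidates.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Snap(NamedTuple):
    """Toy snapshot (hypothetical): xmin, xmax, in-progress txids."""
    xmin: int
    xmax: int
    xip: FrozenSet[int]


def visible(txid: int, snap: Snap) -> bool:
    """True if txid had finished when the snapshot was taken."""
    if txid < snap.xmin:
        return True
    if txid >= snap.xmax:
        return False
    return txid not in snap.xip


def committed_between(txids: Iterable[int], snap1: Snap, snap2: Snap) -> List[int]:
    """Txids that finished after snap1 was taken but before snap2."""
    return [t for t in txids if not visible(t, snap1) and visible(t, snap2)]


snap1 = Snap(10, 20, frozenset({10, 14}))
snap2 = Snap(14, 30, frozenset({14, 25}))
event_txids = [8, 10, 12, 14, 21, 25, 29, 31]  # txids stamped on event rows
batch = committed_between(event_txids, snap1, snap2)
```

Here batch is [10, 21, 29]: txid 10 was running at snap1 and finished
by snap2, 21 and 29 were assigned and finished in between, and the rest
were either already visible at snap1 or still invisible at snap2.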

"But it will have problems if there are long transactions
running. That means the snap1.xmin will stay at the position of
running transaction and the range will get very large.

This can be fixed by fetching only [snap1.xmax..snap2.xmax] by range
and fetching possible txids below snap1.xmax explicitly:"

I'd suggest instead:

"But the query may be processed inefficiently if there are
long-running transactions during the period. In that case the
snap1.xmin value will continue to refer to the oldest running
transaction, and the range will grow very large.

This may be rectified by fetching only [snap1.xmax..snap2.xmax] by
range, and fetching candidate txids earlier than snap1.xmax
explicitly:"

"But that is also slightly inefficient as long transactions can be open during several snapshots. So it would be good to pick out exact transactions that were open at the time of snap1 and committed before snap2. That can be done with following query:"

I'd suggest instead:

"But that query is also somewhat inefficient because long-running
transactions may be open across multiple snapshots. As a result, it
may be more efficient to pick out exact transactions that were open at
the time of snap1 and committed before snap2. That can be done with
the following query:"

"As txids returned by last query are certainly interesting, their visiblity does not need additional checks. That means the final query can be in form:"

I'd suggest instead:

"As the txids returned by that last query are certainly of interest,
their visibility does not need to be checked again. That means the
final query may be of the form:"

"Although the above queries are technically correct, PostgreSQL fails to plan them efficiently. The actual query should always be made with actual values written in."

I'd suggest instead:

"Although the above queries are all technically correct, PostgreSQL
will not plan them efficiently as written. The actual query should
always be executed with specific values substituted in."

I believe that those suggested texts describe what you intended, and
they should represent better English text for this.
--
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
"What you said you want to do is roughly equivalent to nailing
horseshoes to the tires of your Buick." -- danceswithcrows(at)usa(dot)net on
the question "Why can't Linux use Windows Drivers?"
