Re: Exposing the Xact commit order to the user

From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Exposing the Xact commit order to the user
Date: 2010-05-24 19:10:21
Message-ID: 20100524191021.GA53044@csail.mit.edu
Lists: pgsql-hackers

On Sun, May 23, 2010 at 04:21:58PM -0400, Jan Wieck wrote:
> In some systems (data warehousing, replication), the order of commits is
> important, since that is the order in which changes have become visible.
> This information could theoretically be extracted from the WAL, but
> scanning the entire WAL just to extract this tidbit of information would
> be excruciatingly painful.

This is very interesting to me as I've been doing some (research --
nowhere near production-level) work on building a transactional
application-level (i.e. memcached-like) cache atop Postgres. One of the
features I needed to support it was basically what you describe.

Without getting too far into the details of what I'm doing, I needed to
make it clear to a higher layer which commits were visible to a given
query. That is, I wanted to know both the order of commits and where
particular snapshots fit into this ordering. (A SnapshotData struct
obviously contains the visibility information, but a representation in
terms of the commit ordering is both more succinct and allows for easy
ordering comparisons.)
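
To illustrate the succinctness point, here is a minimal Python sketch. It is not Postgres code: `CommitOrderSnapshot` and `last_visible_seq` are hypothetical names, and it assumes every commit gets a single position in a total commit order. Under that assumption a snapshot shrinks to one integer, and both visibility checks and snapshot ordering become plain integer comparisons:

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class CommitOrderSnapshot:
    """Hypothetical snapshot, represented by one position in the commit order."""

    # Sequence number of the newest commit this snapshot can see (assumed field).
    last_visible_seq: int

    def sees(self, commit_seq: int) -> bool:
        # A commit is visible iff it sits at or before the snapshot's position.
        return commit_seq <= self.last_visible_seq


older = CommitOrderSnapshot(last_visible_seq=41)
newer = CommitOrderSnapshot(last_visible_seq=45)
```

Here `older < newer` orders the two snapshots directly, whereas comparing two SnapshotData structs would mean reasoning about xmin, xmax and the in-progress xid list.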

Something you might want to consider, then, is adding an interface to
find out the timestamp of the current transaction's snapshot, i.e. the
timestamp of the most recent committed transaction visible to it. I
wouldn't expect this to be difficult to implement as transaction
completion/visibility is already synchronized via ProcArrayLock.
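
A rough model of what such an interface might look like, written in Python rather than backend C. Everything here is an assumption for illustration: `CommitOrderTracker`, `commit()` and `snapshot_timestamp()` are made-up names, and the `threading.Lock` merely stands in for ProcArrayLock. The idea is that if commits and snapshot acquisition serialize on the same lock, a snapshot's timestamp is simply the counter value it observes while holding that lock:

```python
import threading


class CommitOrderTracker:
    """Toy model: commits and snapshots are assumed to serialize on one lock."""

    def __init__(self):
        self._lock = threading.Lock()  # stand-in for ProcArrayLock
        self._last_commit_seq = 0      # position of the newest commit so far
        self.commit_seq_by_xid = {}    # xid -> position in the commit order

    def commit(self, xid):
        # Assign the next position in the commit order while holding the lock.
        with self._lock:
            self._last_commit_seq += 1
            self.commit_seq_by_xid[xid] = self._last_commit_seq
            return self._last_commit_seq

    def snapshot_timestamp(self):
        # Position of the most recent commit visible to a snapshot taken now.
        with self._lock:
            return self._last_commit_seq


tracker = CommitOrderTracker()
tracker.commit(xid=100)              # xid 100 commits first
snap = tracker.snapshot_timestamp()  # snapshot taken after that commit
tracker.commit(xid=99)               # a lower xid commits later
```

Note that xid 99 commits after xid 100, so its position in the commit order is later than `snap` even though its xid is smaller; that is the reason xids alone do not give the ordering.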

> Each record of the Transaction Commit Info consists of
>
> txid xci_transaction_id
> timestamptz xci_begin_timestamp
> timestamptz xci_commit_timestamp
> int64 xci_total_rowcount

Another piece of information that seems useful to provide here would be
the logical timestamp of the transaction, i.e. a counter that's
incremented by one for each transaction. But maybe that's implicit in
the log ordering?
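
As a small illustration of the "implicit in the log ordering" possibility, here is a Python sketch. It assumes (this is not established anywhere in the proposal) that records are appended strictly in commit order and never reordered; `CommitInfo` is a hypothetical stand-in carrying two of the proposed fields, and the values are invented. Under that assumption the logical timestamp is just a record's position in the log:

```python
from typing import NamedTuple


class CommitInfo(NamedTuple):
    """Hypothetical record with two of the proposed fields."""

    xci_transaction_id: int
    xci_commit_timestamp: float  # wall-clock seconds; values may tie


# Invented log contents, assumed to be stored in commit order.
log = [
    CommitInfo(731, 10.0),
    CommitInfo(729, 10.0),  # same wall-clock time as the previous commit
    CommitInfo(730, 10.5),
]

# Logical timestamp = 1-based position of the record in the log.
logical_ts = {rec.xci_transaction_id: pos for pos, rec in enumerate(log, start=1)}
```

The first two records share a wall-clock timestamp but still get distinct logical timestamps. If old records can be purged, the position alone is no longer stable, which would be one argument for storing the counter explicitly.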

I'm not clear on why the total rowcount is useful, but perhaps I'm
missing something obvious.

I've actually implemented some semblance of this on Postgres 8.2, but it
sounds like what you're interested in is more sophisticated. In
particular, I wasn't at all concerned with durability or WAL stuff, and
I had some specific requirements about when it was OK to purge the
data. Because of this (and very limited development time), I just
threw something together with a simple shared buffer.

I don't think I have any useful code to offer, but let me know if
there's some way I can help out.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/
