Re: Exposing the Xact commit order to the user

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Exposing the Xact commit order to the user
Date: 2010-05-24 01:44:27
Message-ID: 4BF9D9FB.7020606@Yahoo.com
Lists: pgsql-hackers

On 5/23/2010 8:38 PM, Robert Haas wrote:
> On Sun, May 23, 2010 at 4:21 PM, Jan Wieck <JanWieck(at)yahoo(dot)com> wrote:
>> The system will have postgresql.conf options for enabling/disabling the
>> whole shebang, how many shared buffers to allocate for managing access
>> to the data and to define the retention period of the data based on data
>> volume and/or age of the commit records.
>
> It would be nice if this could just be managed out of shared_buffers
> rather than needing to configure a separate pool just for this
> feature. But, I'm not sure how much work that is, and if it turns out
> to be too ugly then I'd say it's not a hard requirement. In general,
> I think we talked during the meeting about the desirability of folding
> specific pools into shared_buffers rather than managing them
> separately, but I'm not aware that we have any cases where we do that
> today so it might be hard (or not).

I'm not sure the retention policies of the shared buffer cache, the WAL
buffers, CLOG buffers and every other thing we try to cache are that
easy to fold into one single set of logic. But I'm all ears.

>
>> Each record of the Transaction Commit Info consists of
>>
>> txid xci_transaction_id
>> timestamptz xci_begin_timestamp
>> timestamptz xci_commit_timestamp
>> int64 xci_total_rowcount
>>
>> 32 bytes total.
>
> Are we sure it's worth including the row count? I wonder if we ought
> to leave that out and let individual clients of the mechanism track
> that if they're so inclined, especially since it won't be reliable
> anyway.

Nope, we (my belly and I) are not sure about the absolute worth of the
row count. It would be a convenient number to have there, but I can live
without it.
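
For reference, a minimal sketch of the record layout quoted above. The
field names come from the proposal; the struct name and the C types are
assumptions (a 64-bit txid, and timestamptz stored as a 64-bit integer,
as with integer datetimes), not a final definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: field names follow the proposal, C types are assumed. */
typedef struct XactCommitInfo
{
    int64_t xci_transaction_id;    /* txid, assumed to be a 64-bit value */
    int64_t xci_begin_timestamp;   /* timestamptz, assuming integer datetimes */
    int64_t xci_commit_timestamp;  /* timestamptz, assuming integer datetimes */
    int64_t xci_total_rowcount;    /* optional; not recorded in WAL */
} XactCommitInfo;

/* Four 8-byte fields with no padding give the "32 bytes total" quoted above. */
_Static_assert(sizeof(XactCommitInfo) == 32, "record must be 32 bytes");
```

Dropping the row count would shrink the record to 24 bytes under the
same assumptions.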

>
>> CommitTransaction() inside of xact.c will call a function that inserts
>> a new record into this array. Most of the time the operation will be
>> nothing more than taking a spinlock and adding the record to shared
>> memory. All the data for the record is readily available, does not
>> require further locking and can be collected locally before taking the
>> spinlock.
>
> What happens when you need to switch pages?

Then the code will have to grab another free buffer or evict one.
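
A rough sketch of that append path, including the page switch, under
stated assumptions: the names (XciShared, xci_append, and so on), the
page and pool sizes, and the C11 atomic_flag used as a stand-in for a
PostgreSQL spinlock are all hypothetical. The slow path here simply
reuses the next buffer in the pool; real code would first have to make
sure that buffer has been written out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define XCI_RECORDS_PER_PAGE 256   /* assumed: 8 kB page / 32-byte record */
#define XCI_NUM_PAGES        4     /* assumed size of the buffer pool */

typedef struct { int64_t xid, begin_ts, commit_ts, rowcount; } XciRecord;

typedef struct
{
    atomic_flag lock;              /* stand-in for a PostgreSQL spinlock */
    int64_t     next_serial;       /* serial number the next commit gets */
    int         cur_page;          /* index of the page being filled */
    int         cur_used;          /* records already used in that page */
    int         page_switches;     /* how often the slow path was taken */
    XciRecord   pages[XCI_NUM_PAGES][XCI_RECORDS_PER_PAGE];
} XciShared;

void xci_init(XciShared *s)
{
    atomic_flag_clear(&s->lock);
    s->next_serial = 1;
    s->cur_page = 0;
    s->cur_used = 0;
    s->page_switches = 0;
}

/* Append one record (built locally by the caller) and return its serial. */
int64_t xci_append(XciShared *s, const XciRecord *rec)
{
    int64_t serial;

    /* Spin until the lock is acquired. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;

    if (s->cur_used == XCI_RECORDS_PER_PAGE)
    {
        /* Slow path: current page is full, move to the next buffer. */
        s->cur_page = (s->cur_page + 1) % XCI_NUM_PAGES;
        s->cur_used = 0;
        s->page_switches++;
    }

    /* Fast path: copy the record and hand out the next serial number. */
    s->pages[s->cur_page][s->cur_used++] = *rec;
    serial = s->next_serial++;

    atomic_flag_clear_explicit(&s->lock, memory_order_release);
    return serial;
}
```

In this sketch the page switch happens while the lock is held, which is
the simplest thing to write down and not necessarily what the real
implementation would do.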

>
>> The function will return the "sequence" number, which CommitTransaction()
>> in turn will record in the WAL commit record together with the
>> begin_timestamp. While both the begin and the commit timestamp are
>> crucial to determine what data a particular transaction should have
>> seen, the row count is not and will not be recorded in WAL.
>
> It would certainly be better if we didn't have to bloat the commit xlog
> records to do this. Is there any way to avoid that?

If you can tell me how a crash recovering system can figure out what the
exact "sequence" number of the WAL commit record at hand should be,
let's rip it.

>
>> Checkpoint handling will call a function to flush the shared buffers.
>> Together with this, the information from WAL records will be sufficient
>> to recover this data (except for row counts) during crash recovery.
>
> Right.
>
>> Exposing the data will be done via a set returning function. The SRF
>> takes two arguments: the maximum number of rows to return and the last
>> serial number processed by the reader. The advantage of such an SRF is that
>> the result can be used in a query that right away delivers audit or
>> replication log information in transaction commit order. The SRF can
>> return an empty set if no further transactions have committed since, or
>> an error if data segments needed to answer the request have already been
>> purged.
>>
>> Purging of the data will be possible in several different ways.
>> Autovacuum will call a function that drops segments of the data that are
>> outside the postgresql.conf configuration with respect to maximum age
>> or data volume. There will also be a function reserved for superusers to
>> explicitly purge the data up to a certain serial number.
>
> Dunno if autovacuuming this is the right way to go. Seems like that
> could lead to replication breaks, and it's also more work than not
> doing that. I'd just say that if you turn this on you're responsible
> for pruning it, full stop.

It is an option. "Keep it until I tell you" is a perfectly valid
configuration option. One you probably don't want to forget about, but
valid nonetheless.
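
To make the reader and purge semantics from the quoted proposal
concrete, here is a small in-memory model. It is only a sketch: all
names (XciLog, xci_read, xci_purge_upto, XCI_ERR_PURGED) are made up for
illustration, the fixed-size array stands in for the on-disk segments,
and it does not guard against more than XCI_CAPACITY unpurged rows.

```c
#include <assert.h>
#include <stdint.h>

#define XCI_CAPACITY   1024   /* assumed retention window of the model */
#define XCI_ERR_PURGED (-1)   /* requested rows have already been purged */

typedef struct { int64_t serial, xid, begin_ts, commit_ts; } XciRow;

typedef struct
{
    int64_t oldest_serial;    /* first serial number still retained */
    int64_t next_serial;      /* serial number the next commit will get */
    XciRow  rows[XCI_CAPACITY];
} XciLog;

/* Record a commit and return its serial number. */
int64_t xci_log_commit(XciLog *log, int64_t xid,
                       int64_t begin_ts, int64_t commit_ts)
{
    int64_t serial = log->next_serial++;
    XciRow  row = { serial, xid, begin_ts, commit_ts };

    log->rows[serial % XCI_CAPACITY] = row;
    return serial;
}

/*
 * Return up to max_rows rows with serial > last_serial, in commit order.
 * Returns 0 if nothing new has committed, XCI_ERR_PURGED if the reader
 * asks for rows that are no longer retained.
 */
int xci_read(const XciLog *log, int64_t last_serial, int max_rows, XciRow *out)
{
    int64_t start = last_serial + 1;
    int     n = 0;

    if (start < log->oldest_serial)
        return XCI_ERR_PURGED;

    while (n < max_rows && start + n < log->next_serial)
    {
        out[n] = log->rows[(start + n) % XCI_CAPACITY];
        n++;
    }
    return n;
}

/* Explicit purge: drop everything up to and including the given serial. */
void xci_purge_upto(XciLog *log, int64_t serial)
{
    if (serial >= log->next_serial)
        serial = log->next_serial - 1;
    if (serial + 1 > log->oldest_serial)
        log->oldest_serial = serial + 1;
}
```

With "keep it until I tell you", xci_purge_upto() would be the only
thing that ever advances oldest_serial, so a reader that falls behind
never sees XCI_ERR_PURGED unless someone purged past it on purpose.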

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin
