Re: tracking commit timestamps

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Steve Singer <steve(at)ssinger(dot)info>, Andres Freund <andres(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Subject: Re: tracking commit timestamps
Date: 2014-11-13 12:53:43
Message-ID: 5464A9D7.2010006@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-www

On 13/11/14 07:04, Michael Paquier wrote:
> On Wed, Nov 12, 2014 at 10:06 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> Brief list of changes:
>> - the commit timestamp record now stores timestamp, lsn and nodeid
> Now that not only the commit timestamp is stored, calling that "commit
> timestamp", "committs" or "commit_timestamp" is strange, no? If this
> patch is moving toward being a more complex information provider,
> calling it commit information or commit data is more adapted, no?

It's not, since adding more info will break upgrades, I doubt we will
add more anytime soon. I was thinking about it too tbh, but don't have
better name (I don't like commit data as it seems confusing - isn't
commit data the dml you just committed?).
Maybe commit_metadata, commit_information is probably ok also, this
would need input from others, I am personally fine with keeping the
commit_timestamp name too.

>
>> - if the xid passed to get interface is out of range -infinity timestamp is
>> returned (I think it's bad idea to throw errors here as the valid range is
>> not static and same ID can start throwing errors between calls
>> theoretically)
> Already mentioned by Jim in a previous mail: this would be better as NULL.

Yeah that make sense, I have no idea what I was thinking :)

>
>> - renamed the sql interfaces to pg_xact_commit_timestamp,
>> pg_xact_commit_timestamp_data and pg_last_committed_xact, they don't expose
>> the nodeid atm, I personally am not big fan of the "xact" but it seems more
>> consistent with existing naming
> pg_xact_commit_timestamp and pg_xact_commit_timestamp_data are
> overlapping. What's wrong with a single function able to return the
> whole set (node ID, commit timetamp, commit LSN)? Let's say

That's what pg_xact_commit_timestamp_data does (it does not show nodeid
because we agreed that it should not be exposed yet on sql level). Might
make sense to rename, but let's wait for input about the general
renaming point at the beginning of the mail.

> pg_xact_commit_information or pg_xact_commit_data. Already mentioned,
> but I also find using a SRF able to return all the available
> information from a given XID value quite useful. And this does not
> conflict with what is proposed currently, you would need just to call
> the function with XID + number of entries wanted to get a single one.
> Comments from other folks about that?

No idea what you mean by this to be honest, there is exactly one record
stored for single XID.

>
> Then more input about the latest patch:
> 1) This block is not needed, option -e is listed twice:
> The <option>-o</>, <option>-x</>, <option>-e</>,
> - <option>-m</>, <option>-O</>,
> - and <option>-l</>
> + <option>-m</>, <option>-O</>, <option>-l</>
> + and <option>-e</>
> 2) Very small thing: a couple of files have no newlines at the end,
> among them committs.conf and test_committs/Makefile.
> 3) pg_last_committed_xact and not pg_last_xact_commit_information or similar?

Just inspiration from DB2's rollforward (which shows among other things
"last committed transaction: <timestamp>"), but I don't feel strongly
about naming so can be changed.

> 4) storage.sgml needs to be updated with the new folder pg_committs

Right.

> 5) Er.. node ID is missing in pg_last_committed_xact, no?

That's intentional (for now).

> 6) This XXX notice can be removed:
> + /*
> + * Return empty if the requested value is older than what we have or
> + * newer than newest we have.
> + *
> + * XXX: should this be error instead?
> + */

Ok.

> (Note that I am still a partisan of an error message to let the
> caller know that commit info history does not have the information
> requested).

IMHO throwing error there would be same as throwing error when WHERE
clause in SELECT does not match anything. As the xid range for which we
store data is dynamic we need to accept any xid as valid input because
the caller has no way of validating if the xid passed is out of range or
not.

> 7) Note that TransactionTreeSetCommitTsData still never sets do_xlog
> at true and that WriteSetTimestampXlogRec never gets called. So no
> information is WAL-logged with this patch. Wouldn't that be useful for
> standbys as well? Perhaps I am missing why this is disabled? This code
> should be activated IMO or it would be just untested.

True is only needed here when you are setting this info to different
transaction than the one you are in since the info can be reconstructed
from normal transaction WAL record (see that it's actually called from
xact_redo_commit_internal, which is how we get the WAL safety and why it
works on slave). So the true is for use by extensions only, it's not
completely uncommon that we have APIs that are used only by extensions.

> 8) As a more general point, the node ID stuff makes me uncomfortable
> and is just added on top of the existing patch without much
> thinking... So I am really skeptical about it. The need here is to
> pass on demand a int8 that is a node ID that can only be set through a
> C interface, so only extensions could play with it. The data passed to
> a WAL record is always built and determined by the system and entirely
> transparent to the user, inserting user-defined data like that
> inconsistent with what we've been doing until now, no?

Again it's not exposed to SQL because I thought there was agreement to
not do that yet since we might want to build some more core stuff on top
of that before exposing it. It's part of the record now because it can
be useful already the way it is and because adding it later would break
pg_upgrade (it's int4 btw).

Also I would really not say it was added without thought, I am one of
the BDR developers and I was before one of the Londiste developers so I
did think about what I would want when in those shoes.

That being said I think I will remove the CommitTsSetDefaultNodeId
interface in next revision, as extension can already set nodeid via
TransactionTreeSetCommitTsData call and we might want to revisit the
CommitTsSetDefaultNodeId stuff once we start implementing the
replication identifiers. Not to mention that I realized in meantime that
CommitTsSetDefaultNodeId the way it's done currently isn't crash safe
(it's not hard to make it crash safe). And since it's quite simple we
can add it at later date easily if needed.

>
> Also, a question particularly for BDR and Slony folks: do you
> sometimes add a new node using the base backup of an existing node :)
> See what I come up with: a duplication of this new node ID system with
> the already present system ID, no?

Yes we do use basebackup sometimes and no it's not possible to use
systemid here:
- the point of nodeid is to be able to store *remote* nodeid as well
as local one (depending where the change actually originated from) so
your local systemid is quite useless there
- systemid is per Postgres instance, you need per-db identifier when
doing logical rep (2 dbs can have single db as destination or the other
way around)

> Similarly, the LSN is added to the WAL record containing the commit
> timestamp, but cannot the LSN of the WAL record containing the commit
> timestamp itself be used as a point of reference for a better
> ordering? That's not exactly the same as the LSN of the transaction
> commit, still it provides a WAL-based reference.

No, again for several reasons:
- as you pointed out yourself the LSN might not be same as LSN for the xid
- more importantly we normally don't do special WAL logging for commit
timestamp

How it works is that because currently the
TransactionTreeSetCommitTsData is always called with xid of the current
transaction, the WAL record for commit of current transaction can be
used to get the info we need (both timestamp and lsn are used in fact).
As I said above, see how TransactionTreeSetCommitTsData is called from
xact_redo_commit_internal.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2014-11-13 13:11:51 Re: WIP: multivariate statistics / proof of concept
Previous Message Andres Freund 2014-11-13 12:45:17 Re: Add CREATE support to event triggers

Browse pgsql-www by date

  From Date Subject
Next Message Simon Riggs 2014-11-13 13:18:22 Re: tracking commit timestamps
Previous Message Michael Paquier 2014-11-13 06:04:06 Re: tracking commit timestamps