Re: pg_sequence catalog

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_sequence catalog
Date: 2016-08-31 14:35:03
Message-ID: CAMsr+YHnREoTh6XeuTzrsr8GSW_DeGoNL0DRnQi2bHRdA3OPww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31 August 2016 at 22:01, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Personally, my big beef with the current approach to sequences is that
> we eat a whole relation (including a whole relfilenode) per sequence.
> I wish that we could reduce a sequence to just a single row in a
> catalog, including the nontransactional state.

I'd be happy to see incremental improvement in this space as Peter has
suggested, though I certainly see the value of something like seqam
too.

It sounds like you're thinking of something like a normal(ish) heap
tuple where we just overwrite some fields in-place without fiddling
xmin/xmax and making a new row version. Right? Like we currently
overwrite the lone Form_pg_sequence on the 1-page sequence relations.

I initially thought that TRUNCATE ... RESTART IDENTITY would be
somewhat of a problem with this. We effectively have a temporary
"timeline" fork in the sequence value where it's provisionally
restarted and we start using values from the restarted sequence within
the xact that restarted it. But actually, it'd fit pretty well.
TRUNCATE ... RESTART IDENTITY would write a new row version with a new
xmin, and set xmax on the old sequence row. nextval(...) within the
truncating xact would update the new row's non-transactional fields
when it allocated new sequence chunks. On commit, everyone starts
using the new row due to normal transactional visibility rules. On
rollback everyone ignores it like they would any other dead tuple from
an aborted act and uses the old tuple's nontransactional fields. It
Just Works(TM).

nextval(...) takes AccessShareLock on a sequence relation. TRUNCATE
... RESTART IDENTITY takes AccessExclusiveLock. So we can never have
nextval(...) advancing the "old" timeline in other xacts at the same
time as we consume values on the restarted sequence inside the xact
that did the restarting. We still need the new "timeline" though,
because we have to retain the old value for rollback.

It feels intuitively pretty gross to effectively dirty-read and write
a few fields of a tuple. But that's what we do all the time with
xmin/xmax etc, it's not really that different.

It'd certainly make sequence decoding easier too. A LOT easier.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Petr Jelinek 2016-08-31 14:40:25 Re: pg_sequence catalog
Previous Message Ivan Kartyshov 2016-08-31 14:16:13 make async slave to wait for lsn to be replayed