Re: Sequence Access Methods, round two

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Sequence Access Methods, round two
Date: 2024-02-22 16:36:00
Message-ID: ef7114ee-e629-4132-b92d-d9104e2a40d7@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael,

I took a quick look at this patch series, mostly to understand how it
works and how it might interact with the logical decoding patches
discussed in a nearby thread.

First, some general review comments:

0001
------

I think this bit in pg_proc.dat is not quite right:

proallargtypes => '{regclass,bool,int8}', proargmodes => '{i,o,o}',
proargnames => '{seqname,is_called,last_value}',

the first argument should not be "seqname" but rather "seqid".

0002, 0003
------------
seems fine, cosmetic changes

0004
------

I don't understand this bit in AlterSequence:

last_value = newdataform->last_value;
is_called = newdataform->is_called;

UnlockReleaseBuffer(buf);

/* Check and set new values */
init_params(pstate, stmt->options, stmt->for_identity, false,
seqform, &last_value, &reset_state, &is_called,
&need_seq_rewrite, &owned_by);

Why set the values only to pass them to init_params(), which will just
overwrite them anyway? Or do I get this wrong?

Also, isn't "reset_state" just a different name for (original) log_cnt?

0005
------

I don't quite understand what "elt" stands for :-(

stmt->tableElts = NIL;

Do we need AT_AddColumnToSequence? It seems to work exactly like
AT_AddColumn. OTOH we do have AT_AddColumnToView too ...

Thinking about this code:

case T_CreateSeqStmt:
EventTriggerAlterTableStart(parsetree);
address = DefineSequence(pstate, (CreateSeqStmt *) parsetree);
/* stashed internally */
commandCollected = true;
EventTriggerAlterTableEnd();
break;

Does this actually make sense? I mean, are sequences really relations?
Or was that just a side effect of storing the state in a heap table
(which is more of an implementation detail)?

0006
------
no comment, just moving code

0007
------
I wonder why heap_create_with_catalog needs to do this (check that it's
a sequence):

if ((RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) ||
relkind == RELKIND_SEQUENCE)

Presumably this is to handle sequences that use heap to store the state?
Maybe the comment should explain that. Also, will the other table AMs
need to do something similar, just in case some sequence happens to use
that table AM (which seems out of control of the table AM)?

I don't understand why DefineSequence need to copy the string:

stmt->accessMethod = seq->accessMethod ? pstrdup(seq->accessMethod)
: NULL;

RelationInitTableAccessMethod now does not need to handle sequences, or
rather should not be asked to handle sequences. Is there a risk we'd
pass a sequence to the function anyway? Maybe an assert / error would be
appropriate?

This bit in RelationBuildLocalRelation looks a bit weird ...

if (RELKIND_HAS_TABLE_AM(relkind))
RelationInitTableAccessMethod(rel);
else if (relkind == RELKIND_SEQUENCE)
RelationInitSequenceAccessMethod(rel);

It's not a fault of this patch, but shouldn't we now have something like
RELKIND_HAS_SEQUENCE_AM()?

0008-0010
-----------
no comment

logical decoding / replication
--------------------------------
Now, regarding the logical decoding / replication, would introducing the
sequence AM interfere with that in some way? Either in general, or with
respect to the nearby patch.

That is, what would it take to support logical replication of sequences
with some custom sequence AM? I believe that requires (a) synchronizing
the initial value, and (b) decoding the sequence WAL and (c) apply the
decoded changes. I don't think the sequence AM breaks any of this, as
long as it allows selecting "current value", decoding the values from
WAL, sending them to the subscriber, etc.

I guess the decoding would be up to the RMGR, and this patch maintains
the 1:1 mapping of sequences to relfilenodes, right? (That is, CREATE
and ALTER SEQUENCE would still create a new relfilenode, which is rather
important to decide if a sequence change is transactional.)

It seems to me this does not change the non-transactional behavior of
sequences, right?

alternative sequence AMs
--------------------------
I understand one of the reasons for adding sequence AMs is to allow
stuff like global/distributed sequences, etc. But will people actually
use that?

For example, I believe Simon originally proposed this in 2016 because
the plan was to implement distributed sequences in BDR on top of it. But
I believe BDR ultimately went with a very different approach, not with
custom sequence AMs. So I'm a bit skeptical this being suitable for
other active-active systems ...

Especially when the general consensus seems to be that for active-active
systems it's much better to use e.g. UUID, because that does not require
any coordination between the nodes, etc.

I'm not claiming there are no use cases for sequence AMs, of course. For
example the PRNG-based sequences mentioned by Mattias seems interesting.
I don't know how widely useful that is, though, and if it's worth it
(considering they managed to implement it in a different way).

But I think it might be a good idea to implement a PoC of such sequence
AM, if only to verify it can be implemented using the proposed code.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message a.imamov 2024-02-22 16:54:37 Potential issue in ecpg-informix decimal converting functions
Previous Message vignesh C 2024-02-22 16:29:27 Re: speed up a logical replica setup