Re: Sequence Access Method WIP

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Sequence Access Method WIP
Date: 2015-04-20 15:50:44
Message-ID: 55352054.5020408@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/15/2015 09:07 PM, Petr Jelinek wrote:
> Slightly updated version of the patch.
>
> Mainly rebased against current master (there were several conflicts) and
> fixed some typos, no real functional change.
>
> I also attached initial version of the API sgml doc.

I finally got around to have another round of review on this. I fixed a
couple of little bugs, did some minor edition on comments etc. See
attached. It is also available in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
if you want to look at individual changes. It combines your patches 1
and 4, I think those need to be applied together. I haven't looked at
the DDL changes yet.

I'm fairly happy with the alloc API now. I'm not sure it's a good idea
for the AM to access the sequence tuple directly, though. I would seem
cleaner if it only manipulated the amdata Datum. But it's not too bad as
it is.

We went back and forth on whether 'amdata' should be a single column or
multiple columns, but I guess the single bytea column was the consensus
in the end.

The division of labour between sequence.c and the AM, in the init and
the get/set_state functions, is a bit more foggy:

* Do we really need a separate amoptions() method and an init() method,
when the only caller to amoptions() is just before the init() method?
The changes in extractRelOptions suggest that it would call
am_reloptions for sequences too, but that doesn't actually seem to be
happening.

postgres=# create sequence fooseq using local with (garbage=option);
CREATE SEQUENCE

Invalid options probably should throw an error.

* Currently, the AM's init function is responsible for basic sanity
checking, like min < max. It also has to extract the RESTART value from
the list of options. I think it would make more sense to move that to
sequence.c. The AM should only be responsible for initializing the
'amdata' portion, and for handling any AM-specific options. If the AM
doesn't support some options, like MIN and MAX value, it can check them
and throw an error, but it shouldn't be responsible for just passing
them through from the arguments to the sequence tuple.

* It might be better to form the sequence tuple before calling the init
function, and pass the ready-made tuple to it. All the other API
functions deal with the tuple (after calling sequence_read_tuple), so
that would be more consistent. The init function would need to
deconstruct it to modify it, but we're not concerned about performance here.

* The transformations of the arrays in get_state() and set_state()
functions are a bit complicated. The seqam_get_state() function returns
two C arrays, and pg_sequence_get_state() turns them into a text[]
array. Why not construct the text[] array directly in the AM? I guess
it's a bit more convenient to the AM, if the pg_sequence_get_state() do
that, but if that's an issue, you could create generic helper function
to turn two C string arrays into text[], and call that from the AM.

> seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>
> for (i = 0; i < count; i++)
> {
> if (pg_strcasecmp(keys[i], "last_value") == 0)
> seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
> CStringGetDatum(values[i])));
> else if (pg_strcasecmp(keys[i], "is_called") == 0)
> seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
> CStringGetDatum(values[i])));
> else
> ereport(ERROR,
> (errcode(ERRCODE_SYNTAX_ERROR),
> errmsg("invalid state key \"%s\" for local sequence",
> keys[i])));
> }
>
> sequence_save_tuple(seqh, NULL, true);

If that error happens after having already processed a "last_value" or
"is_called" entry, you have already modified the on-disk tuple. AFAICS
that's the only instance of that bug, but sequence_read_tuple - modify
tuple in place - sequence_save_tuple pattern is quite unsafe in general.
If you modify a tuple directly in a Buffer, you should have a critical
section around it. It would make sense to start a critical section in
sequence_read_tuple(), except that not all callers want to modify the
tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
functions should be split into two:

sequence_read_tuple_for_update()
sequence_save_tuple()

* seqam_local_get_state() calls sequence_read_tuple() but not
sequence_release_tuple(). It looks like sequence_close() releases the
tuple and the buffer, but that seems sloppy. sequence_close() probably
shouldn't be calling sequence_release_tuple at all, so that you'd get a
warning at the end of transaction about the buffer leak.

- Heikki

Attachment Content-Type Size
seqam-v8+api-doc-heikki.patch application/x-patch 134.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-04-20 16:11:21 Re: Clock sweep not caching enough B-Tree leaf pages?
Previous Message Merlin Moncure 2015-04-20 15:00:43 Re: Clock sweep not caching enough B-Tree leaf pages?