Re: Sequence Access Method WIP

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: hlinnaka(at)iki(dot)fi, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Sequence Access Method WIP
Date: 2015-04-27 15:49:27
Message-ID: 553E5A87.9030105@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22/04/15 22:01, Petr Jelinek wrote:
> On 20/04/15 17:50, Heikki Linnakangas wrote:
>>>
>> I finally got around to have another round of review on this. I fixed a
>> couple of little bugs, did some minor edition on comments etc. See
>> attached. It is also available in my git repository at
>> git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
>> if you want to look at individual changes. It combines your patches 1
>> and 4, I think those need to be applied together. I haven't looked at
>> the DDL changes yet.
>
> Thanks!

I merged all your changes and merged the patch#1 with patch#4.

Also I pushed my repo to https://github.com/PJMODOS/postgres/tree/seqam
and gave you (Heikki) commit rights there in case you want to change
anything.

>> The division of labour between sequence.c and the AM, in the init and
>> the get/set_state functions, is a bit more foggy:
>>
>> * Do we really need a separate amoptions() method and an init() method,
>> when the only caller to amoptions() is just before the init() method?
>> The changes in extractRelOptions suggest that it would call
>> am_reloptions for sequences too, but that doesn't actually seem to be
>> happening.
>
> Hmm yes it should and I am sure it did at some point, must have messed
> it during one of the rebases :(
>
> And it's the reason why we need separate API function.
>

Actually the reloption handling was broken in more than one place, I
fixed it all around. I had to teach the heap_create_with_catalog about
relam but the change does not seem to be too intrusive to me so I think
it's fine. We no longer do the update of pg_class tuple after the
sequence was created anymore and the relcache works correctly now. Also
the whole reloption handling was moved to more proper places than the
seqam_init so it's more in line with how it works for other relation kinds.

>>
>> postgres=# create sequence fooseq using local with (garbage=option);
>> CREATE SEQUENCE
>>
>> Invalid options probably should throw an error.
>>

Done, well actually the local sequence now throws error on any options.
I also updated CREATE SEQUENCE and ALTER SEQUENCE docs with the
reloptions syntax.

>> * Currently, the AM's init function is responsible for basic sanity
>> checking, like min < max. It also has to extract the RESTART value from
>> the list of options. I think it would make more sense to move that to
>> sequence.c. The AM should only be responsible for initializing the
>> 'amdata' portion, and for handling any AM-specific options. If the AM
>> doesn't support some options, like MIN and MAX value, it can check them
>> and throw an error, but it shouldn't be responsible for just passing
>> them through from the arguments to the sequence tuple.
>

I now do the checking in sequence.c where possible, the init got
generally redone to accept the newly created tuple and restart value
parameters.

One thing that I am not sure how much like is that if the AM wants to
change the amdata in the init, it now has to modify the tuple and free
the old one.

>
>>
>> * The transformations of the arrays in get_state() and set_state()
>> functions are a bit complicated. The seqam_get_state() function returns
>> two C arrays, and pg_sequence_get_state() turns them into a text[]
>> array. Why not construct the text[] array directly in the AM? I guess
>> it's a bit more convenient to the AM, if the pg_sequence_get_state() do
>> that, but if that's an issue, you could create generic helper function
>> to turn two C string arrays into text[], and call that from the AM.
>
> Yeah that was exactly the reasoning. Helper function works for me (will
> check what Álvaro's suggested, maybe it can be moved somewhere and reused).
>

Didn't use Álvaro's code in the end as ISTM working directly with arrays
is simple enough for this use case. The only slightly ugly part is the
use of TextDatumGetCString/CStringGetDatum but I think that's survivable
(maybe the sequence.c could convert the TEXT[] to CSTRING[] or just char
*[] as this is not performance critical code path).

>>
>>> seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>>>
>>> for (i = 0; i < count; i++)
>>> {
>>> if (pg_strcasecmp(keys[i], "last_value") == 0)
>>> seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
>>>
>>> CStringGetDatum(values[i])));
>>> else if (pg_strcasecmp(keys[i], "is_called") == 0)
>>> seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
>>>
>>> CStringGetDatum(values[i])));
>>> else
>>> ereport(ERROR,
>>> (errcode(ERRCODE_SYNTAX_ERROR),
>>> errmsg("invalid state key \"%s\" for local
>>> sequence",
>>> keys[i])));
>>> }
>>>
>>> sequence_save_tuple(seqh, NULL, true);
>>
>> If that error happens after having already processed a "last_value" or
>> "is_called" entry, you have already modified the on-disk tuple. AFAICS
>> that's the only instance of that bug, but sequence_read_tuple - modify
>> tuple in place - sequence_save_tuple pattern is quite unsafe in general.
>> If you modify a tuple directly in a Buffer, you should have a critical
>> section around it. It would make sense to start a critical section in
>> sequence_read_tuple(), except that not all callers want to modify the
>> tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
>> functions should be split into two:
>>
>> sequence_read_tuple_for_update()
>> sequence_save_tuple()
>
> Agreed, however I am much more concerned about the way
> seqam_local_alloc() works, see the XXX comment there. I am thinking it's
> not really safe that way, but problem is that we can't put critical
> section around it currently as the sequence_save_tuple() can potentially
> call GetTopTransactionId(). Separating it into more functions might be
> solution. Except the split into two does not really help there, it would
> have to be something like this:
> sequence_tuple_update_start() - does the GetTopTransactionId bit and
> starts critical section
> sequence_tuple_save() - saves the tuple/does wal
> sequence_tuple_update_finish() - ends the critical section
>
> That looks slightly cumbersome but I don't really have better idea.
>

So I went with something like what was proposed:
- sequence_start_update(seqh, do_wal) - prepares the state and starts
the critical section
- sequence_apply_update(seqh, do_wal) - does buffer dirtying and wal logging
- sequence_finish_update(seqh) - closes the critical section and cleans
up the state

The API might look slightly hairy now but I don't see much difference
between having to do:
START_CRIT_SECTION()
sequence_update_tuple()
END_CRIT_SECTION()
and the above, except the above also solves the xid acquiring and shows
less internals.

I also added sequence_swap_tuple() for the use-case when the AM does not
want to do inline updates because having this separate makes things
easier and less ugly. And once you are changing whole tuple you are
already in slow/complex code path so one more function call does not matter.

>> * seqam_local_get_state() calls sequence_read_tuple() but not
>> sequence_release_tuple(). It looks like sequence_close() releases the
>> tuple and the buffer, but that seems sloppy. sequence_close() probably
>> shouldn't be calling sequence_release_tuple at all, so that you'd get a
>> warning at the end of transaction about the buffer leak.
>>

I did this (requiring the sequence_release_tuple() always) and
documented it, but personally don't like it much as it just adds one
more call to the API which already has enough of them and I really don't
see the advantage of it.

Other than things mentioned above I rebased it on current master (the
transforms commit produced several issues). And fixed several bugs like
checking min/max value in set_state, proper initialization of session
cached increment_by, etc.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-seqam-v9.patch application/x-patch 158.1 KB
0002-seqam-ddl-v5.patch application/x-patch 39.5 KB
0003-gapless-sequence-v5.patch application/x-patch 28.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sawada Masahiko 2015-04-27 16:00:41 Re: Proposal: knowing detail of config files via SQL
Previous Message Andrew Dunstan 2015-04-27 14:59:16 Re: forward vs backward slashes in msvc build code