Re: Sequence Access Method WIP

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sequence Access Method WIP
Date: 2014-11-05 17:32:27
Message-ID: 545A5F2B.2010503@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/05/2014 05:07 PM, Petr Jelinek wrote:
> On 05/11/14 13:45, Heikki Linnakangas wrote:
>> In fact, if the seqam manages the current value outside the database
>> (e.g. a "remote" seqam that gets the value from another server),
>> nextval() never needs to write a WAL record.
>
> Sure it does, you need to keep the current state in Postgres also, at
> least the current value so that you can pass correct input to
> sequence_alloc(). And you need to do this in crash-safe way so WAL is
> necessary.

Why does sequence_alloc need the current value? If it's a "remote"
seqam, the current value is kept in the remote server, and the last
value that was given to this PostgreSQL server is irrelevant.

That irks me with this API. The method for acquiring a new value isn't
fully abstracted behind the AM interface, as sequence.c still needs to
track it itself. That's useful for the local AM, of course, and maybe
some others, but for others it's totally useless.

>>>>> For the amdata handling (which is the AM's private data variable) the
>>>>> API assumes that (Datum) 0 is NULL, this seems to work well for
>>>>> reloptions so should work here also and it simplifies things a little
>>>>> compared to passing pointers to pointers around and making sure
>>>>> everything is allocated, etc.
>>>>>
>>>>> Sadly the fact that amdata is not fixed size and can be NULL made the
>>>>> page updates of the sequence relation quite more complex that it
>>>>> used to
>>>>> be.
>>>>
>>>> It would be nice if the seqam could define exactly the columns it needs,
>>>> with any datatypes. There would be a set of common attributes:
>>>> sequence_name, start_value, cache_value, increment_by, max_value,
>>>> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
>>>> and "is_called" to that. A remote seqam that calls out to some other
>>>> server might store the remote server's hostname etc.
>>>>
>>>> There could be a seqam function that returns a TupleDesc with the
>>>> required columns, for example.
>>>
>>> Wouldn't that somewhat bloat catalog if we had new catalog table for
>>> each sequence AM?
>>
>> No, that's not what I meant. The number of catalog tables would be the
>> same as today. Sequences look much like any other relation, with entries
>> in pg_attribute catalog table for all the attributes for each sequence.
>> Currently, all sequences have the same set of attributes, sequence_name,
>> last_value and so forth. What I'm proposing is that there would a set of
>> attributes that are common to all sequences, but in addition to that
>> there could be any number of AM-specific attributes.
>
> Oh, that's interesting idea, so the AM interfaces would basically return
> updated tuple and there would be some description function that returns
> tupledesc.

Yeah, something like that.

> I am bit worried that this would kill any possibility of
> ALTER SEQUENCE USING access_method. Plus I don't think it actually
> solves any real problem - serializing the internal C structs into bytea
> is not any harder than serializing them into tuple IMHO.

I agree that serialization to bytea isn't that difficult, but it's still
nicer to work directly with the correct data types. And it makes the
internal state easily accessible for monitoring and debugging purposes.

>>> It also does not really solve the amdata being dynamic
>>> size "issue".
>>
>> Yes it would. There would not be a single amdata attribute, but the AM
>> could specify any number of custom attributes, which could be fixed size
>> or varlen. It would be solely the AM's responsibility to set the values
>> of those attributes.
>>
>
> That's not the issue I was referring to, I was talking about the page
> replacement code which is not as simple now that we have potentially
> dynamic size tuple and if tuples were different for different AMs the
> code would still have to be able to handle that case. Setting the values
> in tuple itself is not too complicated.

I don't see the problem with that. We deal with variable-sized tuples in
heap pages all the time. The max size of amdata (or the extra
AM-specific columns) is going to be determined by the block size, though.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2014-11-05 17:40:59 intel s3500 -- hot stuff
Previous Message Jeff Janes 2014-11-05 17:14:20 Re: BRIN indexes - TRAP: BadArgument