Re: UUID v7

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Mat Arye <mat(at)timescaledb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>
Subject: Re: UUID v7
Date: 2024-03-26 17:26:14
Message-ID: 234823B2-D723-4302-8D56-BFE2A62221E8@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry for this long reply. I was looking on refactoring around pg_strong_random() and could not decide what to do. Finally, I decided to post at least something.

> On 22 Mar 2024, at 19:15, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> I have been studying the uuidv() function.
>
> I find this code extremely hard to follow.
>
> We don't need to copy all that documentation from the RFC 4122bis document. People can read that themselves. What I would like to see is easy to find information what from there we are implementing. Like,
>
> - UUID version 7
> - fixed-length dedicated counter
> - counter is 18 bits
> - 4 bits are initialized as zero

I've removed table taken from RFC.

> That's more or less all I would need to know what is going on.
>
> That said, I don't understand why you say it's an 18 bit counter, when you overwrite 6 bits with variant and version. Then it's just a 12 bit counter? Which is the size of the rand_a field, so that kind of makes sense. But 12 bits is the recommended minimum, and (in this patch) we don't use sub-millisecond timestamp precision, so we should probably use more than the minimum?

No, we use 4 bits in data[6], 8 bits in data[7], and 6 bits data[8]. It's 18 total. Essentially, we use both partial bytes and one whole byte between.
There was a bug - we used 1 extra byte of random numbers that was not necessary, I think that's what lead you to think that we use 12-bit counter.

> Also, you are initializing 4 bits (I think?) to zero to guard against counter rollovers (so it's really just an 8 bit counter?). But nothing checks against such rollovers, so I don't understand the use of that.

No, there's only one guard rollover bit.
Here: uuid->data[6] = (uuid->data[6] & 0xf7);
Bits that are called "guard bits" do not guard anything, they just ensure counter capacity when it is initialized.
Rollover is carried into time tick here:
++sequence_counter;
if (sequence_counter > 0x3ffff)
{
/* We only have 18-bit counter */
sequence_counter = 0;
previous_timestamp++;
}

I think we might use 10 bits of microseconds and have 8 bits of a counter. Effect of a counter won't change much. But I'm not sure if this is allowed per RFC.
If time source is coarse-grained it still acts like a random initializer. And when it is precise - time is "natural" source of entropy.

> The code code be organized better. In the not-increment_counter case, you could use two separate pg_strong_random calls: One to initialize rand_b, starting at &uuid->data[8], and one to initialize the counter. Then the former could be shared between the two branches, and the code to assign the sequence_counter to the uuid fields could also be shared.

Call to pg_strong_random() is very expensive in builds without ssl (and even with ssl too). If we could ammortize random numbers in small buffers - that would save a lot of time (see v8-0002-Buffer-random-numbers.patch upthread). Or, perhaps, we can ignore cost of two pg_string_random() calls.

>
> I would also prefer if the normal case (not-increment_counter) were the first if branch.

Done.

> Some other notes on your patch:
>
> - Your rebase duplicated the documentation of uuid_extract_timestamp and uuid_extract_version.
>
> - PostgreSQL code uses uint64 etc. instead of uint64_t etc.
>
> - It seems the added includes
>
> #include "access/xlog.h"
> #include "utils/builtins.h"
> #include "utils/datetime.h"
>
> are not needed.
>
> - The static variables sequence_counter and previous_timestamp could be kept inside the uuidv7() function.

Fixed.

Thanks!

Best regards, Andrey Borodin.

Attachment Content-Type Size
v24-0001-Implement-UUID-v7.patch application/octet-stream 14.5 KB

In response to

  • Re: UUID v7 at 2024-03-22 14:15:05 from Peter Eisentraut

Responses

  • Re: UUID v7 at 2024-04-04 13:45:21 from Peter Eisentraut

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2024-03-26 17:52:12 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Tom Lane 2024-03-26 17:23:56 Re: Possibility to disable `ALTER SYSTEM`