Re: UUID v7

From: Andrey M(dot) Borodin <x4mmm(at)yandex-team(dot)ru>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>, Michael Paquier <michael(at)paquier(dot)xyz>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Mat Arye <mat(at)timescaledb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>
Subject: Re: UUID v7
Date: 2024-10-26 16:07:10
Message-ID: 714ED617-CC7C-4815-8039-D378CCEBD48E@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review!

> On 18 Oct 2024, at 02:16, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Aug 4, 2024 at 3:51 AM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>
>>
>>
>>> On 28 Jul 2024, at 23:44, Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>>
>>> PFA version accepting offset interval.
>>
>> There was a bug: when time was not moving on, I was updating used time by a nanosecond, instead of 1/4096 of millisecond.
>> V27 fixes that.
>>
>> Thanks!
>
> I've reviewed the v27 patch and have some comments:
>
> ---
> in datatype.sgml:
>
> The data type <type>uuid</type> stores Universally Unique Identifiers
> (UUID) as defined by <ulink
> url="https://datatracker.ietf.org/doc/html/rfc4122">RFC 4122</ulink>,
> ISO/IEC 9834-8:2005, and related standards.
>
> In funcs.sgml:
> This function extracts the version from a UUID of the variant described by
> <ulink url="https://datatracker.ietf.org/doc/html/rfc4122">RFC
> 4122</ulink>. For
>
> Maybe these references of RFC4122 need to be updated as well.

Fixed.

> ---
> 'git show --check' raises a warning:

Fixed.

>
> src/backend/utils/adt/uuid.c:520: trailing whitespace.
> +
>
> ---
> +
> + if (PG_NARGS() > 0)
> + {
> + Interval *span;
> + TimestampTz ts = (TimestampTz) (ns / 1000) -
> + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * SECS_PER_DAY *
> USECS_PER_SEC;
> + span = PG_GETARG_INTERVAL_P(0);
> + ts = DatumGetTimestampTz(DirectFunctionCall2(timestamptz_pl_interval,
> + TimestampTzGetDatum(ts),
> + IntervalPGetDatum(span)));
> + ns = (ts + (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
> SECS_PER_DAY * USECS_PER_SEC)
> + * 1000 + ns % 1000;
> + }
>
> We need to add a comment to describe what/why we're doing here.

Done.

>
> ---
> + * Monotonicity (regarding generation on given backend) is ensured with method
> + * "Replace Leftmost Random Bits with Increased Clock Precision (Method 3)"
>
> Need a period at the end of this sentence.

Fixed.

>
> ---
> +{ oid => '9896', descr => 'generate UUID version 7',
> + proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
> + prorettype => 'uuid', proargtypes => '', prosrc => 'uuidv7' },
> +{ oid => '9897', descr => 'generate UUID version 7',
> + proname => 'uuidv7', proleakproof => 't', provolatile => 'v',
> + prorettype => 'uuid', proargtypes => 'interval', prosrc => 'uuidv7' },
>
> Both functions have the same description but work differently. I think
> it's better to clarify the description of uuidv7() that takes an
> interval.

I've slightly extended the description... not it's 'generate UUID version 7 with a timestamp shifted on specific interval'. Perhaps, we can come up with something better.

>
> ---
> - oid | proname | oid | proname
> ------+---------+-----+---------
> -(0 rows)
> + oid | proname | oid | proname
> +------+---------+------+---------
> + 9896 | uuidv7 | 9897 | uuidv7
> +(1 row)
>
> I think that we need to change these functions so that this check
> query doesn't return anything, no?

We have 4 options:
0. Remove uuidv7(interval). But it brings imporatne functionality to the table: we can avoid contention points while massively insert data.
1. Give different names to uuidv7() and uuidv7(interval).
2. Allow importing pg_node_tree (see v7 of the patch)
3. Change this query. Comment to this query suggest that it checks for exactly this case: same function is declared with different number of arguments.

IMO approach number 3 is best. However, I do not understand why this query check was introduced in the first place. Maybe, there are string arguments why we should not do same-named functions with different number of arguments.

>
> ---
> + if (version == 6)
> + {
> + tms = ((uint64) uuid->data[0]) << 52;
> + tms += ((uint64) uuid->data[1]) << 44;
> + tms += ((uint64) uuid->data[2]) << 36;
> + tms += ((uint64) uuid->data[3]) << 28;
> + tms += ((uint64) uuid->data[4]) << 20;
> + tms += ((uint64) uuid->data[5]) << 12;
> + tms += (((uint64) uuid->data[6]) & 0xf) << 8;
> + tms += ((uint64) uuid->data[7]);
> +
> + /* convert 100-ns intervals to us, then adjust */
> + ts = (TimestampTz) (tms / 10) -
> + ((uint64) POSTGRES_EPOCH_JDATE - GREGORIAN_EPOCH_JDATE) *
> SECS_PER_DAY * USECS_PER_SEC;
> +
> + PG_RETURN_TIMESTAMPTZ(ts);
> + }
>
> It's odd to me that only uuid_extract_timestamp() supports UUID v6 in
> spite of not supporting UUID v6 generation. I think it makes more
> sense to support UUID v6 generation as well, if the need for it is
> high.

RFC urges to use UUIDv7 instead of UUIDv6 when possible. I'm fine with providing implementation, it's trivial. PFA patch with implementation.

Best regards, Andrey Borodin.

Attachment Content-Type Size
v28-0001-Implement-UUID-v7.patch application/octet-stream 20.8 KB

In response to

  • Re: UUID v7 at 2024-10-17 23:16:11 from Masahiko Sawada

Responses

  • Re: UUID v7 at 2024-10-31 18:15:36 from Masahiko Sawada

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-10-26 16:14:38 Re: Assertion failure when autovacuum drops orphan temp indexes.
Previous Message Nathan Bossart 2024-10-26 15:55:44 Re: sunsetting md5 password support