Re: UUID v7

From: "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
To: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Sergey Prokhorenko <sergeyprokhorenko(at)yahoo(dot)com(dot)au>, Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Mat Arye <mat(at)timescaledb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>
Subject: Re: UUID v7
Date: 2024-03-11 18:27:43
Message-ID: 844E6A65-4A36-4A36-94C2-0CDF7615B53F@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 11 Mar 2024, at 20:56, Jelte Fennema-Nio <postgres(at)jeltef(dot)nl> wrote:
>
> Attached a few comment fixes/improvements and a pgindent run (patch 0002-0004)

Thanks!

> Now with the added comments, one thing pops out to me: The comments
> mention that we use "Monotonic Random", but when I read the spec that
> explicitly recommends against using an increment of 1 when using
> monotonic random. I feel like if we use an increment of 1, we're
> better off going for the "Fixed-Length Dedicated Counter Bits" method
> (i.e. change the code to start the counter at 0). See patch 0005 for
> an example of that change.
>
> I'm also wondering if we really want to use the extra rand_b bits for
> this. The spec says we MAY, but it does remove the amount of
> randomness in our UUIDs.

Method 1 is a just a Method 2 with specifically picked constants.
But I'll have to use some hand-wavy wordings...

UUID consists of these 128 bits:
a. Mandatory 2 var and 4 ver bits.
b. Flexible but strongly recommended 48 bits unix_ts_ms. These bits contribute to global sortability of values generated at frequency less than 1KHz.
c. Counter bits:
c1. Initialised with 0 on any time tick.
c2. Initialised with randomness.
c3*. bit width of a counter step (*not counted in 128 bit capacity, can be non-integral)
d. Randomness bits.

Method 1 is when c2=0. My implementation of method 2 uses c1=1, c2=17

Consider all UUIDs generated at any given milliseconds. Probability of a collision of two UUIDs generated at frequency less than 1KHz is p = 2^-(c2+d)
Capacity of a counter has expected value of c = 2^(c1)*2^(c2-1)/2^c3
To guess next UUID you can correctly pick one of u = 2^(d+c3)

First, observe that c3 contributes unguessability at exactly same scale as decreases counter capacity. There is no difference between using bits in d directly, or in c3. There is no point in non-zero c3. Every bit that could be given to c3 can equally be given to d.

Second, observe that c2 bits contribute to both collision protection and counter capacity! And when the time ticks, c2 also contribute to unguessability! So, technically, we should consider using all available bits as c2 bits.

How many c1 bits do we need? I've chosen one - to prevent occasional counter capacity reduction.

If c1 = 1, we can distribute 73 bits between c2 and d. I've chosen c2 = 17 and d = 56 as an arbitrary compromise between capacity of one backend per ms and prevention of global collision.
This compromise is mostly dictated by maximum frequency of UUID generation by one backend, I've chosen 200MHz as a sane value.

This compromise is much easier when you do not have 74 spare bits, this crazy amount of information forgives almost any mistake. Imagine you have to distribute 10 bits between c2 and d. And you try to prevent collision between 10 independent devices which need capacity to generate IDs with frequency of 10KHz each and keep sortability. You would have something like c1=1, c2=3,d=6.

Sorry for this long and vague explanation, if it still seems too uncertain we can have a chat or something like that. I don't think this number picking stuff deserve to be commented, because it still is quite close to random. RFC gives us too much freedom of choice.

Thanks!

Best regards, Andrey Borodin.

In response to

  • Re: UUID v7 at 2024-03-11 15:56:23 from Jelte Fennema-Nio

Responses

  • Re: UUID v7 at 2024-03-12 05:53:27 from Michael Paquier
  • Re: UUID v7 at 2024-03-12 15:35:59 from Jelte Fennema-Nio

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2024-03-11 18:35:02 Re: btree: downlink right separator/HIKEY optimization
Previous Message Corey Huinker 2024-03-11 18:20:36 Re: Statistics Import and Export