Re: Review: GiST support for UUIDs

From: Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: GiST support for UUIDs
Date: 2015-12-23 16:10:53
Message-ID: 20151223191053.00ab94e1@lp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 15 Sep 2015 09:03:00 +0300
Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:

> Paul Jungwirth wrote:
> >> Or something like this in pseudocode:
> >>
> >> numeric = int8_numeric(*(uint64 *)(&i->data[0])) *
> >> int8_numeric(MAX_INT64) + int8_numeric(*(uint64 *)(&i->data[8]))
> >
> > This is more like what I was hoping for, rather than converting to a
> > string and back. Would you mind confirming for me: int8_numeric
> > turns an int64 into an arbitrary-precision numeric Datum? So there
> > is no overflow risk here?
> Sure, no risk. Numeric precision is limited 1000 digits with
> magnitude 1000
>
>
> >
> > But it looks like int8_numeric takes a *signed* integer. Isn't that
> > a problem? I suppose I could get it working though by jumping
> > through some hoops.
> signed vs unsigned problem does not exist actually, because of
> precision of numeric is much better than we need and presence of
> numeric_abs.
>
>
> > Yes, but that seems like an unrealistic concern. Even "only" 2^64 is
> > 18446744073709551616 records. Or another way of thinking about it,
> > if 64
> :) "only"
>
> > bits (or 32) is a good enough penalty input for all the other data
> > types, why not for UUIDs? Keep in mind type 3-5 UUIDs should be
> > evenly distributed. Perhaps we could use the bottom half (instead
> > of the top) to ensure even distribution for type 1 and 2 too.
> it must be. But UUID could be taken for unknown source and we can't
> predict distribution. I believe pg generates them correctly, but
> other generators could be not so good.
>
> > It seems to me that using only the top half should be okay, but if
> > you believe it's not I'll go with the int8_numeric approach (in
> > three chunks instead of two to work around signed-vs-unsigned).
> Yes, I believe. It is not good case when we can ruin index
> performance with special set of value.
>
> Some difficulty which I see is how to transform numeric penalty to
> double as it requires by GiST. May be, penalty/(INT_MAX64*INT_MAX64 +
> INT_MAX64)? Keep in mind, that penalty is how range will be enlarged
> after range union, so minimal penalty is 0 and maximum is
> 0xffffffffffffffffffffffffffffffff
>

There is a more improved version of the patch. Main idea is to present
uuid as two uint64 values, and make comparisons and penalty calculation
based on these values. This approach is much faster than using memcmp
for uuid comparisons.

--
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
btree_gist_uuid_3.patch text/x-patch 122.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2015-12-23 16:23:10 Re: pgbench --latency-limit option
Previous Message Tom Lane 2015-12-23 15:46:22 Re: [PROPOSAL] Backup and recovery of pg_statistic