Re: uuid-ossp: Performance considerations for different UUID approaches?

From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Brendan McCollam <bmccollam(at)uchicago(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: uuid-ossp: Performance considerations for different UUID approaches?
Date: 2015-12-22 17:40:18
Message-ID: 20151222124018.bee10b60b3d9b58d7b3a1839@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, 22 Dec 2015 11:07:30 -0600
Brendan McCollam <bmccollam(at)uchicago(dot)edu> wrote:

> (previously posted to the pgsql-performance list)
>
> Hello,
>
> We're in the process of designing the database for a new service, and
> some of our tables are going to be using UUID primary key columns.
>
> We're trying to decide between:
>
> * UUIDv1 (timestamp/MAC uuid) and
>
> * UUIDv4 (random uuid)
>
> And the separate but related choice between:
>
> * Generating the UUIDs client-side with the Python uuid library
> (https://docs.python.org/2/library/uuid.html) or
>
> * Letting PostgreSQL handle uuid creation with the uuid-ossp extension
> (http://www.postgresql.org/docs/9.4/static/uuid-ossp.html)
>
> In terms of insert and indexing/retrieval performance, is there one
> clearly superior approach? If not, could somebody speak to the
> performance tradeoffs of different approaches?
>
> There seem to be sources online (e.g.
> https://blog.starkandwayne.com/2015/05/23/uuid-primary-keys-in-postgresql/
> http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/)
> that claim that UUIDv4 (random) will lead to damaging keyspace
> fragmentation and using UUIDv1 will avoid this.

There's no substance to these claims. Chasing the links around we finally
find this article:
http://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/
which makes the reasonable argument that random primary keys can cause
performance robbing fragmentation on clustered indexes.
But Postgres doesn't _have_ clustered indexes, so that article doesn't
apply at all. The other authors appear to have missed this important
point.

One could make the argument that the index itself becomming fragmented
could cause some performance degredation, but I've yet to see any
convincing evidence that index fragmentation produces any measurable
performance issues (my own experiments have been inconclusive).

Looking at it another way, a quick experiment shows that PG can fit
about 180 UUID primary keys per database page, which means a million
row table will use about 5600 pages to the tune of about 46m. On
modern hardware, that index is likely to be wholly in memory all the
time.

If your performance requirements are really so dire, then you should
probably consider ditching UUIDs as keys. Taking the same million row
table I postulated in the previous paragraph, but using ints insted
of UUIDs for the primary key, the primary key index would be about
3200 pages (~26m) ... or almost 1/2 the size -- making it more likely
to all be in memory at any point in time.

I seriously doubt that trying to make your UUIDs generate in a
predictable fashon will produce any measurable improvement, and I
see no evidence in the articles you cited that claims otherwise
have any real basis or were made by anyone knowledgeable enough
to know.

--
Bill Moran

In response to

Browse pgsql-general by date

  From Date Subject
Next Message oleg yusim 2015-12-22 18:49:55 Re: Session Identifiers
Previous Message Aleksander Łukasz 2015-12-22 17:24:17 Re: Table with seemingly duplicated primary key values