Re: UUID performance as primary key

From: Andy <angelflow(at)yahoo(dot)com>
To: Navkirat Singh <navkirats(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: UUID performance as primary key
Date: 2010-10-16 03:46:49
Message-ID: 321028.24609.qm@web111312.mail.gq1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Wouldn't UUID PK cause a significant drop in insert performance because every insert is now out of order, which leads to a constant re-arranging of the B+ tree? The amount of random IO's that's going to generate would just kill the performance.

--- On Fri, 10/15/10, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> wrote:

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject: Re: [PERFORM] UUID performance as primary key
To: "Navkirat Singh" <navkirats(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Date: Friday, October 15, 2010, 10:59 PM

On 16/10/2010 9:58 AM, Navkirat Singh wrote:
> Hi Guys,
>
> I am interested in finding out the pros/cons of using UUID as a primary key field. My requirement states that UUID would be perfect in my case as I will be having many small databases which will link up to a global database using the UUID. Hence, the need for a unique key across all databases. It would be extremely helpful if someone could help me figure this out, as it is critical for my project.

Pro: No need for (serverid,serverseq) pair primary keys or hacks with modulus based key generation. Doesn't set any pre-determined limit on how many servers/databases may be in a cluster.

Con: Slower than modulo key generation approach, uses more storage. Foreign key relationships may be slower too.

Overall, UUIDs seem to be a favoured approach. The other way people seem to do this is by assigning a unique instance id to each server/database out of a maximum "n" instances decided at setup time. Every key generation sequence increments by "n" whenever it generates a key, with an offset of the server/database id. That way, if n=100, server 1 will generate primary keys 001, 101, 201, 301, ..., server 2 will generate primary keys 002, 102, 202, 302, ... and so on.

That works great until you need more than 100 instances, at which point you're really, REALLY boned. In really busy systems it also limits the total amount of primary key space - but with BIGINT primary keys, that's unlikely to be something you need to worry about.

The composite primary key (serverid,sequenceid) approach avoids the need for a pre-defined maximum number of servers, but can be slow to index and can require more storage, especially because of tuple headers.

I have no firsthand experience with any of these approaches so I can't offer you a considered opinion. I know that the MS-SQL crowd at least strongly prefer UUIDs, but they have very strong in-database UUID support. MySQL folks seem to mostly favour the modulo primary key generation approach. I don't see much discussion of the issue here - I get the impression Pg doesn't see heavy use in sharded environments.

-- Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

-- Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2010-10-16 05:03:02 Re: No hash join across partitioned tables?
Previous Message Greg Smith 2010-10-16 03:28:29 Re: Slow count(*) again...