Re: UUIDs & Clustered Indexes

From: "Mike Sofen" <msofen(at)runbox(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: UUIDs & Clustered Indexes
Date: 2016-08-30 16:40:33
Message-ID: 02de01d202dd$3b7d2190$b27764b0$@runbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

From: Tom Lane Sent: Tuesday, August 30, 2016 7:16 AM

Luke Gordon < <mailto:gordysc(at)gmail(dot)com> gordysc(at)gmail(dot)com> writes:

> However, according to a message on this mailing list, Postgres doesn't

> have clustered indexes:

> "But Postgres doesn't _have_ clustered indexes, so that article

> doesn't apply at all. The other authors appear to have missed this
important point."

> <https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu>
https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu

> But, doing a quick check, it appears Postgres does indeed have a

> mechanism for a clustered index:

> <https://www.postgresql.org/docs/9.5/static/sql-cluster.html>
https://www.postgresql.org/docs/9.5/static/sql-cluster.html

CLUSTER just does a one-time sort to put the table into index order.

There is no mechanism that would cause subsequent insertions of new keys to
respect that ordering, so it's pretty much irrelevant to the argument about
whether new UUID keys need to be generated in some ordered fashion.

Do you actually *need* UUID keys, and if so why? A plain old bigint column
is smaller, cheaper to index, and the natural mechanism for generating it
(ie a sequence) will tend to preserve ordering for free.

regards, tom lane

I agree with Tom for a "normal" application - I would always use bigints
(bigserial) as a PK column. The app I currently working on is a high
security web app for which the app coders require guids for all identifiers
flowing around the system. So in this scenario, I'm using BOTH bigserials
as the PK and uuids as AKs in the core tables. I reference the bigints for
all joins and (have to) use the uuids for the filters. It's been working ok
so far, lookup performance on a table with a few million rows, using the
uuid (indexed) is instantaneous. I'll soon have a 100 million+ rows loaded
into a single table and know a bit more.

The uuids are also design insurance for me in case I need to shard, since
I'll need/want that uniqueness across servers.

Mike Sofen

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vick Khera 2016-08-30 17:26:33 Re: Clustered index to preserve data locality in a multitenant application?
Previous Message Francisco Olarte 2016-08-30 14:58:45 Re: UUIDs & Clustered Indexes