Re: Partitioning

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Hunter <hunteke(at)earlham(dot)edu>
Cc: PostrgreSQL Novice List <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Partitioning
Date: 2006-12-26 19:55:37
Message-ID: 28034.1167162937@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice pgsql-performance

Kevin Hunter <hunteke(at)earlham(dot)edu> writes:
> A friend has asked me about creating a unique table for individual users
> that sign up for his site. (In essence, each user who signs up would
> essentially get a set of CREATE TABLE {users,friends,movies,eats}_<id> (
> ... ); statements executed, the idea being to reduce the number of rows
> that the DB needs to search or index/update in regards to a particular
> user id.) The just seems ludicrous to me, because the database still
> needs to find those tables from its internal structure, not to mention
> that it just seems silly to me from a design perspective. Something
> about unable to optimize any queries because not only is the WHERE
> clause in flux, but so is the FROM clause.

> Question: Could someone explain to me why this would be bad idea,
> because I can't put into words why it is.

I thought you did a fine job right there ;-). In essence this would be
replacing one level of indexing with two, which is unlikely to be a win.
If you have exactly M rows in each of N tables then theoretically your
lookup costs would be about O(log(N) + log(M)), which is nominally the
same as O(log(M*N)) which is the cost to index into one big table --- so
at best you break even, and that's ignoring the fact that index search
has a nonzero startup cost that'll be paid twice in the first case.
But the real problem is that if the N tables contain different numbers
of rows then you have an unevenly filled search tree, which is a net
loss.

Most DBMSes aren't really designed to scale to many thousands of tables
anyway. In Postgres this would result in many thousands of files in
the same database directory, which probably creates some filesystem
lookup inefficiencies in addition to whatever might be inherent to
Postgres.

Partitioning is indeed something that is commonly done, but on a very
coarse grain --- you might have a dozen or two active partitions, not
thousands. The point of partitioning is either to spread a huge table
across multiple filesystems (and how many filesystems have you got?)
or else to make predictable removals of segments of the data cheap (for
instance, dropping the oldest month's worth of data once a month, in a
table where you only keep the last year or so's worth of data on-line).
I can't see doing it on a per-user basis.

regards, tom lane

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Kevin Hunter 2006-12-26 22:29:58 Re: [NOVICE] Partitioning
Previous Message Kevin Hunter 2006-12-26 18:36:43 Partitioning

Browse pgsql-performance by date

  From Date Subject
Next Message Bob Dusek 2006-12-26 20:37:47 performance implications of binary placement
Previous Message Kevin Hunter 2006-12-26 18:36:43 Partitioning