Re: [NOVICE] Partitioning

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Hunter <hunteke(at)earlham(dot)edu>
Cc: PostrgreSQL Performance List <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [NOVICE] Partitioning
Date: 2006-12-27 04:37:39
Message-ID: 11680.1167194259@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice pgsql-performance

Kevin Hunter <hunteke(at)earlham(dot)edu> writes:
> On 26 Dec 2006 at 2:55p -0500, Tom Lane wrote:
>> I thought you did a fine job right there ;-). In essence this would be
>> replacing one level of indexing with two, which is unlikely to be a win.
>> If you have exactly M rows in each of N tables then theoretically your
>> lookup costs would be about O(log(N) + log(M)), which is nominally the
>> same as O(log(M*N)) which is the cost to index into one big table --- so

> Hurm. If I remember my Algorithms/Data Structures course, that implies
> that table lookup is implemented with a B-Tree . . . right?

b-tree or something else with log-N behavior. I think it can be proved
that every search method is at best log-N once N gets large enough, but
of course that might be for N far beyond any practical values.

> Since at SQL preparation time the tables in the query are known, why
> couldn't you use a hash lookup?

The hash index implementation currently available in Postgres hasn't
ever been proven to beat our b-tree implementation, on any dimension.
This might be a matter of how much effort has been thrown at it, or
maybe there's some fundamental problem; but anyway you won't find a
lot of enthusiasm around here for moving the system-catalog indexes
to hashing...

>> I can't see doing it on a per-user basis.

> Perhaps not on a per-user basis, but he could certainly improve access
> times by partitioning even coursely. I'll point him in that direction
> Are there other, perhaps better ways to improve the access times?

Actually, I forgot to mention an interesting talk I heard last month,
wherein Casey Duncan explained how pandora.com scales across an
unreasonable number of users whose queries are mostly-but-not-always
localized. Essentially they partition their tables on the basis of
a hash of the user ID, where there are as many hash result values
as they have servers. The point still remains though that there
are far fewer partitions than users. I think what you want to take
away is that you choose the number of partitions based on physical
implementation numbers ("how many filesystems do we have today")
and not on logical numbers like "how many users do we have today".

regards, tom lane

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Richard Broersma Jr 2006-12-27 18:34:01 pg_dump using MS-DOS BAT file
Previous Message Kevin Hunter 2006-12-26 22:29:58 Re: [NOVICE] Partitioning

Browse pgsql-performance by date

  From Date Subject
Next Message JM 2006-12-28 02:42:52 Need Help
Previous Message Kevin Hunter 2006-12-26 22:29:58 Re: [NOVICE] Partitioning