Re: partitioning for speed, but query planner ignores

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: David Rysdam <drysdam(at)ll(dot)mit(dot)edu>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: partitioning for speed, but query planner ignores
Date: 2013-10-02 15:19:58
Message-ID: 1380727198.99522.YahooMailNeo@web162901.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

David Rysdam <drysdam(at)ll(dot)mit(dot)edu> wrote:

> We have a by-our-standards large table (about 40e6 rows). Since it is
> the bottleneck in some places, I thought I'd experiment with
> partitioning.

In my personal experience I have gone into hundreds of millions of
rows with good performance without partitioning.  It's all about
designing good indexes for the workload.

I have only seen partitioning help in two cases:
(1)  There will be bulk deletes of rows, and you know at insert
time which bulk delete the row belongs with.  Dropping a partition
table is a very fast way to delete a large number of rows.
(2)  The bulk of activity will be on a relatively small subset of
the rows at any one time, and you can partition such that the set
of active rows will be in a small number of partitions.

In all other cases, I have only seen partitioning harm performance.
 There is no reason to think that checking the table-level
constraints on every partition table will be faster than descending
through an index tree level.

> The table holds data about certain objects, each of which has an object
> number and some number of historical entries (like account activity at a
> bank, say). The typical usage pattern is: relatively rare inserts that
> happen in the background via an automated process (meaning I don't care
> if they take a little longer) and frequent querying, including some
> where a human is sitting in front of it (i.e. I'd like it to be a lot
> faster).
>
> Our most frequent queries either select "all history for object N" or
> "most recent item for some subset of objects".
>
> Because object number figure so prominently, I thought I'd partition on
> that. To me, it makes the most sense from a load-balancing perspective

Load balancing?  Hitting a single partition more heavily improves
your cache hit ratio.  What sort of benefit are you expecting from
spreading the reads across all the partitions?  *Maybe* that could
help if you carefully placed each partition table on a separate set
of spindles, but usually you are better off having one big RAID so
that every partition is spread across all the spindles
automatically.

> Lower numbers are going to be queried much less often than higher
> numbers.

This suggests to me that you *might* get a performance boost if you
define partitions on object number *ranges*.  It still seems a bit
dubious, but it has a chance.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Marc Fournier 2013-10-02 15:42:50 Re: [HACKERS] Who is pgFoundery administrator?
Previous Message Adrian Klaver 2013-10-02 14:39:15 Re: Timestamp with and without timezone conversion confusion.