Re: splitting data into multiple tables

From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: Viji V Nair <viji(at)fedoraproject(dot)org>
Cc: nair rajiv <nair331(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: splitting data into multiple tables
Date: 2010-01-26 11:41:56
Message-ID: alpine.DEB.2.00.1001261130430.6195@aragorn.flymine.org
Lists: pgsql-performance

On Mon, 25 Jan 2010, Viji V Nair wrote:
> I think this wont help that much if you have a single machine. Partition the
> table and keep the data in different nodes. Have a look at the tools like
> pgpool.II

So partitioning. You have three choices:

1. Use a single table.
2. Partition the table on the same server.
3. Partition the data across multiple servers.

This is in increasing order of complexity.

There will probably be no problem at all with option 1. The only problem
arises if you run a query that performs a full sequential scan of the
entire table, which would obviously take a while. If your queries are
indexable, then option 1 is almost certainly the best option.
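To illustrate what "indexable" means here, a minimal sketch (the table and column names are invented for the example): with a suitable index, a selective query reads only a small slice of the table instead of scanning all of it.

```sql
-- Hypothetical large table of sensor readings.
CREATE TABLE readings (
    sensor_id integer     NOT NULL,
    taken_at  timestamptz NOT NULL,
    value     numeric
);

-- An index matching the common query pattern.
CREATE INDEX readings_sensor_time_idx
    ON readings (sensor_id, taken_at);

-- This query can be satisfied by an index scan rather than a
-- full sequential scan of the table:
SELECT *
  FROM readings
 WHERE sensor_id = 42
   AND taken_at >= now() - interval '1 day';
```

If EXPLAIN shows an index scan for your typical queries, option 1 is doing its job.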

Option 2 adds complexity in the Postgres server. You will need to
partition your tables in a logical manner - that is, there needs to be
some difference between rows in table a compared to rows in table b. This
means that the partitioning will in effect be a little like indexing. You
do not want to have too many partitions. The advantage is that if a query
requires a full sequential scan, then there is the possibility of skipping
some of the partitions, although there is some complexity involved in
getting this to work correctly. In a lot of cases, partitioning will make
queries slower by confusing the planner.
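A rough sketch of option 2, using the inheritance-based partitioning available in the PostgreSQL releases of this era (table names are invented; each child carries a CHECK constraint so the planner can exclude partitions that cannot match):

```sql
-- Parent table; queries are run against this.
CREATE TABLE readings (
    sensor_id integer     NOT NULL,
    taken_at  timestamptz NOT NULL,
    value     numeric
);

-- One child table per month. The CHECK constraints are what
-- allow constraint exclusion to skip irrelevant partitions.
CREATE TABLE readings_2010_01 (
    CHECK (taken_at >= '2010-01-01' AND taken_at < '2010-02-01')
) INHERITS (readings);

CREATE TABLE readings_2010_02 (
    CHECK (taken_at >= '2010-02-01' AND taken_at < '2010-03-01')
) INHERITS (readings);

-- Constraint exclusion must be enabled for pruning to happen.
SET constraint_exclusion = on;

-- Even as a sequential scan, this only touches readings_2010_01:
SELECT count(*)
  FROM readings
 WHERE taken_at >= '2010-01-15'
   AND taken_at <  '2010-01-20';
```

Note the complexity alluded to above: you also need triggers or rules to route inserts to the correct child, and the planner can only exclude partitions when the WHERE clause matches the CHECK constraints closely enough.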

Option 3 is only useful when you have a real performance problem with
long-running queries (partitioning the data across servers) or with very
large numbers of queries (duplicating the data across servers). It also
adds much complexity. It is fairly simple to run "filter these results
from the table" queries across multiple servers, but if that were all you
were doing, you might as well use an index instead. It becomes impossible to
perform proper cross-referencing queries without some very clever software
(because not all the data is available on the server), which will probably
be hard to manage and slow down the execution anyway.

My recommendation would be to stick with a single table unless you have a
real need to partition.

Matthew

--
Note: some countries impose serious penalties for a conspiracy to overthrow
the political system. THIS DOES NOT FIX THE VULNERABILITY.
-- http://seclists.org/vulnwatch/2003/q2/0002.html
