
Re: [PERFORM] Postgres and really huge tables

From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Brian Hurt" <bhurt(at)janestcapital(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, pgsql-advocacy(at)postgresql(dot)org
Subject: Re: [PERFORM] Postgres and really huge tables
Date: 2007-01-19 14:11:31
Message-ID: b42b73150701190611g6b1f3406ubcaa5678b20209ab@mail.gmail.com
Lists: pgsql-advocacy, pgsql-performance
On 1/18/07, Brian Hurt <bhurt(at)janestcapital(dot)com> wrote:
> Is there any experience with Postgresql and really huge tables?  I'm
> talking about terabytes (plural) here in a single table.  Obviously the
> table will be partitioned, and probably spread among several different
> file systems.  Any other tricks I should know about?

A pretty effective partitioning strategy that works in some cases is
to identify a criterion in your dataset that isolates data on a
per-session basis.  For example, if you have a company_id that divides
up your company data and a session only needs to deal with one
company_id, you can separate your tables by company_id into different
schemas and have the session set the search_path variable when it logs
in.  Data that does not partition on your criterion sits in the public
schema, where all the companies can see it.
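To make that concrete, here is a minimal sketch (the schema name,
table names, and columns are all hypothetical, just to illustrate the
shape of it):

```sql
-- One schema per company; table names are identical in every schema.
CREATE SCHEMA company_1001;
CREATE TABLE company_1001.orders (
    order_id   serial PRIMARY KEY,
    order_date date NOT NULL,
    amount     numeric(12,2)
);

-- Shared reference data that every company can see lives in public:
CREATE TABLE public.products (
    product_id serial PRIMARY KEY,
    name       text NOT NULL
);

-- At login, the session picks its schema.  From here on, unqualified
-- table names resolve to company_1001 first, then public:
SET search_path TO company_1001, public;
SELECT * FROM orders;    -- really company_1001.orders
SELECT * FROM products;  -- falls through to public.products
```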

This takes advantage of a useful property of stored procedures: they
do not bind to tables until the first time they are executed in a
session, which keeps you from having to make a copy of each function
for every schema.  (Note: views do not have this property.)  You can
still query across schemas using views and the like, or hand-rolled
SQL.
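A sketch of that trick, continuing the hypothetical schema-per-company
layout above (function and table names are made up for illustration):

```sql
-- One shared function, created once in public.  The unqualified
-- reference to "orders" is resolved per session at execution time,
-- via that session's search_path -- so this single function serves
-- every company schema:
CREATE FUNCTION order_total() RETURNS numeric AS $$
    SELECT coalesce(sum(amount), 0) FROM orders
$$ LANGUAGE sql;

-- Session A:
--   SET search_path TO company_1001, public;
--   SELECT order_total();   -- sums company_1001.orders
-- Session B:
--   SET search_path TO company_1002, public;
--   SELECT order_total();   -- sums company_1002.orders
```

A view, by contrast, resolves its table references once at CREATE VIEW
time, so it would need to be recreated in every schema.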

I would call this type of partitioning logical partitioning, since you
are leveraging logical divisions in your data.  It obviously doesn't
work in all cases, but when it does, it works great.

> We have a problem of that form here.  When I asked why postgres wasn't
> being used, the opinion that postgres would "just <explicitive> die" was
> given.  Personally, I'd bet money postgres could handle the problem (and
> better than the ad-hoc solution we're currently using).  But I'd like a
> couple of replies of the form "yeah, we do that here- no problem" to
> wave around.

pg will of course not die when your dataset hits a certain threshold.
It will become slower, following well-known mathematical patterns that
grow with your working set size.  One of the few things that gets to
be a pain with large tables is vacuum -- since you can't vacuum a
piece of a table, and there are certain annoyances with having a
long-running vacuum, this is something to think about.

Speaking broadly, table partitioning optimizes one case at the expense
of another.  Your focus (IMO) should be on reducing your working set
size under certain conditions -- not the physical file size.  If you
have a properly laid out, logical dataset and can identify special
cases where you need some information and not other information, the
partitioning strategy should fall into place, whether it is to do
nothing, isolate data into separate schemas/tables/files, or use the
built-in table partitioning feature (which, to be honest, I am not
crazy about).
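For reference, the built-in feature mentioned here is the
inheritance-based partitioning of that era (PostgreSQL 8.x); a rough
sketch, with made-up table and column names:

```sql
-- Parent table holds no data; each child carries a CHECK constraint
-- describing the slice of data it holds:
CREATE TABLE measurements (
    logdate date NOT NULL,
    value   numeric
);

CREATE TABLE measurements_2007_01 (
    CHECK (logdate >= DATE '2007-01-01' AND logdate < DATE '2007-02-01')
) INHERITS (measurements);

-- With constraint_exclusion enabled, the planner skips children whose
-- CHECK constraints rule them out of the query:
SET constraint_exclusion = on;
SELECT * FROM measurements WHERE logdate = DATE '2007-01-15';

-- A side benefit for the vacuum problem above: each child table can be
-- vacuumed on its own, instead of one long-running vacuum over the
-- whole dataset:
VACUUM ANALYZE measurements_2007_01;
```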

merlin
