Re: Thousands of tables versus on table?

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: david(at)lang(dot)hm, Gregory Stark <stark(at)enterprisedb(dot)com>, Thomas Andrews <tandrews(at)soliantconsulting(dot)com>, Mark Lewis <mark(dot)lewis(at)mir3(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Thousands of tables versus on table?
Date: 2007-06-06 15:17:29
Message-ID: 4666D009.9030505@g2switchworks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> The degree to which this is a win is *highly* debatable, and certainly
> depends on a whole lot of assumptions about filesystem performance.
> You also need to assume that constraint-exclusion in the planner is
> pretty doggone cheap relative to the table searches, which means it
> almost certainly will lose badly if you carry the subdivision out to
> the extent that the individual tables become small. (This last could
> be improved in some cases if we had a more explicit representation of
> partitioning, but it'll never be as cheap as one more level of index
> search.)
I did some testing a while back on some of this, and with 400 or so
partitions, the select time was still very fast.

We were testing grabbing 50-80k rows from 12M at a time, all adjacent to
each other. With the one big table and one big two way index method, we
were getting linearly increasing select times as the dataset grew larger
and larger. The indexes were much larger than available memory and
shared buffers. The retrieval time for 50-80k rows was on the order of
2 to 6 seconds, while the retrieval time for the same number of rows
with 400 partitions was about 0.2 to 0.5 seconds.

I haven't tested with more partitions than that, but might if I get a
chance. What was really slow was the inserts since I was using rules at
the time. I'd like to try re-writing it to use triggers, since I would
then have one trigger on the parent table instead of 400 rules. Or I
could imbed the rules into the app that was creating / inserting the
data. The insert performance dropped off VERY fast as I went over 100
rules, and that was what primarily stopped me from testing larger
numbers of partitions.

The select performance stayed very fast with more partitions, so I'm
guessing that the constraint exclusion is pretty well optimized.

I'll play with it some more when I get a chance. For certain operations
like the one we were testing, partitioning seems to pay off big time.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2007-06-06 16:23:53 Re: Thousands of tables versus on table?
Previous Message Scott Marlowe 2007-06-06 15:07:19 Re: Thousands of tables versus on table?