Large rows number, and large objects

From: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Large rows number, and large objects
Date: 2011-06-19 04:06:02
Message-ID: BANLkTikXvotWjfWBra=6+66WZKfHkie4AQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Greetings,

I have been thinking a lot about pgsql performance when it is dealing
with tables with lots of rows on one table (several millions, maybe
thousands of millions). Say, the Large Object use case:

one table has large objects (have a pointer to one object).
The large object table stores the large object in 2000 bytes chunks
(iirc), so, if we have something like 1TB of data stored in large
objects, the large objects table would have something like 550M rows,
if we get to 8TB, we will have 4400M rows (or so).

I have read at several places that huge tables should be partitioned,
to improve performance.... now, my first question comes: does the
large objects system automatically partitions itself? if no: will
Large Objects system performance degrade as we add more data? (I guess
it would).

Now... I can't fully understand this: why does the performance
actually goes lower? I mean, when we do partitioning, we take a
certain parameter to "divide" the data,and then use the same parameter
to issue the request against the correct table... shouldn't the DB
actually do something similar with the indexes? I mean, I have always
thought about the indexes, well, exactly like that: approximation
search, I know I'm looking for, say, a date that is less than
2010-03-02, and the system should just position itself on the index
around that date, and scan from that point backward... as far as my
understanding goes, the partitioning only adds like this "auxiliary"
index, making the system, for example, go to a certain table if the
query goes toward one particular year (assuming we partitioned by
year), what if the DB actually implemented something like an Index for
the Index (so that the first search on huge tables scan on an smaller
index that points to a position on the larger index, thus avoiding the
scan of the large index initially).

Well.... I'm writing all of this half-sleep now, so... I'll re-read it
tomorrow... in the meantime, just ignore anything that doesn't make a
lot of sense :) .

Thanks!

Ildefonso Camargo

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Samuel Gendler 2011-06-19 11:37:59 Re: Large rows number, and large objects
Previous Message Jim Nasby 2011-06-17 20:31:17 Re: seq scan in the case of max() on the primary key column