Re: Large rows number, and large objects

From: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
To: Craig James <craig_james(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Large rows number, and large objects
Date: 2011-06-20 02:19:24
Message-ID: BANLkTi=2tGT+aR13bihXfoOM4UkVS+36Qg@mail.gmail.com
Lists: pgsql-performance

Hi!

Thanks (you both, Samuel and Craig) for your answers!

On Sun, Jun 19, 2011 at 11:19 AM, Craig James
<craig_james(at)emolecules(dot)com> wrote:
> On 6/19/11 4:37 AM, Samuel Gendler wrote:
>
> On Sat, Jun 18, 2011 at 9:06 PM, Jose Ildefonso Camargo Tolosa
> <ildefonso(dot)camargo(at)gmail(dot)com> wrote:
>>
>> Greetings,
>>
>> I have been thinking a lot about pgsql performance when it is dealing
>> with tables with lots of rows on one table (several millions, maybe
>> thousands of millions).  Say, the Large Object use case:
>>
>> one table has large objects (have a pointer to one object).
>> The large object table stores the large object in 2000 bytes chunks
>> (iirc), so, if we have something like 1TB of data stored in large
>> objects, the large objects table would have something like 550M rows,
>> if we get to 8TB, we will have 4400M rows (or so).
>>
>> I have read in several places that huge tables should be partitioned
>> to improve performance.... Now my first question: does the
>> large objects system automatically partition itself? If not, will
>> Large Objects performance degrade as we add more data? (I guess
>> it would.)
>
> You should consider "partitioning" your data in a different way: Separate
> the relational/searchable data from the bulk data that is merely being
> stored.
>
> Relational databases are just that: relational.  The thing they do well is
> to store relationships between various objects, and they are very good at
> finding objects using relational queries and logical operators.
>
> But when it comes to storing bulk data, a relational database is no better
> than a file system.
>
> In our system, each "object" is represented by a big text object of a few
> kilobytes.  Searching that text is essentially useless -- the only reason
> it's there is for visualization and to pass on to other applications.  So
> it's separated out into its own table, which only has the text record and a
> primary key.

Well, my original schema does exactly that (I mimic the LO schema):

files (searchable): id, name, size, hash, mime_type, number_chunks
files_chunks : id, file_id, hash, chunk_number, data (bytea)

So, my bulk data is in the files_chunks table, but because each chunk
is restricted (by me) to 2000 bytes, the total number of rows in
files_chunks can get *huge*.
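For concreteness, a minimal sketch of the schema described above (the
column types and constraints are my assumptions, not part of the
original description):

```sql
-- Sketch of the two-table layout above; types/constraints are assumptions.
CREATE TABLE files (
    id            bigserial PRIMARY KEY,
    name          text    NOT NULL,
    size          bigint  NOT NULL,
    hash          text    NOT NULL,
    mime_type     text    NOT NULL,
    number_chunks integer NOT NULL
);

CREATE TABLE files_chunks (
    id           bigserial PRIMARY KEY,
    file_id      bigint  NOT NULL REFERENCES files (id),
    hash         text    NOT NULL,
    chunk_number integer NOT NULL,
    -- at most 2000 bytes per chunk, enforced by the application
    data         bytea   NOT NULL,
    UNIQUE (file_id, chunk_number)
);
```

Reassembling a file is then just an ordered scan of its chunks:
SELECT data FROM files_chunks WHERE file_id = $1 ORDER BY chunk_number;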

So, system would search the files table, and then, search the
files_chunks table (to get each of the chunks, and, maybe, send them
out to the web client).

So, with a prospect of ~4500M rows for that table, I really thought it
could be a good idea to partition files_chunks. Because I'm expecting
relatively small files (<100MB), table partitioning should do great
here: I could arrange for all of the chunks of a given file to land in
the same partition. Even if the system later had to handle larger
files (>5GB), this approach should still work.
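What I have in mind is something like the usual inheritance-based
partitioning (9.0 has no declarative partitioning). The partition
bounds below are arbitrary illustrations; since every chunk of a file
carries the same file_id, all of a file's chunks end up in one child
table:

```sql
-- Inheritance-based partitioning sketch; bounds are illustrative only.
CREATE TABLE files_chunks_p0 (
    CHECK (file_id >= 0       AND file_id < 1000000)
) INHERITS (files_chunks);

CREATE TABLE files_chunks_p1 (
    CHECK (file_id >= 1000000 AND file_id < 2000000)
) INHERITS (files_chunks);

-- Route inserts on the parent to the matching child table.
CREATE OR REPLACE FUNCTION files_chunks_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.file_id < 1000000 THEN
        INSERT INTO files_chunks_p0 VALUES (NEW.*);
    ELSE
        INSERT INTO files_chunks_p1 VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- row already stored in the child; skip the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER files_chunks_partition
    BEFORE INSERT ON files_chunks
    FOR EACH ROW EXECUTE PROCEDURE files_chunks_insert();
```

With constraint_exclusion = partition, a query filtered on file_id
should only touch the one child table whose CHECK constraint matches.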

The original question was about Large Objects, and partitioning...
see, according to documentation:
http://www.postgresql.org/docs/9.0/static/lo-intro.html

"All large objects are placed in a single system table called pg_largeobject."

So, the question is: if I were to store 8TB worth of data in the
large objects system, would the pg_largeobject table become slow,
unless it were automatically partitioned?
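For anyone who wants to check how big that single table has grown on
their own system, something like this should show it (pg_largeobject
stores one row per chunk, so the row count tracks the stored data
volume):

```sql
-- On-disk size of the single system table holding all large objects.
SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));

-- One row per chunk; a count here grows with total LO data stored.
SELECT count(*) FROM pg_largeobject;
```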

Thanks for taking the time to discuss this matter with me!

Sincerely,

Ildefonso Camargo
