Re: Doubt with physical storage being used by postgres when storing LOBs

From: Víctor Cosqui <victor(dot)cosqui(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Doubt with physical storage being used by postgres when storing LOBs
Date: 2013-10-11 08:54:50
Message-ID: CACg6vWXy_84ShCCXzNCsz9xLfWnx5sZvQRU6aNcrR0c3XW1Bxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello again,

I have been digging a bit more into this and I have seen that the increase
of the table size is not caused by the index, but by the data storage to
disk implementation. As it is explained at the documentation of
pg_largeobject (
http://www.postgresql.org/docs/9.2/static/catalog-pg-largeobject.html)

"Each row of pg_largeobject holds data for one page of a large object,
beginning at byte offset (pageno * LOBLKSIZE) within the object. The
implementation allows sparse storage: pages might be missing, and might be
shorter than LOBLKSIZE bytes even if they are not the last page of the
object. Missing regions within a large object read as zeroes."

I could confirm by reading the file in disk where the table is stored that
when I put a file whose content is like:

[aaa…abbb…bccc…c]

What I can read from the file in disk is like

[aaa…000bbb…000ccc…0000]

So pages are not completely fulfilled with object data and filled with
zeroes to reach the LOBLKSIZE.

I have also playing with different LOBLKSIZE values to see what happens
with smaller and bigger values and the conclusion is that the bigger
LOBLKSIZE is, the more space is filled with zeroes.

This is what happens when you store a 17MB compressed file in different
LOBLKSIZE conditions.
(Default value for LOBLKSIZE is 2KB)

------------------------------------------------+-------------+----------+-------------
LOBLKSIZE [B] | 512 | 2048 |
8000 |
------------------------------------------------+-------------+----------+-------------
Rows in pg_largeobject table | 34000 | 8712 | 2231 |
------------------------------------------------+-------------+----------+-------------
Size of pg_largeobject table [MB] | 21 | 24 |
28 |
------------------------------------------------+-------------+----------+-------------
Overhead [%] | 23.53 | 41.18 |
64.71 |
------------------------------------------------+-------------+----------+-------------

IMHO the overhead added by these zeroes is too high.

Thanks for your support and regards!

On Wed, Oct 2, 2013 at 9:00 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Wed, Oct 2, 2013 at 11:08 AM, Víctor Cosqui <victor(dot)cosqui(at)gmail(dot)com>
> wrote:
> > Hi all
> >
> >
> > I am developing an application which uses postgres 9.2 to store binaries
> as
> > oid objects.
> >
> > CREATE TABLE content (contentname text, contentoid oid);
> >
> > I am making some tests to evaluate how much HD space I will need to
> allocate
> > these objects.
> >
> > To measure the space used by postgres I have used two different tools,
> both
> > with the same results
> >
> > 1.- Checking physical HD space by making a "sudo du -sb
> > /opt/PostgreSQL/9.2/data/base/" before and after inserting the data
> >
> > 2.- Asking directly postgres about the tables size estimation "select
> > pg_size_pretty(pg_relation_size('pg_largeobject'))"
> >
> > I have tested with different binaries and I am getting different results,
> > for example when I put the content of a zipped file of 17MB size, the
> > increment of the disk space is of 24MB. The reason for this increment
> seems
> > to be an index created on the table "pg_largeobject". The index is
> > "pg_largeobject_loid_pn_index"
> >
> > In other hand when I put let's say many zeroes (same 17Mb) the increase
> of
> > HD usage is much smaller.
> >
> > I think it could be caused because TOAST compresses the content stored,
> se
> > he can compress the zeroes but not the previously compressed zip content.
> >
> > My question is: Is this increase of ~40% normal? Has someone else
> > experienced this?
>
> TOAST will compress data if it thinks it can (you can disable this
> behavior and arguably should if your data is pre-compressed). 40% for
> the index seems high but it may be accurate. Personally, I prefer
> bytea storage to LOB although LOB is a little bit faster.
>
> merlin
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Albe Laurenz 2013-10-11 09:23:28 Re: Incorrect index being used
Previous Message Tom Lane 2013-10-11 08:04:13 Re: postgresql9.1.6 core dump