Re: BLOB updates -> database size explodes

From: Stephen Scheck <singularsyntax(at)gmail(dot)com>
To: Dimitar Misev <dimitarmisev(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: BLOB updates -> database size explodes
Date: 2013-05-29 22:49:29
Message-ID: CAKjnHz1V+SK7hHPgA8FHKDuo7PMekCui3=jevbYgGGJr05dX+Q@mail.gmail.com
Lists: pgsql-general

This is just a guess (I haven't dug into the low-level page/disk access code
Postgres uses for large objects yet, but if I'm right, the LO-based project
I'm working on will likely face the same issues you're seeing). As far as I
can tell from my testing, LOs enjoy transactional behavior just like anything
else, and so are subject to MVCC effects. Since LOs are opaque to Postgres
and it can't infer anything about their structure, flipping even a single bit
in an LO causes whatever page that bit maps to to be marked invalid (as if
the page corresponded exactly to one row in a normal table), and the page is
copied to a new one along with your change(s).
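To put rough numbers on that hypothesis, here is a back-of-envelope sketch
using the figures from your report. It assumes the default build, where
pg_largeobject stores data as rows of 2048-byte chunks (LOBLKSIZE), and that
each partial write only dirties the chunk rows it overlaps:

```python
# Back-of-envelope for the MVCC hypothesis, using the numbers from the
# original report. Assumes pg_largeobject stores LOs as rows of
# LOBLKSIZE-byte chunks (2048 bytes on a default build).

LOBLKSIZE = 2048     # bytes per pg_largeobject chunk row
NUM_BLOBS = 608
BLOB_SIZE = 460800   # bytes per blob
SLICE_SIZE = 3072    # bytes written per partial update
NUM_PASSES = 150     # each blob is updated 150 times, slice by slice

chunks_per_blob = BLOB_SIZE // LOBLKSIZE  # 225 chunk rows per blob

def chunks_touched(offset, length, chunk=LOBLKSIZE):
    """Number of chunk rows a write of `length` bytes at `offset` overlaps."""
    first = offset // chunk
    last = (offset + length - 1) // chunk
    return last - first + 1

# Every chunk row an update overlaps gets a new row version; the old
# version stays around as a dead tuple until VACUUM reclaims it.
dead_rows = sum(
    chunks_touched(i * SLICE_SIZE, SLICE_SIZE)
    for i in range(NUM_PASSES)
) * NUM_BLOBS

dead_bytes = dead_rows * LOBLKSIZE
print(f"chunk rows per blob: {chunks_per_blob}")
print(f"dead row versions:   {dead_rows}")
print(f"dead chunk data:     {dead_bytes / 1024**2:.0f} MB")
```

Interestingly, this only accounts for ~356 MB of dead chunk data, well short
of the 18 GB you observed; so if the hypothesis holds, per-tuple overhead,
page-level fragmentation, or something rewriting more than just the touched
chunks would have to be amplifying the effect considerably.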

If this hypothesis is correct, doing a vacuum should free up the dead pages
and bring the size back in line with your expectations. And if that's the
case, putting more intelligence into the application could mitigate some of
the update growth (for instance, predicting which page temporally close
updates will land on and grouping them into a single transaction).
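If that's what's happening, something along these lines should show the dead
space being reclaimed (vacuuming pg_largeobject requires superuser, and note
that on 8.4 a plain VACUUM only makes dead space reusable; VACUUM FULL, which
takes an exclusive lock and rewrites the table, is what actually shrinks the
file on disk):

```
VACUUM VERBOSE pg_largeobject;
SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject'));
```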

On Tue, May 28, 2013 at 2:53 PM, Dimitar Misev <dimitarmisev(at)gmail(dot)com> wrote:

> I'm having some issues with BLOB updates (via ECPG). The total size of the
> blobs should be ~280MB, but after partially updating all of them 150 times
> the size on disk grows from 184MB to 18GB.
>
> In more details:
>
> There are 608 blobs of 460800 bytes each. All blobs are updated piecewise
> in 150 repetitions: first all blobs are updated in bytes 0 - 3071, then
> 3072 - 6143, etc. In the end the database takes 18GB on disk.
>
> Computing the size of pg_largeobject gives me 267MB:
>
> SELECT pg_size_pretty(count(loid) * 2048) FROM pg_largeobject;
> pg_size_pretty
> ----------------
> 267 MB
>
> On the other hand, this gives me 18GB, and du -sh on the disk also reports
> 18.4GB:
>
> SELECT tablename,
>        pg_size_pretty(size) AS size_pretty,
>        pg_size_pretty(total_size) AS total_size_pretty
> FROM (SELECT *, pg_relation_size(schemaname||'.'||tablename) AS size,
>                 pg_total_relation_size(schemaname||'.'||tablename) AS total_size
>       FROM pg_tables) AS TABLES
> WHERE TABLES.tablename = 'pg_largeobject'
> ORDER BY total_size DESC;
>    tablename    | size_pretty | total_size_pretty
> ----------------+-------------+-------------------
>  pg_largeobject | 18 GB       | 18 GB
>
>
> Doing these updates takes 85 minutes on a quad-core i7 with 6GB RAM and an
> SSD. This is PostgreSQL 8.4.12 on 64-bit Debian.
>
> Does anyone know what's going on here?
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
