Re: INSERT times - same storage space but more fields -> much slower inserts

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: INSERT times - same storage space but more fields -> much slower inserts
Date: 2009-04-14 15:56:14
Message-ID: 20090414155614.GL8123@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Craig,

* Craig Ringer (craig(at)postnewspapers(dot)com(dot)au) wrote:
> I've been doing some testing for the Bacula project, which uses
> PostgreSQL as one of the databases in which it stores backup catalogs.

We also use Bacula with a PostgreSQL backend.

> I've been evaluating a schema change for Bacula that takes a field
> that's currently stored as a gruesome-to-work-with base64-encoded
> representation of a binary blob, and expands it into a set of integer
> fields that can be searched, indexed, etc.

This would be extremely nice.

> The table size of the expanded form is marginally smaller than for the
> base64-encoded string version. However, INSERT times are *CONSIDERABLY*
> greater for the version with more fields. It takes 1011 seconds to
> insert the base64 version, vs 1290 seconds for the expanded-fields
> version. That's a difference of 279 seconds, or 27%.
>
> Despite that, the final table sizes are the same.
>
> If I use tab-separated input and COPY, the original-format file is
> 1300MB and the expanded-structure format is 1618MB. The performance hit
> on COPY-based insert is not as bad, at 161s vs 182s (13%), but still
> quite significant.
>
> Any ideas about what I might be able to do to improve the efficiency of
> inserting records with many integer fields?

Bacula should be using COPY for the batch data loads, so hopefully won't
suffer too much from having the fields split out. I think it would be
interesting to try doing PQexecPrepared with binary-format data instead
of using COPY though. I'd be happy to help you implement a test setup
for doing that, if you'd like.

Thanks,

Stephen

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew Wakeling 2009-04-14 16:01:08 Re: INSERT times - same storage space but more fields -> much slower inserts
Previous Message Merlin Moncure 2009-04-14 13:46:28 Re: Nested query performance issue