Re: Inserting 8MB bytea: just 25% of disk perf used?

From: Dave Crooke <dcrooke(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "fkater(at)googlemail(dot)com" <fkater(at)googlemail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Inserting 8MB bytea: just 25% of disk perf used?
Date: 2010-01-16 00:03:56
Message-ID: ca24673e1001151603v10841eees597f3c9d35f4dc9a@mail.gmail.com
Lists: pgsql-performance

I'd second this .... a database is doing all kinds of clever things to
ensure ACID consistency on every byte that gets written to it.

If you don't need that level of consistency for your 8MB blobs, write them
to plain files named with some kind of id, and put the id in the database
instead of the blob. Because the database writes each blob twice (once to
the WAL and once to the table), this will reduce the amount of disk I/O for
storing each blob by nearly 50%, and will reduce marshaling overheads by a
larger margin.
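
Very roughly, that pattern looks like the sketch below. This is only an
illustration: the table blobs(id, filename), the directory /data/blobs and
the helper name are made up here, not taken from your code, and error
handling is minimal.

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Sketch only: write the payload to an ordinary file, then store
       just the id and filename in the database instead of the bytea. */
    static int store_blob(PGconn *conn, long id, const char *buf, size_t len)
    {
        char path[256], idstr[32];
        const char *params[2];
        FILE *fp;
        PGresult *res;
        int ok;

        /* 1. write the 8MB payload to a plain file named after the id */
        snprintf(path, sizeof(path), "/data/blobs/%ld.bin", id);
        fp = fopen(path, "wb");
        if (fp == NULL)
            return -1;
        ok = (fwrite(buf, 1, len, fp) == len);
        fclose(fp);
        if (!ok)
            return -1;

        /* 2. insert only the small metadata row */
        snprintf(idstr, sizeof(idstr), "%ld", id);
        params[0] = idstr;
        params[1] = path;
        res = PQexecParams(conn,
                           "INSERT INTO blobs (id, filename) VALUES ($1, $2)",
                           2, NULL, params, NULL, NULL, 0);
        ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
        PQclear(res);
        return ok ? 0 : -1;
    }

Writing the file first and inserting the row afterwards means a crash can
at worst leave an orphaned file on disk, never a row pointing at a missing
blob.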

From your account, it sounds like the database is performing nicely on that
hardware ... 16MB/sec to a raw disk or filesystem is rather slow by modern
standards, but 16MB/sec of database updates is pretty good for having
everything on one slow-ish spindle.

On Fri, Jan 15, 2010 at 3:15 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Thu, Jan 14, 2010 at 9:29 AM, fkater(at)googlemail(dot)com
> <fkater(at)googlemail(dot)com> wrote:
> > Hello together,
> >
> > I need to increase the write performance when inserting
> > bytea of 8MB. I am using 8.2.4 on windows with libpq.
> >
> > The test setting is simple:
> >
> > I write a byte array (bytea) of 8 MB of random data 100 times
> > into a table having a binary column (and oids and 3 other
> > int columns, oids are indexed). I realized that writing 8 MB
> > of 0-bytes is optimized away. With random data, the disk
> > space now is filled with 800MB each run as expected. I use a
> > transaction around the insert command.
> >
> > This takes about 50s, so, 800MB/50s = 16MB/s.
> >
> > However the harddisk (sata) could write 43 MB/s in the worst
> > case! Why is write performance limited to 16 MB/s?
> >
> >
> > Some more hints what I do:
> >
> > I use PQexecParams() and the INSERT ... $001 notation so that I
> > do NOT have to create a separately escaped copy of the data, but
> > can pass a pointer to the 8MB data buffer directly.
> >
> > I altered the binary column to STORAGE EXTERNAL.
> >
> > Some experiments with postgresql.conf (fsync off,
> > shared_buffers=1000MB, checkpoint_segments=256) did not
> > change the 50s much (sometimes 60s, sometimes a little less).
> >
> > 4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.
> >
> >
> > Do you have any further idea why 16MB/s seems to be the
> > limit here?
>
> postgres is simply not geared towards this type of workload. 16mb
> isn't too bad actually, and I bet you could significantly beat that
> with better disks and multiple clients sending data, maybe even close
> to saturate a gigabit line. However, there are other ways to do this
> (outside the db) that are more appropriate if efficiency is a big
> concern.
>
> merlin
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
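
(For readers following the thread: the binary-parameter insert described in
the quoted message would look roughly like the sketch below. The table
images(data bytea) and the function name are assumptions for illustration,
not taken from the original code.)

    #include <libpq-fe.h>

    /* Sketch of a binary-parameter insert via PQexecParams: the 8MB buffer
       is passed by pointer, so libpq does no escaping or text conversion. */
    static int insert_blob(PGconn *conn, const char *buf, int len)
    {
        const char *values[1];
        int lengths[1];
        int formats[1];
        PGresult *res;
        int ok;

        values[0]  = buf;   /* pointer straight to the 8MB buffer */
        lengths[0] = len;   /* length is required for binary parameters */
        formats[0] = 1;     /* 1 = binary format */

        res = PQexecParams(conn,
                           "INSERT INTO images (data) VALUES ($1)",
                           1, NULL, values, lengths, formats, 0);
        ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
        PQclear(res);
        return ok ? 0 : -1;
    }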
