Re: Slow updates, poor IO

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "John Huttley" <John(at)mib-infotech(dot)co(dot)nz>
Cc: pgsql-performance(at)postgresql(dot)org, "Dan Langille" <dan(at)langille(dot)org>
Subject: Re: Slow updates, poor IO
Date: 2008-09-25 19:35:19
Message-ID: dcc563d10809251235h1fc04552v9b2a0ce7844f1e6f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Sep 25, 2008 at 1:24 PM, John Huttley <John(at)mib-infotech(dot)co(dot)nz> wrote:
> I've just had an interesting encounter with the slow full table update
> problem that is inherent with MVCC
>
> The system is 64 bit linux with 2.6.25 kernel feeding scsi disks.
>
> the table is
>
> CREATE TABLE file (
> fileid integer NOT NULL,
> fileindex integer DEFAULT 0 NOT NULL,
> jobid integer NOT NULL,
> pathid integer NOT NULL,
> filenameid integer NOT NULL,
> markid integer DEFAULT 0 NOT NULL,
> lstat text NOT NULL,
> md5 text NOT NULL,
> perms text
> );
>
> ALTER TABLE ONLY file
> ADD CONSTRAINT file_pkey PRIMARY KEY (fileid);
>
> CREATE INDEX file_fp_idx ON file USING btree (filenameid, pathid);
> CREATE INDEX file_jobid_idx ON file USING btree (jobid);
>
> There are 2.7M rows.
>
> running update file set perms='0664' took about 10 mins

So, how many rows would already be set to 0664? Would adding a where
clause speed it up?

update file set perms='0664' where perms <> '0664';

> during this period, vmstat reported Blocks Out holding in the 4000 to 6000
> range.
>
>
> When I dropped the indexes this query ran in 48sec.
> Blocks out peaking at 55000.
>
> So there is a double whammy.
> MVCC requires more work to be done when indexes are defined and then this
> work
> results in much lower IO, compounding the problem.

That's because it becomes more random and less sequential. If you had
a large enough drive array you could get that kind of performance for
updating indexes, since the accesses would tend to hit different
drives most the time.

Under heavy load on the production servers at work we can see 30 to 60
Megs a second random access with 12 drives, meaning 2.5 to 5Megs per
second per drive. Sequential throughput is about 5 to 10 times
higher.

What you're seeing are likely the effects of running a db on
insufficient drive hardware.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Alan Hodgson 2008-09-25 19:39:27 Re: Slow updates, poor IO
Previous Message John Huttley 2008-09-25 19:24:55 Slow updates, poor IO