Re: 10+hrs vs 15min because of just one index

From: Aaron Turner <synfinatic(at)gmail(dot)com>
To: hubert depesz lubaczewski <depesz(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: 10+hrs vs 15min because of just one index
Date: 2006-02-10 16:35:49
Message-ID: 1ca1c1410602100835w3a1f85a3tc6373ee7f4d13943@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 2/10/06, hubert depesz lubaczewski <depesz(at)gmail(dot)com> wrote:
> On 2/10/06, Aaron Turner <synfinatic(at)gmail(dot)com> wrote:
> > So I'm trying to figure out how to optimize my PG install (8.0.3) to
> > get better performance without dropping one of my indexes.
> > Basically, I have a table of 5M records with 3 columns:
> > pri_key (SERIAL)
> > data char(48)
> > groupid integer
> > there is an additional unique index on the data column.
> > The problem is that when I update the groupid column for all the
> > records, the query takes over 10hrs (after that I just canceled the
> > update). Looking at iostat, top, vmstat shows I'm horribly disk IO
> > bound (for data not WAL, CPU 85-90% iowait) and not swapping.
> > Dropping the unique index on data (which isn't used in the query),
>
> for such a large update i would suggest to go with different scenario:
> split update into packets (10000, or 50000 rows at the time)
> and do:
> update packet
> vacuum table
> for all packets. and then reindex the table. should work much nicer.

The problem is that all 5M records are being updated by a single
UPDATE statement, not 5M individual statements. Also, vacuum can't
run inside of a transaction.

On a side note, is there any performance information on updating
indexes (via insert/update) over the size of the column? Obviously,
char(48) is larger then most for indexing purposes, but I wonder if
performance drops linerally or exponentially as the column width
increases. Right now my column is hexidecimal... if I stored it as a
binary representation it would be smaller.

Thanks,
Aaron

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew T. O'Connor 2006-02-10 17:13:35 Re: 10+hrs vs 15min because of just one index
Previous Message Gourish Singbal 2006-02-10 15:41:00 Re: pgbench output