Re: 10+hrs vs 15min because of just one index

From: Aaron Turner <synfinatic(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: 10+hrs vs 15min because of just one index
Date: 2006-02-12 19:33:57
Message-ID: 1ca1c1410602121133y279ea488nc38f049ac6ebc7f4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 2/12/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Aaron Turner <synfinatic(at)gmail(dot)com> writes:
> > Well before I go about re-architecting things, it would be good to
> > have a strong understanding of just what is going on. Obviously, the
> > unique index on the char(48) is the killer. What I don't know is:
>
> You have another unique index on the integer primary key, so it's not
> the mere fact of a unique index that's hurting you.

Understood. I just wasn't sure if in general unique indexes are some
how more expensive then non-unique indexes.

> > 1) Is this because the column is so long?
>
> Possibly. Allowing for 12 bytes index-entry overhead, the char keys
> would be 60 bytes vs 16 for the integer column, so this index is
> physically almost 4x larger than the other. You might say "but that
> should only cause 4x more I/O" but it's not necessarily so. What's
> hard to tell is whether you are running out of RAM disk cache space,
> resulting in re-reads of pages that could have stayed in memory when
> dealing with one-fifth as much index data. You did not show us the
> iostat numbers for the two cases, but it'd be interesting to look at
> the proportion of writes to reads on the data drive in both cases.

Sounds a lot like what Marc mentioned.

> > 2) Is this because PG is not optimized for char(48) (maybe it wants
> > powers of 2? or doesn't like even numbers... I don't know, just
> > throwing it out there)
>
> Are the key values really all 48 chars long? If not, you made a
> bad datatype choice: varchar(n) (or even text) would be a lot
> smarter. char(n) wastes space on blank-padding.

Yep, everything exactly 48. Looks like I'll be storing it as a bytea
in the near future though.

> The only one of these effects that looks to me like it could result in
> worse-than-linear degradation of I/O demand is maxing out the available
> RAM for disk cache. So while improving the datatype choice would
> probably be worth your while, you should first see if fooling with
> shared_buffers helps, and if not it's time to buy RAM not disk.

Yeah, that's what it's beginning to sound like. Thanks Tom.

--
Aaron Turner
http://synfin.net/

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Adnan DURSUN 2006-02-12 20:25:28 SQL Function Performance
Previous Message Aaron Turner 2006-02-12 19:04:37 Re: 10+hrs vs 15min because of just one index