Re: Index (re)-creation speed

From: Steve Lane <slane(at)moyergroup(dot)com>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Index (re)-creation speed
Date: 2004-02-06 07:03:48
Message-ID: BC489874.9895%slane@moyergroup.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On 2/6/04 12:30 AM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Steve Lane <slane(at)moyergroup(dot)com> writes:
>> When I drop and rebuild the indexes, they take oddly varying amounts of time
>> to rebuild. I rebuilt them in the following order, with the following rough
>> times. I took a guess that the speed of the rebuild might be related to the
>> number of distinct values in the column -- this seems true in some cases but
>> not in others. Here are the times:
>
>> id_response 38 secs (86000 distinct)
>> id_topic 33 secs (6 distinct)
>> id_survey 13 secs (1 distinct)
>> id_code 39 secs (1444 distinct)
>> id_item 40 secs (65 distinct)
>> id_administration 13 secs (1 distinct)
>
> How many rows altogether in this table? What sort_mem are you using
> (and have you tried altering it)?

Sorry, left out important info. I cleaned the table first (it's a temporary
copy), then imported one run of data, 86,000 rows.

I haven't checked sort_mem, will do so.
>
> When I read your previous message I was wondering about different
> datatypes having different comparison costs, but since these are all
> integer columns that's clearly not the issue here. I think you may
> be seeing something related to the number of initial runs created in the
> sorting algorithm --- with only one distinct value, there'd always be
> just one run and no need for any merge passes.
>
> (Why are you bothering to index columns with only one or a few values in
> them, anyway? Such an index will be useless for searches ...)

Again, I left out some useful details. Id_survey and id_administration will
be identical for a single run of data (which can generate 10-100K inserts).
And my tests were just one data set, imported into a cleaned table. Still,
the distinctness of these columns will be low -- on the order of one
distinct value per 10^4 rows. The others have levels of distinctness
proportional to what the above chart shows -- id_response is unique,
id_topic will have 4-8 values per 10^4 records, and so on.

And still, none of this explains to me why the index on id_item drove the
COPY cost up so apparently dramatically. I tried the COPY again both with
and without that one index and in both cases it caused COPY to go from 5 to
40+ seconds.

-- sgl

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message neelesh vijaivargia 2004-02-06 09:07:42 Data location in PostgreSQL
Previous Message Steve Lane 2004-02-06 06:54:01 Re: Drop indexes inside transaction?