From: | Steve Lane <slane(at)moyergroup(dot)com> |
---|---|
To: | <pgsql-admin(at)postgresql(dot)org> |
Subject: | Re: Index (re)-creation speed |
Date: | 2004-02-06 07:03:48 |
Message-ID: | BC489874.9895%slane@moyergroup.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin |
On 2/6/04 12:30 AM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Steve Lane <slane(at)moyergroup(dot)com> writes:
>> When I drop and rebuild the indexes, they take oddly varying amounts of time
>> to rebuild. I rebuilt them in the following order, with the following rough
>> times. I took a guess that the speed of the rebuild might be related to the
>> number of distinct values in the column -- this seems true in some cases but
>> not in others. Here are the times:
>
>> id_response 38 secs (86000 distinct)
>> id_topic 33 secs (6 distinct)
>> id_survey 13 secs (1 distinct)
>> id_code 39 secs (1444 distinct)
>> id_item 40 secs (65 distinct)
>> id_administration 13 secs (1 distinct)
>
> How many rows altogether in this table? What sort_mem are you using
> (and have you tried altering it)?
Sorry, left out important info. I cleaned the table first (it's a temporary
copy), then imported one run of data, 86,000 rows.
I haven't checked sort_mem, will do so.
>
> When I read your previous message I was wondering about different
> datatypes having different comparison costs, but since these are all
> integer columns that's clearly not the issue here. I think you may
> be seeing something related to the number of initial runs created in the
> sorting algorithm --- with only one distinct value, there'd always be
> just one run and no need for any merge passes.
>
> (Why are you bothering to index columns with only one or a few values in
> them, anyway? Such an index will be useless for searches ...)
Again, I left out some useful details. Id_survey and id_administration will
be identical for a single run of data (which can generate 10-100K inserts).
And my tests were just one data set, imported into a cleaned table. Still,
the distinctness of these columns will be low -- on the order of one
distinct value per 10^4 rows. The others have levels of distinctness
proportional to what the above chart shows -- id_response is unique,
id_topic will have 4-8 values per 10^4 records, and so on.
And still, none of this explains to me why the index on id_item drove the
COPY cost up so apparently dramatically. I tried the COPY again both with
and without that one index and in both cases it caused COPY to go from 5 to
40+ seconds.
-- sgl
From | Date | Subject | |
---|---|---|---|
Next Message | neelesh vijaivargia | 2004-02-06 09:07:42 | Data location in PostgreSQL |
Previous Message | Steve Lane | 2004-02-06 06:54:01 | Re: Drop indexes inside transaction? |