An Analyze question

From: "Nick Fankhauser" <nickf(at)ontko(dot)com>
To: "pgsql-admin" <pgsql-admin(at)postgresql(dot)org>
Subject: An Analyze question
Date: 2002-04-22 15:16:42
Message-ID: NEBBLAAHGLEEPCGOBHDGIEDKENAA.nickf@ontko.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hello-

Does analyze consider the entire key when creating statistics for a table,
or a substring composed of the leading chars?

Some background:

I've just modified *all* of the keys on my database by prefixing them with
the same source_id that is 5 chars long. This is in preparation for merging
together data from several different sources where I know that the keys are
unique to the source, but not necessarily between sources.

So on every primary & foreign key, I have executed this update:

update table set key = '18105'||key;

Now, a few queries that used to be swift are very slow, and on further
investigation, I found that the planner is making different decisions-
essentially it looks like the statistics now indicate that each key is much
less selective & hence a poor candidate for an index scan.

I did an analyze on the whole database, and then did an analyze specifically
on the tables involved, and checked against an original copy of the database
to make sure the indexes are identical in both.

If the keys are taken in their entirety, nothing has changed- they are just
as selective as ever. However, if only the leading chars are considered, or
if the leading chars have a higher weighting, they would certainly appear
much less selective.

Any thoughts on what happened here?

Thanks-

-Nick

PS: The before & after explains are pasted in below:

Before:

monroe=# explain select * from actor_cases where actor_id = '18105A7313 53';
NOTICE: QUERY PLAN:

Merge Join (cost=27748.94..27807.92 rows=145 width=192)
-> Sort (cost=27713.16..27713.16 rows=3410 width=144)
-> Nested Loop (cost=0.00..27372.75 rows=3410 width=144)
-> Index Scan using actor_case_assignment_both on
actor_case_assignment (cost=0.00..11766.67 rows=3410 width=24)
-> Index Scan using case_data_case_id on case_data
(cost=0.00..4.56 rows=1 width=120)
-> Sort (cost=35.78..35.78 rows=522 width=48)
-> Seq Scan on local_case_type (cost=0.00..12.22 rows=522
width=48)

After:

develop=# explain select * from actor_cases where actor_id = '18105A7313
53';
NOTICE: QUERY PLAN:

Hash Join (cost=27801.99..53031.15 rows=306 width=192)
-> Hash Join (cost=27788.47..51957.43 rows=11377 width=144)
-> Seq Scan on case_data (cost=0.00..6932.35 rows=226535
width=120)
-> Hash (cost=27693.03..27693.03 rows=11377 width=24)
-> Seq Scan on actor_case_assignment (cost=0.00..27693.03
rows=11377 width=24)
-> Hash (cost=12.22..12.22 rows=522 width=48)
-> Seq Scan on local_case_type (cost=0.00..12.22 rows=522
width=48)

--------------------------------------------------------------------------
Nick Fankhauser nickf(at)ontko(dot)com Phone 1.765.935.4283 Fax 1.765.962.9788
Ray Ontko & Co. Software Consulting Services http://www.ontko.com/

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2002-04-22 15:34:27 Re: An Analyze question
Previous Message Ferdinand Smit 2002-04-22 12:42:48 Re: Connection problem