Re[2]: planner/optimizer hash index method and so on

From: "Alexey V(dot) Meledin" <avm(at)webclub(dot)ru>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re[2]: planner/optimizer hash index method and so on
Date: 2000-04-05 09:20:23
Message-ID: 11555.000405@webclub.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi,
Tom!

Monday, April 03, 2000, 6:32:06 PM, you wrote:

>> 2. I've replace btree indexes on relation
>> AND atd.ifs_data_id = def.ifs_data_id;
>> with:
>> create index itd_hash_data_id on ifs_tree_default USING HASH (ifs_data_id);
>> create index iatd_hash_data_id on ifs_account_tree_data USING HASH (ifs_data_id);

TL> Why would you do that? The hash index method doesn't have any advantage
TL> over btree that I can see, and it's got a lot of disadvantages.
But as I understand from documentation, Hash Join is the preferable
method of JOIN, because of Seq Scan performed only once (I suppose, that
it's not full table scan!?!).
Nested Joins performs too many scans for results in JOIN, containing
many rows and are so slow on large tables (planner/optimizer problem?).
Thereby I've found that it's more efficient to make several queries
from hard one (planner/optimizer explains me up to 8 neseted loops on it :().

Additional questions:
1. What is the difference among "Seq Scan" and "Index Scan" in
a planer/optimizer query execution plan?
Seq Scan - "sequential data scan" or "sequential scan, based on
"serial" field"?

2. I have
create table aaa
(a int4, b int4, c int4, d int4,
CONSTRAINT aaa_pk PRIMARY KEY(a,b,c)
);

create table bbb (
a int4, b int4, c int4, g serial, f int4
,CONSTRAINT bbb_pk PRIMARY KEY (a, b, c, g));

create index aaa_index_a on aaa (a);
create index aaa_index_b on aaa (b);
create index aaa_index_c on aaa (c);
create index aaa_index_d on aaa (d);

--create index bbb_index_fk on bbb (a,b,c); --FK
create index bbb_index_a on bbb (a);
create index bbb_index_b on bbb (b);
create index bbb_index_c on bbb (c);

!PLEASE, verify indexes (I'm not shure at 100%)

explain select b.f from bbb b, aaa a
where a.b=50001
and a.c=50002
and a.d=50003
-- at this point I have "aaa" rows for join
-- performing jon
and a.b=b.b
and a.c=b.c
and a.a=b.a;
Nested Loop (cost=4.05 rows=1 width=28)
-> Index Scan using aaa_index_b on aaa a (cost=2.05 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=61984 width=16)

explain select b.f from bbb b, aaa a
where a.b=50001
and a.c=50002
and a.d=50003
-- at this point I have "aaa" rows for join
and b.b=2
and b.c=3
-- at this point I have "bbb" rows for join
-- performing jon
and a.a=b.a;
Nested Loop (cost=4.05 rows=1 width=12)
-> Index Scan using aaa_index_b on aaa a (cost=2.05 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=1 width=8)

=================================
Two queries are almost equal, but in the first I perform join by wide
PK key. In the second only one key is explicity used in JOIN.

So, the difference is in ROWS and WIDTH:
PS: Each table has 100000 rows.
1. Nested Loop (cost=4.05 rows=1 width=28)
-> Index Scan using aaa_index_b on aaa a (cost=2.05 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=61984 width=16)
2. Nested Loop (cost=4.05 rows=1 width=12)
-> Index Scan using aaa_index_b on aaa a (cost=2.05 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=1 width=8)

So, questions:
1. can I say, that the second query is more optimal then 1-st?

2. Variants I try:
2.1. When I use no indexes, then:
2.1.1. Nested Loop (cost=2442.50 rows=1 width=28)
-> Seq Scan on aaa a (cost=2440.50 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=61984 width=16)
2.1.2. Nested Loop (cost=2442.50 rows=1 width=12)
-> Seq Scan on aaa a (cost=2440.50 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=1 width=8)

2.2. I try to set indexes on a.a and b.a, supposing that it helps a
bit on relation "a.a=b.a"
2.2.1. Nested Loop (cost=3355.28 rows=1 width=28)
-> Seq Scan on aaa a (cost=3353.28 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=85159 width=16)
2.2.2. Nested Loop (cost=3355.28 rows=1 width=12)
-> Seq Scan on aaa a (cost=3353.28 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=1 width=8)

Perfomace become lower!! Why?

2.3. I've set only this indexes:
create index aaa_index_bcd on aaa (b,c,d);
create index bbb_index_bc on bbb (b,c);
and take the most better perfomance:
2.3.1. Nested Loop (cost=4.00 rows=1 width=28)
-> Index Scan using aaa_index_bcd on aaa a (cost=2.00 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=85159 width=16)
2.3.2. Nested Loop (cost=4.00 rows=1 width=12)
-> Index Scan using aaa_index_bcd on aaa a (cost=2.00 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=1 width=8)

2.4. I add indexes on a.a and a.b
2.4.1. Nested Loop (cost=4.00 rows=1 width=28)
-> Index Scan using aaa_index_bcd on aaa a (cost=2.00 rows=1 width=12)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=85159 width=16)
2.4.2. Nested Loop (cost=4.00 rows=1 width=12)
-> Index Scan using aaa_index_bcd on aaa a (cost=2.00 rows=1 width=4)
-> Index Scan using bbb_pk on bbb b (cost=2.00 rows=1 width=8)

Nothing happens!
Is it occurs because PostgreSQL makes JOIN on a.a and b.a in memory?
Or Use of wide PK is more prefferable than simple index?

Can anybody collect results of this small experiment and make right
deduction?

Regards, Alexey V. Meledin
InterForge Developers Group, Saint-Petersburg
look_to: <www.etcompany.ru><www.businessweb.ru>
<www.inplan.spb.ru><www.pia.ru>>>>>>>>>>>>>>>>>
mail_to: <avm(at)webclub(dot)ru><nick_as: <cureman>>>>

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Kovacs Zoltan Sandor 2000-04-05 09:43:53 Re: Foreign Key Implementation
Previous Message Moray McConnachie 2000-04-05 08:50:54 Re: Foreign Key Implementation