Re: again on index usage

From: Daniel Kalchev <daniel(at)digsys(dot)bg>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: again on index usage
Date: 2002-01-08 09:22:03
Message-ID: 200201080922.LAA02480@dcave.digsys.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>Tom Lane said:
> It's difficult to tell from this what it thinks the selectivity of the
> ipdate index would be, since the rows estimate includes the effect of
> the ipaddr and router restrictions. What do you get from just
>
> explain
> SELECT sum(input) FROM iplog_gate200112
> WHERE
> '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2001-12-02 00:00:00+02' A
ND
> '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2002-01-01 00:00:00+02';

Same result (sorry, should have included this originally):

Aggregate (cost=47721.72..47721.72 rows=1 width=8)
-> Seq Scan on iplog_gate200112 (cost=0.00..47579.54 rows=56873 width=8)

> If you say "set enable_seqscan to off", does that change the plan?

Yes. As expected (I no longer have the problem of NaN estimates :)

Aggregate (cost=100359.71..100359.71 rows=1 width=8)
-> Index Scan using iplog_gate200112_ipdate_idx on iplog_gate200112
(cost=0.00..100217.52 rows=56873 width=8)

My belief is that the planner does not want to use index due to low value
dispersion of the indexed attribute. When splitting the table into several
smaller tables, index is used.

This bites me, because each such query takes at least 3 minutes and the script
that generates these needs to execute few thousands queries.

> BTW, the planner does not associate function calls with indexes. If you
> want to have the ipaddr index considered for this query, you need to write
> ipaddr <<= '193.68.240.0/20' not network_subeq(ipaddr, '193.68.240.0/20').
> (But IIRC, that only works in 7.2 anyway, not earlier releases :-()

This is what I though too, but using the ipdate index will be sufficient.

I understand my complaint is not a bug, but rather question of proper planner
optimization (it worked 'as expected' in 7.0). Perhaps the planner should
consider the total number of rows, as well as the dispersion factor. With the
dispersion being around 1.5% and total rows 1.7 million this gives about 25k
rows with the same value - large enough to trigger sequential scan, as far as
I understand it, but the cost of scanning 1.7 million rows sequentially is
just too high.

By the way, the same query takes approx 10 sec with set enable_seqscan to off.

Daniel

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message mlw 2002-01-08 12:23:40 Re: (void *) with shmat
Previous Message Holger Krug 2002-01-08 09:21:13 Re: ON ERROR triggers