Quick Links

Re: again on index usage

From:	Daniel Kalchev <daniel(at)digsys(dot)bg>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: again on index usage
Date:	2002-01-08 09:22:03
Message-ID:	200201080922.LAA02480@dcave.digsys.bg
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

>>>Tom Lane said:
> It's difficult to tell from this what it thinks the selectivity of the
> ipdate index would be, since the rows estimate includes the effect of
> the ipaddr and router restrictions. What do you get from just
>
> explain
> SELECT sum(input) FROM iplog_gate200112
> WHERE
> '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2001-12-02 00:00:00+02' A
ND
> '2001-12-01 00:00:00+02' <= ipdate AND ipdate < '2002-01-01 00:00:00+02';

Same result (sorry, should have included this originally):

Aggregate (cost=47721.72..47721.72 rows=1 width=8)
-> Seq Scan on iplog_gate200112 (cost=0.00..47579.54 rows=56873 width=8)

> If you say "set enable_seqscan to off", does that change the plan?

Yes. As expected (I no longer have the problem of NaN estimates :)

Aggregate (cost=100359.71..100359.71 rows=1 width=8)
-> Index Scan using iplog_gate200112_ipdate_idx on iplog_gate200112
(cost=0.00..100217.52 rows=56873 width=8)

My belief is that the planner does not want to use index due to low value
dispersion of the indexed attribute. When splitting the table into several
smaller tables, index is used.

This bites me, because each such query takes at least 3 minutes and the script
that generates these needs to execute few thousands queries.

> BTW, the planner does not associate function calls with indexes. If you
> want to have the ipaddr index considered for this query, you need to write
> ipaddr <<= '193.68.240.0/20' not network_subeq(ipaddr, '193.68.240.0/20').
> (But IIRC, that only works in 7.2 anyway, not earlier releases :-()

This is what I though too, but using the ipdate index will be sufficient.

I understand my complaint is not a bug, but rather question of proper planner
optimization (it worked 'as expected' in 7.0). Perhaps the planner should
consider the total number of rows, as well as the dispersion factor. With the
dispersion being around 1.5% and total rows 1.7 million this gives about 25k
rows with the same value - large enough to trigger sequential scan, as far as
I understand it, but the cost of scanning 1.7 million rows sequentially is
just too high.

By the way, the same query takes approx 10 sec with set enable_seqscan to off.

Daniel

Responses

Re: again on index usage at 2002-01-08 14:37:11 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	mlw	2002-01-08 12:23:40	Re: (void *) with shmat
Previous Message	Holger Krug	2002-01-08 09:21:13	Re: ON ERROR triggers