Re: Strange issue with GiST index scan taking far too long

From: "A(dot) Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Strange issue with GiST index scan taking far too long
Date: 2008-06-09 14:24:08
Message-ID: 20080609142408.GA9409@a-kretschmer.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

am Mon, dem 09.06.2008, um 14:18:50 +0100 mailte Mark Cave-Ayland folgendes:
> Hi there,
>
> I'm currently looking at a bug report in PostGIS where we are getting
> extremely long index scan times querying an index in one case, but the
> same scan can take much less time if the input geometry is calculated as
> the result of another function.
>
> First of all, I include the EXPLAIN ANALYZE of the basic query which
> looks like this:
>
>
> postgis=# explain analyze select count(*) from geography where centroid
> && (select the_geom from geography where id=69495);
>
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual
> time=2691.783..2691.784 rows=1 loops=1)
> InitPlan
> -> Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559)
> (actual time=60.987..61.373 rows=1 loops=1)
> Filter: (id = 69495::numeric)
> -> Index Scan using geography_geom_centroid_idx on geography
> (cost=0.00..8.28 rows=1 width=0) (actual time=79.241..2645.722
> rows=32880 loops=1)
> Index Cond: (centroid && $0)
> Filter: (centroid && $0)
> Total runtime: 2692.288 ms
> (8 rows)
>

There are a BIG difference between estimated rows and real rows (1 vs.
32880). Why? Do you have recent statistics?

>
> The only real thing to know about the query is that the id field within
> the geography table is a primary key, and hence only a single geometry

Sure? I can't believe this because the rows=32880.

> is being returned from within the subselect. Note that most of the time
> is disappearing into the index scan.
>
> Where things start getting really strange is when we add an extra
> function called force_2d() into the mix. All this function does is scan
> through the single geometry returned from the subselect and remove any
> 3rd dimension coordinates. Now the resulting EXPLAIN ANALYZE for this
> query looks like this:
>
>
> postgis=# explain analyze select count(*) from geography where centroid
> && (select force_2d(the_geom) from geography where id=69495);
>
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=7157.29..7157.30 rows=1 width=0) (actual
> time=343.004..343.005 rows=1 loops=1)
> InitPlan
> -> Seq Scan on geography (cost=0.00..7149.00 rows=1 width=4559)
> (actual time=48.714..49.016 rows=1 loops=1)
> Filter: (id = 69495::numeric)
> -> Index Scan using geography_geom_centroid_idx on geography
> (cost=0.00..8.28 rows=1 width=0) (actual time=49.367..235.296 rows=32880
> loops=1)
> Index Cond: (centroid && $0)
> Filter: (centroid && $0)
> Total runtime: 343.084 ms
> (8 rows)
>
>
> So by adding in an extra function around the subselect result, we have
> speeded up the index lookup by several orders of magnitude, and the

Wild guess: you have a big cache.

But i'm not a PostGIS-Expert.

Regards, Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2008-06-09 14:34:17 Re: Message-ID should surely not be shown as a mailto: URL
Previous Message Andrew Dunstan 2008-06-09 14:00:29 Re: pg_dump restore time and Foreign Keys