From: | Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Another try at reducing repeated detoast work for PostGIS |
Date: | 2009-08-22 11:14:48 |
Message-ID: | 4A8FD328.2030408@siriusit.co.uk |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Huh. As far as I can see this example should traverse the same code
> path. I was about to ask for the dataset, but I think you might have
> already sent it to me once --- does this look familiar?
>
> $ tar tvfj geography.tar.bz2
> -rw-r--r-- shade/shade 6444737 2008-06-06 13:33 geography.dbf
> -rw-r--r-- shade/shade 37179008 2008-06-06 13:33 geography.shp
> -rw-r--r-- shade/shade 263140 2008-06-06 13:33 geography.shx
>
> If so, what do I do with it exactly --- the file extensions convey
> nothing to my mind at the moment ...
Okay. I've gone back and had a look at the original queries, and it
seems the reason for the inflated times is that my setup for the
original database was based on PostGIS 1.3, which has a RECHECK applied
to the && operator.
If I temporarily remove the RECHECK from PostGIS 1.3 then the times drop
substantially:
postgis13=# explain analyze select count(*) from geography where
type='Z' and centroid && (select the_geom from geography where id=69495);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
---------
Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual
time=394.183..394.184 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4432)
(actual time=18.192..41.855 rows=1 loops=1)
Filter: (id = 69495::numeric)
-> Index Scan using idx_geography_centroid_z on geography
(cost=0.00..8.28 rows=1 width=0) (actual time=46.711..345.940 rows=29687
loops=1)
Index Cond: (centroid && $0)
Filter: ((type)::text = 'Z'::text)
Total runtime: 394.265 ms
(8 rows)
Incidentally, the recently-released PostGIS 1.4 has RECHECK disabled by
default, and so it can be seen that the times are reasonably similar:
postgis14=# explain analyze select count(*) from geography where
type='Z' and centroid && (select the_geom from geography where id=69495);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
---------
Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual
time=396.314..396.315 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4439)
(actual time=14.198..37.340 rows=1 loops=1)
Filter: (id = 69495::numeric)
-> Index Scan using idx_geography_centroid_z on geography
(cost=0.00..8.28 rows=1 width=0) (actual time=42.169..344.337 rows=29687
loops=1)
Index Cond: (centroid && $0)
Filter: ((type)::text = 'Z'::text)
Total runtime: 396.375 ms
(8 rows)
If I re-apply your patch to PostgreSQL 8.4 using PostGIS 1.4 (ignoring
RECHECK) then the results now look like this:
postgis14=# explain analyze select count(*) from geography where
type='Z' and centroid && (select the_geom from geography where id=69495);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
---------
Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual
time=271.360..271.362 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4439)
(actual time=17.534..32.009 rows=1 loops=1)
Filter: (id = 69495::numeric)
-> Index Scan using idx_geography_centroid_z on geography
(cost=0.00..8.28 rows=1 width=0) (actual time=32.393..165.057 rows=29687
loops=1)
Index Cond: (centroid && $0)
Filter: ((type)::text = 'Z'::text)
Total runtime: 271.428 ms
(8 rows)
postgis14=# explain analyze select count(*) from geography where
type='Z' and centroid && (select force_2d(the_geom) from geography where
id=69495);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
---------
Aggregate (cost=6892.28..6892.29 rows=1 width=0) (actual
time=272.091..272.092 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Seq Scan on geography (cost=0.00..6884.00 rows=1 width=4439)
(actual time=18.644..42.680 rows=1 loops=1)
Filter: (id = 69495::numeric)
-> Index Scan using idx_geography_centroid_z on geography
(cost=0.00..8.28 rows=1 width=0) (actual time=43.079..172.788 rows=29687
loops=1)
Index Cond: (centroid && $0)
Filter: ((type)::text = 'Z'::text)
Total runtime: 272.185 ms
(8 rows)
So in conclusion, I think that patch looks good and that the extra time
I was seeing was due to RECHECK being applied to the && operator, and
not the time being spent within the index scan itself.
ATB,
Mark.
--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063
Sirius Labs: http://www.siriusit.co.uk/labs
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2009-08-22 11:35:48 | Re: [PATCH] plpythonu datatype conversion improvements |
Previous Message | Caleb Welton | 2009-08-22 10:45:24 | Re: [PATCH] plpythonu datatype conversion improvements |