Re: Fwd: Raster performance

From: David Haynes II <dahaynes(at)umn(dot)edu>
To: PT <wmoran(at)potentialtech(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Fwd: Raster performance
Date: 2015-05-28 15:06:24
Message-ID: CAKNXh=os2wHF__4s=ZXotq7HzoL5Fut=2d+7xB+_UVVMWoCXFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Sorry,

The query run times are significantly slower on outdb as that using indb
here are the run times on 2 queries.

ST_Count/ST_Clip(select single band here)/Inner Join/ST_Transform (US
Counties)
OutDB: 873.564s (14 minutes 33s) InDB: 127.36s (2 minutes 7s)

ST_Count(select single band here)/ST_Clip(on all bands)/Inner
Join/ST_Transform (US Counties)
OutDB: 9537.371s (2 hours 38minutes) InDB: 310s (5 minutes 10 seconds)

In the query planner it shows a large change in the number of columns
(width) that are picked up in the CTE_rast_select.
These extra columns slow down the ability to process the data.

OUT DB
CTE rast_select
> -> Nested Loop (cost=0.28..76131.41 rows=62033 *width=1086)*
> -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250)

In DB
Nested Loop (cost=0.28..51767.41 rows=62033 *width=272*)
> -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250)

On Wed, May 27, 2015 at 4:31 PM, PT <wmoran(at)potentialtech(dot)com> wrote:

> On Tue, 26 May 2015 12:52:24 -0500
> David Haynes II <dahaynes(at)umn(dot)edu> wrote:
>
> > Hello,
> >
> > I have a question about the query optimizer and its performance on
> spatial
> > datasets, specifically rasters. My use case is rather unique, the
> > application that I am developing allows users to request summarizations
> of
> > various geographic boundaries around the world. Therefore our raster
> > datasets are global. We are in the process of conducting some benchmarks
> > for our system and we noticed something unexpected.
> >
> > The query is the same except the first is run on a raster (46gigs) in out
> > of database (outdb) and the second is the same raster (46gigs) stored in
> > database (indb). The raster is multibanded (13), with each band
> > representing one entire MODIS global scene. A single year of MODIS is
> > approximately 3.6 gigs.
> >
> > The outdb is being out performed by indb, because the query optimizer
> gets
> > smarter. But what is also interesting is all the extra pieces that are
> > brought in with outdb.
> >
> > with poly as
> > ( SELECT gid, label as name, ST_Transform(geom, 6842) as geom FROM
> > us_counties )
> > , rast_select as
> > ( SELECT r.rid as id, s.name , ST_CLIP(r.rast, s.geom) as rast from
> > rasters.glc2000 as r inner join poly as s on ST_Intersects(r.rast,
> s.geom) )
> > select r.id, r.name, ST_Count(r.rast, 1, True)
> >
> >
> > QUERY PLAN With Outdb
> >
> --------------------------------------------------------------------------------------------------
> > Sort (cost=93911.29..93926.80 rows=6204 width=254)
> > Sort Key: r.id, r.name
> > CTE poly
> > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109
> width=62247)
> > CTE rast_select
> > -> Nested Loop (cost=0.28..76131.41 rows=62033 width=1086)
> > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250)
> > -> Index Scan using modis_rast_gist on modis r_1
> > (cost=0.28..24.40 rows=2 width=836)
> > Index Cond: ((rast)::geometry && s.geom)
> > Filter: _st_intersects(s.geom, rast, NULL::integer)
> > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254)
> > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033
> > width=254)
> >
> > QUERY PLAN With Indb
> >
> >
> -------------------------------------------------------------------------------------------------------------
> > Sort (cost=69547.29..69562.80 rows=6204 width=254)
> > Sort Key: r.id, r.name
> > CTE poly
> > -> Seq Scan on us_counties (cost=0.00..112.86 rows=3109
> width=62247)
> > CTE rast_select
> > -> Nested Loop (cost=0.28..51767.41 rows=62033 width=272)
> > -> CTE Scan on poly s (cost=0.00..62.18 rows=3109 width=250)
> > -> Index Scan using modis_noout_rast_gist on modis_noout r_1
> > (cost=0.28..16.56 rows=2 width=22)
> > Index Cond: ((rast)::geometry && s.geom)
> > Filter: _st_intersects(s.geom, rast, NULL::integer)
> > -> HashAggregate (cost=17214.16..17276.20 rows=6204 width=254)
> > -> CTE Scan on rast_select r (cost=0.00..1240.66 rows=62033
> > width=254)
>
> I could be missing something here, but I don't see how long the queries
> actually take to run. Have you actually run the queries and timed them?
> Keep in mind that analyze does not actually run the query, it only plans
> it, so the actual run time is unknown if all you do is analyze.
>
> The query plans appear to be equal, assuming there are slight variances
> in the names of tables from one DB to another (and I assume that your
> description of indb and outdb reflects the fact that there are (for
> reasons unknown) two copies of the data).
>
> The only purpose to those estimates is to choose a good plan. If the
> plan is bad for one database and both databases have the same data, then
> the plan will be bad for both.
>
> Since there have been no other responses, I'm guessing that others are
> confused by your question as well. Can you describe the actual problem
> that you're seeing?
>
> --
> Bill Moran <wmoran(at)potentialtech(dot)com>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
David Haynes, Ph.D.
Research Associate Terra Populus
Minnesota Population Center
www.terrapop.org

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Ravi Krishna 2015-05-28 15:15:22 Partitioning and performance
Previous Message Robert Haas 2015-05-28 12:51:04 Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1