Re: Proximity query with GIST and row estimation

From: Paul Ramsey <pramsey(at)refractions(dot)net>
To: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
Cc: "Pg Performance list" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Proximity query with GIST and row estimation
Date: 2007-02-14 03:01:34
Message-ID: 0BD9E842-2DA7-4F90-A732-AFD60C1F50C7@refractions.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

You'll find that PostGIS does a pretty good job of selectivity
estimation.

P

On 13-Feb-07, at 9:09 AM, Guillaume Smet wrote:

> Hi all,
>
> Following the work on Mark Stosberg on this list (thanks Mark!), I
> optimized our slow proximity queries by using cube, earthdistance
> (shipped with contrib) and a gist index. The result is globally very
> interesting apart for a specific query and we'd like to be able to fix
> it too to be more consistent (it's currently faster with a basic
> distance calculation based on acos, cos and so on but it's slow
> anyway).
>
> The problem is that we have sometimes very few places near a given
> location (small city) and sometimes a lot of them (in Paris, Bruxelles
> and so on - it's the case we have here). The gist index I created
> doesn't estimate the number of rows in the area very well.
>
> Table: lieu (100k rows) with wgslat and wgslon as numeric
> Table: lieugelieu (200k rows, 1k with codegelieu = 'PKG')
> Index: "idx_lieu_earth" gist (ll_to_earth(wgslat::double precision,
> wgslon::double precision))
>
> The simplified query is:
> SELECT DISTINCT l.numlieu, l.nomlieu, ROUND
> (earth_distance(ll_to_earth(48.85957600, 2.34860800),
> ll_to_earth(l.wgslat, l.wgslon))) as dist
> FROM lieu l, lieugelieu lgl
> WHERE lgl.codegelieu = 'PKG' AND earth_box(ll_to_earth(48.85957600,
> 2.34860800), 1750) @ ll_to_earth(l.wgslat, l.wgslon) AND lgl.numlieu =
> l.numlieu ORDER BY dist ASC LIMIT 2;
> It's used to find the nearest car parks from a given location.
>
> The plan is attached plan_earthdistance_nestedloop.txt. It uses a
> nested loop because the row estimate is pretty bad: (cost=0.00..3.38
> rows=106 width=0) (actual time=30.229..30.229 rows=5864 loops=1).
>
> If I disable the nested loop, the plan is different and faster (see
> plan_earthdistance_hash.txt attached).
>
> Is there any way to improve this estimation? I tried to set the
> statistics of wgslat and wgslon higher but it doesn't change anything
> (I don't know if the operator is designed to use the statistics).
>
> Any other idea to optimize this query is very welcome too.
>
> --
> Guillaume
> <plan_earthdistance_nestedloop.txt>
> <plan_earthdistance_hash.txt>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: explain analyze is your friend

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Bruce Momjian 2007-02-14 05:01:27 Re: [PERFORM] Direct I/O issues
Previous Message Heikki Linnakangas 2007-02-13 22:52:54 Re: quad or dual core Intel CPUs