Re: Improve PostGIS performance with 62 million rows?

From: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
To: Israel Brewster <israel(at)ravnalaska(dot)net>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Improve PostGIS performance with 62 million rows?
Date: 2017-01-05 17:50:49
Message-ID: CACowWR1kDoj7FTtP5=fE=yaP6V7cumnYYt8-9SVdG2VtpHHQjA@mail.gmail.com
Lists: pgsql-general

The index filters using bounding boxes. A long, diagonal route will have a
large bounding box relative to the area you actually care about (a narrow
strip along the route). Use ST_Segmentize() to add points to your route,
ST_DumpPoints() to dump those out as points, and ST_MakeLine() to generate
new lines from those points, each line very short. Index effectiveness is
maximized when your line length is close to your buffer width.
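
A rough, untested sketch of that approach (the table and column names, the
route coordinates, and the 600 m tolerance are taken from the quoted query
below; the CTE structure itself is illustrative):

```sql
-- Densify the route so no segment exceeds ~600 m, dump the vertices,
-- then rebuild it as many short two-point lines with small bounding boxes.
WITH pts AS (
    SELECT (dp).path[1] AS n, (dp).geom AS pt
    FROM ST_DumpPoints(
             ST_Segmentize(
                 'SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'::geography,
                 600)::geometry) AS dp
),
segs AS (
    -- Pair each vertex with the next one to form short segments
    SELECT ST_MakeLine(pt, LEAD(pt) OVER (ORDER BY n))::geography AS seg
    FROM pts
)
SELECT d.elevation
FROM segs s
JOIN data d ON ST_DWithin(d.location, s.seg, 600)
WHERE s.seg IS NOT NULL
ORDER BY d.elevation DESC
LIMIT 1;
```

Each short segment's bounding box now hugs the route, so the GiST index
scan should visit far fewer candidate rows than one box covering the whole
diagonal line.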

P

On Thu, Jan 5, 2017 at 9:45 AM, Israel Brewster <israel(at)ravnalaska(dot)net>
wrote:

> I have a database (PostgreSQL 9.6.1) containing 62,702,675 rows of
> latitude (numeric), longitude(numeric), elevation(integer) data, along with
> a PostGIS (2.3.0) geometry column (location), running on a CentOS 6.8 box
> with 64GB RAM and a RAID10 SSD data drive. I'm trying to get the maximum
> elevation along a path, for which purpose I've come up with the following
> query (for one particular path example):
>
> SELECT elevation
> FROM data
> WHERE ST_DWithin(location,
>                  ST_GeographyFromText('SRID=4326;LINESTRING(-150.008056 61.179167,-156.77 71.285833)'),
>                  600)
> ORDER BY elevation DESC LIMIT 1;
>
> The EXPLAIN ANALYZE output of this particular query (
> https://explain.depesz.com/s/heZ) shows:
>
>
> QUERY PLAN
> --------------------------------------------------------------------------------
>  Limit  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.840..22653.842 rows=1 loops=1)
>    ->  Sort  (cost=4.83..4.83 rows=1 width=4) (actual time=22653.837..22653.837 rows=1 loops=1)
>          Sort Key: elevation DESC
>          Sort Method: top-N heapsort  Memory: 25kB
>          ->  Index Scan using location_gix on data  (cost=0.42..4.82 rows=1 width=4) (actual time=15.786..22652.041 rows=11081 loops=1)
>                Index Cond: (location && '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography)
>                Filter: (('0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography && _st_expand(location, '600'::double precision)) AND _st_dwithin(location, '0102000020E6100000020000002C11A8FE41C062C0DFC2BAF1EE964E40713D0AD7A39863C086C77E164BD25140'::geography, '600'::double precision, true))
>                Rows Removed by Filter: 4934534
>  Planning time: 0.741 ms
>  Execution time: 22653.906 ms
> (10 rows)
>
> So it is using the index properly, but still takes a good 22 seconds to
> run, most of which appears to be in the Index Scan.
>
> Is there any way to improve this, or is this going to be about as good as
> it gets with the number of rows being dealt with? I was planning to use
> this for a real-time display - punch in a couple of points, get some
> information about the route between, including maximum elevation - but with
> it taking 22 seconds for the longer routes at least, that doesn't make for
> the best user experience.
>
> It's perhaps worth noting that the example above is most likely a
> worst-case scenario. I would expect the vast majority of routes to be
> significantly shorter, and I want to say the shorter routes query much
> faster [testing needed]. That said, the faster the better, even for short
> routes :-)
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
