Re: PoC: Partial sort

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC: Partial sort
Date: 2013-12-14 12:54:23
Message-ID: 20131214125423.GE25520@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Cool stuff.

On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
> Currently when we need to get ordered result from table we have to choose
> one of two approaches: get results from index in exact order we need or do
> sort of tuples. However, it could be useful to mix both methods: get
> results from index in order which partially meets our requirements and do
> rest of work from heap.

> ------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=69214.06..69214.08 rows=10 width=16) (actual
> time=0.097..0.099 rows=10 loops=1)
> -> Sort (cost=69214.06..71714.06 rows=1000000 width=16) (actual
> time=0.096..0.097 rows=10 loops=1)
> Sort Key: v1, v2
> Sort Method: top-N heapsort Memory: 25kB
> -> Index Scan using test_v1_idx on test (cost=0.42..47604.42
> rows=1000000 width=16) (actual time=0.017..0.066 rows=56 loops=1)
> Total runtime: 0.125 ms
> (6 rows)

Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.

> *partial-knn-1.patch*
>
> KNN-GiST provides ability to get ordered results from index, but this order
> is based only on index information. For instance, GiST index contains
> bounding rectangles for polygons, and we can't get exact distance to
> polygon from index (similar situation is in PostGIS). In attached patch,
> GiST distance method can set recheck flag (similar to consistent method).
> This flag means that distance method returned lower bound of distance and
> we should recheck it from heap.

> See an example.
>
> create table test as (select id, polygon(3+(random()*10)::int,
> circle(point(random(), random()), 0.0003 + random()*0.001)) as p from
> generate_series(1,1000000) id);
> create index test_idx on test using gist (p);
>
> We can get results ordered by distance from polygon to point.
>
> postgres=# select id, p <-> point(0.5,0.5) from test order by p <->
> point(0.5,0.5) limit 10;

> ----------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230
> rows=10 loops=1)
> -> Index Scan using test_idx on test (cost=0.29..157672.29
> rows=1000000 width=36) (actual time=0.179..0.228 rows=10 loops=1)
> Order By: (p <-> '(0.5,0.5)'::point)
> Total runtime: 0.305 ms
> (4 rows)

Rechecking from the heap means adding a sort node though, which I don't
see here? Or am I misunderstanding something?
Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-12-14 13:11:59 Re: Time-Delayed Standbys
Previous Message Andres Freund 2013-12-14 12:42:10 Re: "stuck spinlock"