Re: GiST kNN search queue (Re: KNN-GiST with recheck)

From: Arthur Silva <arthurprs(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST kNN search queue (Re: KNN-GiST with recheck)
Date: 2014-12-10 20:59:43
Message-ID: CAO_YK0UWipBifVgNY1zqha8DJi84Kr=WEhDFLvneKo9Wzj2fBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 10, 2014 at 1:50 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 01/28/2014 04:12 PM, Alexander Korotkov wrote:
>
>> >3. A binary heap would be a better data structure to buffer the rechecked
>>> >values. A Red-Black tree allows random insertions and deletions, but in
>>> >this case you need to insert arbitrary values but only remove the
>>> minimum
>>> >item. That's exactly what a binary heap excels at. We have a nice binary
>>> >heap implementation in the backend that you can use, see
>>> >src/backend/lib/binaryheap.c.
>>> >
>>>
>> Hmm. For me binary heap would be a better data structure for KNN-GiST at
>> all :-)
>>
>
> I decided to give this a shot, replacing the red-black tree in GiST with
> the binary heap we have in lib/binaryheap.c. It made the GiST code somewhat
> simpler, as the binaryheap interface is simpler than the red-black tree
> one. Unfortunately, performance was somewhat worse. That was quite
> surprising, as insertions and deletions are both O(log N) in both data
> structures, but the red-black tree implementation is more complicated.
>
> I implemented another data structure called a Pairing Heap. It's also a
> fairly simple data structure, but insertions are O(1) instead of O(log N).
> It also performs fairly well in practice.
>
> With that, I got a small but measurable improvement. To test, I created a
> table like this:
>
> create table gisttest (id integer, p point);
> insert into gisttest select id, point(random(), random()) from
> generate_series(1, 1000000) id;
> create index i_gisttest on gisttest using gist (p);
>
> And I ran this query with pgbench:
>
> select id from gisttest order by p <-> '(0,0)' limit 1000;
>
> With unpatched master, I got about 650 TPS, and with the patch 720 TPS.
> That's a nice little improvement, but perhaps more importantly, the pairing
> heap implementation consumes less memory. To measure that, I put a
> MemoryContextStats(so->queueCtx) call into gistendscan. With the above
> query, but without the "limit" clause, on master I got:
>
> GiST scan context: 2109752 total in 10 blocks; 2088456 free (24998
> chunks); 21296 used
>
> And with the patch:
>
> GiST scan context: 1061160 total in 9 blocks; 1040088 free (12502 chunks);
> 21072 used
>
> That's 2MB vs 1MB. While that's not much in absolute terms, it'd be nice
> to reduce that memory consumption, as there is no hard upper bound on how
> much might be needed. If the GiST tree is really disorganized for some
> reason, a query might need a lot more.
>
>
> So all in all, I quite like this patch, even though it doesn't do anything
> too phenomenal. It adds a some code, in the form of the new pairing heap
> implementation, but it makes the GiST code a little bit simpler. And it
> gives a small performance gain, and reduces memory usage a bit.
>
> - Heikki
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
It may be better to replace the lib/binaryheap altogether as it offers
comparable/better performance.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matt Newell 2014-12-10 21:29:59 Re: libpq pipelining
Previous Message Heikki Linnakangas 2014-12-10 20:28:32 Re: advance local xmin more aggressively