Skip site navigation (1) Skip section navigation (2)

Re: Improving NOT IN

From: Jens-Wolfhard Schicke <ml+pgsql-hackers(at)asco(dot)de>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving NOT IN
Date: 2007-01-31 08:28:42
Message-ID: BF40A06E14817846EE9CDD4F@[] (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
--On Dienstag, Januar 30, 2007 23:24:40 +0000 Simon Riggs 
<simon(at)2ndquadrant(dot)com> wrote:
>> Basically what I see here is a whole lot of work and new executor
>> infrastructure for something that will be a win in a very narrow
>> use-case and a significant loss the rest of the time.  I think there
>> are more productive ways to spend our development effort.
Maybe one should not aim for a special case of continuous sequences. It 
might be a better thing to have a fast look-up datastructure for 
row-existence. The optimization over the usual indices is that only 
existence, and no other information must be saved, thus a bit-field is 
really possible. Even 100 Mio rows would fit in 10 MB.

So, instead of trying to find a sequence, find (or guess and later correct 
your bitfield) the minimum, and then set the bits as you encounter rows. 
During the join, test whether the bit you want to join to exists and voila, 
depending on whether you used IN or NOT IN, decide what to do.

This datastructure could be used everywhere where only existence is 
important and no columns of a table are selected.

Possibly, the bit-field should allow for large-gaps to be represented more 
efficiently, if you have an 32-bit index column, make a 256 entry top-level 
array pointing to bitfields representing the numbers 0x0-0x00ffffff, 
0x01000000 - 0x01ffffff... each such bitfield would need 2MB, the pointers 
are negligible. But now large holes in the sequence don't waste too much 
space and thus the minimum needs not to be known.

Jens Schicke
Jens Schicke		      j(dot)schicke(at)asco(dot)de
asco GmbH
Mittelweg 7		      Tel 0531/3906-127
38106 Braunschweig	      Fax 0531/3906-400

In response to

pgsql-hackers by date

Next:From: Hannu KrosingDate: 2007-01-31 09:08:00
Subject: Re: [HACKERS] pgsql: Fix for plpython functions; return true/falsefor boolean,
Previous:From: imadDate: 2007-01-31 07:15:40
Subject: Re: PL/pgSQL RENAME functionality in TODOs

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group