Combining index scans

From: Victor Yegorov <viy(at)mits(dot)lv>
To: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Combining index scans
Date: 2005-01-31 19:15:59
Message-ID: 20050131191559.GC5500@mits.lv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello again.

This time I'd like to speak about in-memory bitmap to combine index scan
results. I know, this code should use minimal amount of memory, so I really
want to hear any possible pros and cons.

Below, pseudocode is given. After running it, we'll have a list of CTID
pointers, and one bitmap for each clause, that planner marked for index scan.

Consider query with m clauses, each marked for index scan. Let:
- j denotes number of the clause, 0 <= j < m, m is total number of clauses;
- R total number of CTID returned by all calls to index scans so far (for
any of the m clauses), i.e. CTIDs list cardinality;
- p CTID position in the list, 0 <= p < R;
- c CTID returned by index scan API.

for (j = 0; j < m; j++) {
if (no_bitmap_for_caluse(j)) {
/* create bitmap of length R */
create_bitmap(j, R);
for (i = 0; i < R; i++)
/* set i-th bit of j-th bitmap to 0 */
set_bitmap_bit(j, i, 0);
}
c = get_ctid_from_index_scan_api();

/* search for CTID in the list */
p = search_ctid(c);

/*
* p will be either -1 if CTIDs not seen so far
* OR it'll be in the range 0 <= p < R
*/
if (p == -1) {
/* this will also increase R (total number of ctids) */
add_ctid_to_the_list(c);

/* position of just-added CTID entry in list */
p = R;

/*
* update all bitmaps created so far,
* setting bit for just-added entry to 0
*/
for (i = 0; i < j; i++) {
set_bitmap_bit(i, p, 0);
}
}

/* update current bitmap, set bit p to 1 */
set_bitmap_bit(j, p, 1);
}

After that, one could do AND/OR of bitmaps and return the list of CTIDs that
matches final results.

For the case of 3 clauses, we could have such data in memoy:
idx | CTID idx | clause1 | clause2 | clause3
1 | 1234 1 | 1 | 0 | 0
2 | 3456 2 | 0 | 1 | 0
3 | 5678 3 | 0 | 0 | 1

As you can see, there were 3 scans, each returned exactly 1 CTID.
If there were at list one AND in the WHERE clause, then no rows will be
returned.

After inserting some more data to the table, we could have such more entry in
our in-memory bitmaps:

idx | CTID idx | clause1 | clause2 | clause3
... ...
3 | 5678 3 | 0 | 1 | 1

So, in case of AND present in the WHERE clause, CTID 5678 would have been
returned.

That's it.
Waiting for your feedback.

--

Victor Y. Yegorov

Browse pgsql-hackers by date

  From Date Subject
Next Message Vishal Kashyap @ [SaiHertz] 2005-01-31 19:25:55 Re: Last ID Problem
Previous Message operationsengineer1 2005-01-31 19:13:58 Last ID Problem