Re: performance of bitmap scans in nested loop joins

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Sergey E(dot) Koposov" <math(at)sai(dot)msu(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: performance of bitmap scans in nested loop joins
Date: 2005-05-16 21:16:24
Message-ID: 21246.1116278184@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> ... Where we are losing is mostly on the actual manipulation
> of the bitmaps (particularly hash_seq_search which is done in
> tbm_begin_iterate; and it looks like memory allocation for the bitmap
> hashtables is nontrivial too). I had already had a TODO item to look
> into speeding up hash_seq_search ... will see what I can find.

I got around to looking at this some more. It seems that the basic
problem is that dynahash.c isn't optimized for very small hashtables.
The test case I'm looking at (which may be an oversimplification of
Sergey's problem) never has more than one entry in the hashtables it
creates for in-memory bitmaps, and so the hashtable overhead is pretty
significant. Particularly the overhead of creating and deleting 'em.

The other uses of dynahash.c that are at all performance-critical seem
to deal with hashtables that are (a) fairly large and (b) long-lived.
So I'm thinking that hacking dynahash.c itself to improve this situation
would probably be counterproductive overall.

An idea that comes to mind is to allow tidbitmap.c to handle the cases
of zero or one page entry directly, storing the page entry in a simple
field of the TIDBitmap struct. Only when the bitmap needs entries for
more than one heap page will we create a subsidiary hash table. (Note
that we could store bits for more than one tuple, if they happen to be
on the same heap page.)

This would uglify the logic in tidbitmap.c a bit, but it doesn't seem
completely preposterous. The main objection I can see is that it would
only optimize the zero-or-one-page cases, which might be insufficient.

Anyone have a better idea for speeding up operations on small bitmaps?

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-05-16 21:30:52 Re: SQL Request Size
Previous Message Hannu Krosing 2005-05-16 21:08:37 Re: SQL Request Size