Re: Bucket and batch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ana Carolina Brito de Almeida" <anacrl(at)ig(dot)com(dot)br>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bucket and batch
Date: 2008-06-30 23:54:44
Message-ID: 20577.1214870084@sss.pgh.pa.us
Lists: pgsql-hackers

"Ana Carolina Brito de Almeida" <anacrl(at)ig(dot)com(dot)br> writes:
> So, I would like to know the differences between bucket and batch.

A bucket is, well, one bucket of a hash table --- it holds all the
tuples that have the same hash code (for as many bits of the hash
code as we are choosing to use). We try to size the hash table with
enough buckets so that there are no more than 10 tuples per bucket on
average.
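A minimal sketch of the bucket sizing described above, assuming the bucket count is rounded up to a power of two so that the bucket number is simply the low-order bits of the hash code. The names `choose_nbuckets`, `bucket_of` and `NTUP_PER_BUCKET` are illustrative, not PostgreSQL's actual code.

```python
import math

NTUP_PER_BUCKET = 10  # target average tuples per bucket, as in the text above


def choose_nbuckets(expected_tuples: int) -> int:
    """Hypothetical sizing rule: enough buckets for ~NTUP_PER_BUCKET tuples
    each, rounded up to a power of two (an assumption of this sketch)."""
    needed = max(1, math.ceil(expected_tuples / NTUP_PER_BUCKET))
    return 1 << (needed - 1).bit_length()


def bucket_of(hashvalue: int, nbuckets: int) -> int:
    # Use only as many low-order bits of the hash code as there are buckets,
    # so every tuple whose hash agrees in those bits lands in the same bucket.
    return hashvalue & (nbuckets - 1)
```

For example, 1000 expected tuples need 100 buckets, which rounds up to 128; any two hash codes that agree in their low 7 bits then share a bucket.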

A batch is a range of buckets that we process at the same time. Tuples
(from either side of the join) whose hash codes show they fall into
batches other than the first one get dumped into temporary holding
files, and then (after finishing joining the first batch) we pull each
successive batch back into memory and join that portion of the tuples.
The batch size is chosen so that the memory needed is approximately
work_mem.
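The batching scheme above can be sketched as follows, under several stated assumptions: `nbuckets` and `nbatch` are powers of two, the batch number is taken from the hash bits just above the bucket bits (roughly how PostgreSQL's `ExecHashGetBucketAndBatch` in `nodeHash.c` splits the hash, though this is not a transcription of it), Python lists stand in for the temporary holding files, and `hash_key` is a stand-in multiplicative hash rather than PostgreSQL's hash function. `choose_nbatch` is likewise a hypothetical sizing rule.

```python
from collections import defaultdict


def hash_key(key: int) -> int:
    # Stand-in 32-bit multiplicative hash (NOT PostgreSQL's hash function).
    return (key * 2654435761) & 0xFFFFFFFF


def choose_nbatch(inner_bytes: int, work_mem_bytes: int) -> int:
    """Hypothetical rule: smallest power of two such that each batch's share
    of the inner relation fits in roughly work_mem_bytes."""
    nbatch = 1
    while inner_bytes / nbatch > work_mem_bytes:
        nbatch *= 2
    return nbatch


def batch_of(hashvalue: int, nbuckets: int, nbatch: int) -> int:
    # The batch number comes from the hash bits just above the bucket bits.
    log2_nbuckets = nbuckets.bit_length() - 1
    return (hashvalue >> log2_nbuckets) & (nbatch - 1)


def hash_join(inner, outer, nbuckets: int, nbatch: int):
    """Equi-join two lists of (key, payload) tuples one batch at a time."""
    spilled_inner = defaultdict(list)  # batch number -> stand-in "temp file"
    spilled_outer = defaultdict(list)
    result = []

    def build(tuples):
        # The in-memory table only ever holds one batch: nbuckets buckets.
        table = [[] for _ in range(nbuckets)]
        for key, payload in tuples:
            table[hash_key(key) & (nbuckets - 1)].append((key, payload))
        return table

    def probe(table, key, payload):
        for ikey, ipayload in table[hash_key(key) & (nbuckets - 1)]:
            if ikey == key:
                result.append((ipayload, payload))

    # Inner side: batch 0 stays in memory, later batches are dumped.
    batch0 = []
    for key, payload in inner:
        b = batch_of(hash_key(key), nbuckets, nbatch)
        if b == 0:
            batch0.append((key, payload))
        else:
            spilled_inner[b].append((key, payload))

    # Outer side: batch-0 tuples are joined immediately, the rest are dumped.
    table = build(batch0)
    for key, payload in outer:
        b = batch_of(hash_key(key), nbuckets, nbatch)
        if b == 0:
            probe(table, key, payload)
        else:
            spilled_outer[b].append((key, payload))

    # Pull each remaining batch back into memory and join that portion.
    for b in range(1, nbatch):
        table = build(spilled_inner[b])
        for key, payload in spilled_outer[b]:
            probe(table, key, payload)
    return result
```

The join result does not depend on `nbatch`; only peak memory does, since at most one batch's worth of inner tuples is held in the table at a time. Requiring powers of two lets both numbers be carved out of the same hash value with a mask and a shift.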

IOW, there are really nbuckets * nbatches "virtual" buckets in the hash
table, but only nbuckets worth of them are kept in memory at any one
time.
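To make the "virtual bucket" view concrete, here is a small sketch assuming the same power-of-two layout as above (low bits select the bucket, the next bits select the batch). The function names are hypothetical. Combining the pair gives one index out of nbuckets * nbatches, which is the same as using that many low-order bits of the hash code directly; only the nbuckets virtual buckets belonging to the current batch are in memory at once.

```python
def bucket_and_batch(hashvalue: int, nbuckets: int, nbatch: int):
    # Low bits pick the bucket; the next bits up pick the batch.
    # nbuckets and nbatch are assumed to be powers of two.
    log2_nbuckets = nbuckets.bit_length() - 1
    bucketno = hashvalue & (nbuckets - 1)
    batchno = (hashvalue >> log2_nbuckets) & (nbatch - 1)
    return bucketno, batchno


def virtual_bucket(hashvalue: int, nbuckets: int, nbatch: int) -> int:
    # Combine the pair into a single index in 0 .. nbuckets * nbatch - 1.
    bucketno, batchno = bucket_and_batch(hashvalue, nbuckets, nbatch)
    return batchno * nbuckets + bucketno
```

With nbuckets = 16 and nbatch = 4, for instance, `virtual_bucket(h, 16, 4)` equals `h & 63` for every hash code `h`, so there are exactly 64 virtual buckets.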

regards, tom lane