First of all, thank you for your fast answer, Kevin :) .

However, I still wonder whether, when probing the hash table (stored in RAM, as you point out), the engine looks up a father once for every student selected, or whether it uses some kind of heuristic to avoid looking up the same father more than once.

For example:

students
----------------------------------------
id_student | name    | id_father
----------------------------------------
1          | James   | 1
2          | Laura   | 2
3          | Anthony | 1

fathers (hash table in RAM)
----------------------------------------
id_father | name
----------------------------------------
1         | John
2         | Michael

As I understand the process, the engine would take the name of the student with ID 1 and look up the name of the father with ID 1 in the hash table. It would do exactly the same with student #2 and father #2. My real doubt is about the third one (Anthony). Would the engine "know" that it had already retrieved the father's name for student 1, and skip the hash table lookup by reusing that name through some internal mechanism? Or would it probe the hash table again?
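
To make the question concrete, here is a toy sketch in Python of the process as I picture it. It is only a model of a textbook hash join under my own assumptions, not PostgreSQL's actual executor code; the function name hash_join and the probe counter are made up for illustration. It builds the hash table from "fathers" once and then probes it once per student row, with no memory of earlier lookups:

```python
# Toy model of a hash join; NOT PostgreSQL's real executor code.
# Table contents are the example rows from this message.
students = [
    (1, "James", 1),
    (2, "Laura", 2),
    (3, "Anthony", 1),
]
fathers = [
    (1, "John"),
    (2, "Michael"),
]


def hash_join(students, fathers):
    """Join students to fathers on id_father; also count hash table probes."""
    # Build phase: one pass over "fathers", keyed by id_father.
    hashed_fathers = {id_father: name for id_father, name in fathers}

    probes = 0
    rows = []
    # Probe phase: one lookup per student row, nothing remembered between rows.
    for _id_student, student_name, id_father in students:
        probes += 1
        father_name = hashed_fathers.get(id_father)
        if father_name is not None:
            rows.append((student_name, father_name))
    return rows, probes


rows, probes = hash_join(students, fathers)
print(rows)    # [('James', 'John'), ('Laura', 'Michael'), ('Anthony', 'John')]
print(probes)  # 3
```

In this model father #1 is looked up twice (for James and again for Anthony), so there are three probes for three students. My question is whether the real engine behaves like this or avoids the repeated probe.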

Thanks a lot for your patience :) .

Kevin Grittner wrote:

negora <negora@negora.com> wrote:

I have a question about how the PostgreSQL planner performs a hash join.

Let's suppose that I have 2 tables, one of students and the other
one of fathers, in a many-to-one relation. I want to do something
like this:

        SELECT s.complete_name, f.complete_name
        FROM students AS s
        JOIN fathers AS f ON f.id_father = s.id_father;

Using EXPLAIN ANALYZE, I've checked that the planner first
scans and extracts the required information from "fathers", builds
a temporary hash table from it, then scans "students", and finally
joins the information from this table and the temporary one
using the condition "f.id_father = s.id_father".

 
This sort of plan is sometimes used when the optimizer expects the
hash table to fit into RAM, based on statistics and your work_mem
setting.  If it does fit, that's one sequential scan of the father
table's heap, and a hashed lookup into RAM to find the father to
match each student.  For the sort of query you're showing, that's
typically a very good plan.
 
-Kevin