Re: Internal operations when the planner makes a hash join.

From: negora <negora(at)negora(dot)com>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Internal operations when the planner makes a hash join.
Date: 2010-02-23 16:39:40
Message-ID: 4B8404CC.4060607@negora.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<font face="Verdana">First of all, thank you for your fast answer,
Kevin :) .<br>
<br>
However I still wonder if on the search into the hashed
table (stored in the RAM, as you're pointing out), it checks for
fathers as
many times as students are selected, or if the engine uses some kind of
intelligent heuristic to avoid searching for the same father more than
once.<br>
<br>
For example:<br>
<br>
students<br>
----------------------------------------<br>
id_student | name | id_father<br>
----------------------------------------<br>
1 | James | 1<br>
2 | Laura | 2<br>
3 | Anthony | 1<br>
<br>
<br>
fathers (hashed table into RAM)<br>
</font><font face="Verdana">----------------------------------------<br>
</font><font face="Verdana">id_father</font><font face="Verdana"> | name<br>
----------------------------------------<br>
1 | John<br>
2 | Michael<br>
<br>
<br>
According to how I understood the process, the engine would get the
name from the student with ID 1 and would look for the name of the
father with ID 1 in the hashed table. It'd do exactly the same with the
student #2 and father #2. But my big doubt is about the 3rd one
(Anthony). Would the engine "know" that it already had retrieved the
father's name for the student 1 and would avoid searching for it into
the hashed table (using some kind of internal mechanism which allows to
"re-utilize" the name)? Or would it search into the hashed table again?<br>
<br>
Thanks a lot for your patience :) .<br>
<br>
</font><br>
Kevin Grittner wrote:
<blockquote cite="mid:4B83A03E020000250002F4FD(at)gw(dot)wicourts(dot)gov"
type="cite">
<pre wrap="">negora <a class="moz-txt-link-rfc2396E"
href="mailto:negora(at)negora(dot)com">&lt;negora(at)negora(dot)com&gt;</a> wrote:

</pre>
<blockquote type="cite">
<pre wrap="">I've a doubt about how the PostgreSQL planner makes a hash join.
</pre>
</blockquote>
<pre wrap=""><!---->
</pre>
<blockquote type="cite">
<pre wrap="">Let's suppose that I've 2 tables, one of students and the other
one of parents in a many-to-one relation. I want to do something
like this:

SELECT s.complete_name, f.complete_name
FROM students AS s
JOIN fathers AS f ON f.id_father = s.id_father;

Using the ANALYZE command, I've checked that the planner firstly
scans and extracts the required information from "fathers", builds
a temporary hash table from it, then scans "students", and finally
joins the information from this table and the temporary one
employing the relation "f.id_father = s.id_father".
</pre>
</blockquote>
<pre wrap=""><!---->
This sort of plan is sometimes used when the optimizer expects the
hash table to fit into RAM, based on statistics and your work_mem
setting. If it does fit, that's one sequential scan of the father
table's heap, and a hashed lookup into RAM to find the father to
match each student. For the sort of query you're showing, that's
typically a very good plan.

-Kevin

</pre>
</blockquote>
</body>
</html>

Attachment Content-Type Size
unknown_filename text/html 3.1 KB

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2010-02-23 16:53:39 Re: Internal operations when the planner makes a hash join.
Previous Message Kevin Grittner 2010-02-23 15:30:38 Re: Internal operations when the planner makes a hash join.