Re: questions on interpreting the output of explain

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Olivier <molivier(at)yahoo(dot)com>
Cc: pgsql-sql(at)hub(dot)org
Subject: Re: questions on interpreting the output of explain
Date: 1999-02-21 20:36:23
Message-ID: 1430.919629383@sss.pgh.pa.us
Lists: pgsql-sql

Michael Olivier <molivier(at)yahoo(dot)com> writes:
> Can someone explain the details here to me? I guess specific questions
> are:
> - - Does hash join mean my acctname indexes are hashed, not btree'd?
> - - What does index scan mean?
> - - Nested loop?

OK, I'll take a cut at this. (I've been studying the Postgres optimizer
lately, so I know a little bit about this, but I may be wrong about some
of the details.)

To Postgres, the most interesting part of your query is the WHERE clause
"U.acctname = S.acctname". This makes it a "join query": the system has
to find all combinations of U rows and S rows such that U.acctname =
S.acctname. The other clauses of your WHERE specification are
"restriction clauses": they eliminate rows from consideration.

Postgres has two basic ways of scanning an individual table to find the
desired rows: it can do a sequential scan of the whole table, or it can
scan an index and then go to the table proper for each row that looks
promising according to the index. Restriction clauses that reference
only the indexed field can be checked just on the basis of the index
entry, so it's not necessary to fetch the full row if it can be rejected
via a restriction clause. Index scan is faster than sequential if only
a few rows need be retrieved, but slower if many rows must be fetched.
(With some index types, such as btree, an index scan has another useful
property: the retrieved rows are inherently ordered by the indexed
fields. So an index scan might save an explicit sort step.)

OK, now what about joining multiple tables? The system knows three ways
to join two tables:

* Nested loop: scan one table, and for each candidate row retrieved from
it, scan the entire other table looking for matching rows. This is
pretty slow for big tables, but is the fallback if the system can't
figure out how to apply any faster technique. Note that the "scans"
might be implemented as either sequential or index scans, depending
on the availability of indexes matching restriction clauses; so even
a nested loop plan is not necessarily completely brute-force.

* Merge join: this is only useful when the join clause is an equality
check, a.f1 = b.f2. The idea is to scan both tables in parallel,
in sequence by the joined field (f1 for a, f2 for b). Matching rows
can be matched up on-the-fly using the same kind of logic you'd use
for merging two ordered lists. The trick is to scan the tables in
the right sorted order; that can be done with an index scan over an
ordered index, or by a sequential scan and an explicit sort step.

* Hash join: scan the smaller table once, entering each of its rows
(after restriction checks) into a temporary hashtable keyed by the join
field. Then scan the larger table once, probing into the hashtable to
find all matching rows. This is pretty good if one of the tables is
small enough to fit in memory conveniently, or if you expect the
restriction clauses to eliminate all but a few of its rows. Again, the
scans might be either sequential or indexed depending on restriction
clauses and whether the final output needs to be sorted or not.
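To make the three strategies concrete, here is a rough Python sketch of each. This is an illustration of the algorithms only, not the Postgres implementation; the function names and tuple-based row layout are mine, and rows join on a key position rather than a named field:

```python
def nested_loop_join(outer, inner, key_a, key_b):
    """For each candidate outer row, scan the entire inner table."""
    return [(ra, rb) for ra in outer for rb in inner
            if ra[key_a] == rb[key_b]]

def merge_join(a, b, key_a, key_b):
    """Scan both inputs in sorted order on the join key, merging
    matches like two ordered lists (equality joins only)."""
    a = sorted(a, key=lambda r: r[key_a])
    b = sorted(b, key=lambda r: r[key_b])
    out, i = [], 0
    for ra in a:
        # Skip b rows whose key is smaller than the current a key.
        while i < len(b) and b[i][key_b] < ra[key_a]:
            i += 1
        # Emit every b row matching this key (there may be duplicates).
        j = i
        while j < len(b) and b[j][key_b] == ra[key_a]:
            out.append((ra, b[j]))
            j += 1
    return out

def hash_join(small, large, key_small, key_large):
    """Build a hashtable on the smaller input, then probe it once
    per row of the larger input."""
    ht = {}
    for row in small:
        ht.setdefault(row[key_small], []).append(row)
    return [(match, row) for row in large
            for match in ht.get(row[key_large], [])]
```

All three produce the same set of joined pairs; they differ only in how much work they do, which is exactly what the planner's cost estimates try to predict.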

If you have more than two tables mentioned in a query, they're joined
together in a sequence of pairwise join operations that produce
temporary tables.

Obviously there are a lot of possible ways to do any complicated query,
and the interactions aren't obvious. (For example, doing a lower-level
join in a way that produces a sorted result might make it possible to
apply a mergejoin later without a separate sort, yielding a cheaper
overall result even though the first join takes longer.) The system
uses statistics it keeps about table sizes and index properties to try
to estimate the "cost" of each possible way of doing the query, and then
it picks the way with the lowest cost estimate. Of course these
estimates aren't 100% accurate, especially not if the statistics are
out of date. (That's why you should run VACUUM ANALYZE regularly.)

> - - Does the cost roll-up, meaning the top line is total cost, and the
> rest is a breakdown, or do I add up all the cost numbers?
> - - Can I compare costs between two queries?

It rolls up, and comparing costs is what the whole thing is all about.

EXPLAIN will only show you the plan estimated to be cheapest overall.
If you'd like to see what was considered and rejected, you can turn on
optimizer debug printouts at system build time (but beware: they are
verbose).

> I get:

> Nested Loop (cost=9.38 size=1 width=28)
>   -> Index Scan on u (cost=3.26 size=3 width=16)
>   -> Index Scan on s (cost=2.04 size=25 width=12)

> ...but if I shorten the query slightly
> [ by omitting restriction clauses ]
> ...I get:

> Hash Join (cost=14.67 size=3 width=28)
>   -> Index Scan on u (cost=3.26 size=26 width=16)
>   -> Hash (cost=0.00 size=0 width=0)
>     -> Index Scan on s (cost=7.73 size=25 width=12)

I'm guessing that the extra restriction clauses made a nested loop look
better. The index scans in the first plan are probably chosen to allow
restriction clauses to be tested before fetching rows. (BTW, recent
releases include the name of the selected index in EXPLAIN output, which
makes it a good deal easier to tell what's going on.) With the extra
restrictions, the system estimated that few enough rows would be
selected that a nested loop would be cheapest. Note the "size" fields:
the system is guessing that only 3 U rows will be left after restriction
in the first case, so it will only have to repeat the inner scan 3
times. In the second case it guesses that 26 U rows will have to be
joined, so the extra overhead of preparing a hashtable is worth it.
These guesses could be all wet of course, but that's how the logic works.

A generalization of that comment is that on a small test table, you're
likely to get a nested loop plan no matter what. The startup overhead
of the more complex join types isn't worth it until the tables get to
realistic sizes (say a few hundred tuples) ... and the system's cost
models are good enough to know that.

regards, tom lane
