Re: [HACKERS] nested loops in joins, ambiguous rewrite rules

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Charles Hornberger <charlie(at)k4azl(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] nested loops in joins, ambiguous rewrite rules
Date: 1999-01-30 21:07:07
Message-ID: 5397.917730427@sss.pgh.pa.us
Lists: pgsql-hackers

Charles Hornberger <charlie(at)k4azl(dot)net> writes:
> Hello again. I've got a few more notes on this "strange" optimizer
> behavior, which might be useful to anyone interested in this problem:

Well, I understand at least part of the "problem" here.

First, you're assuming that a merge-join plan is necessarily better than
a nested-loop plan. That should be true for large queries, but it is
*not* necessarily true for small tables --- when there are only a few
tuples in the tables being scanned, a simple nested loop wins because it
has much less startup overhead. (Or at least that's what our optimizer
thinks; I have not tried to measure this for myself.)

What's really going on here is that when the optimizer *knows* how small
your test tables are, it deliberately chooses nested-loop as being the
fastest thing. If it doesn't know, it makes some default assumptions
about the sizes of the tables, and with those default sizes it decides
that merge-join will be cheaper.

So the next question is why apparently similar database situations yield
different states of optimizer knowledge. The answer is that creating
an index before inserting tuples and creating it afterwards have
different side effects on the optimizer's statistics.

I've only spent about ten minutes looking into this, so my understanding
is no doubt incomplete, but what I've found out is:
1. There are a couple of statistical fields in the system pg_class
table, namely relpages and reltuples (number of disk pages and
tuples in each relation).
2. There are per-attribute statistics kept in the pg_statistic table.
3. The pg_class statistics fields are updated by a VACUUM (with or
without ANALYZE) *and also by CREATE INDEX*. Possibly by other
things too ... but plain INSERTs and so forth don't update 'em.
4. The pg_statistic info only seems to be updated by VACUUM ANALYZE.
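The fields in question can be inspected directly. A minimal sketch, assuming a scratch table named "foo" (the table name is hypothetical, and the pg_statistic column set varies by Postgres version, so the second query just pulls whole rows):

```sql
-- Per-relation statistics the optimizer reads (disk pages, tuple count):
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'foo';

-- Per-attribute statistics, keyed by the relation's OID:
SELECT *
FROM pg_statistic
WHERE starelid = (SELECT oid FROM pg_class WHERE relname = 'foo');
```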

So if you do
CREATE TABLE
CREATE INDEX
insert tuples
then the state of play is that the optimizer thinks the table is empty.
(Or, perhaps, it makes its default assumption --- pg_class doesn't seem
to have any special representation for "size of table unknown" as
opposed to "size of table is zero", so maybe the optimizer treats
reltuples = 0 as meaning it should use a default table size. I haven't
looked at that part of the code to find out.)

But if you do
CREATE TABLE
insert tuples
CREATE INDEX
the state of play is that the optimizer thinks there are as many tuples
in the table as there were when you created the index. This explains
the varying behavior in your detailed test case.
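The two orderings can be reproduced side by side in a scratch database. This is a sketch of the behavior described above (table and index names are made up for illustration); the reltuples values in the comments are what that behavior predicts, not guaranteed output on every version:

```sql
-- Ordering A: index created before the inserts.  CREATE INDEX records
-- the tuple count as of that moment (zero), and plain INSERTs don't
-- update it afterwards.
CREATE TABLE a (id int4);
CREATE INDEX a_idx ON a (id);
INSERT INTO a VALUES (1);
SELECT reltuples FROM pg_class WHERE relname = 'a';  -- expect 0

-- Ordering B: index created after the inserts.  The side effect of
-- CREATE INDEX captures the then-current tuple count.
CREATE TABLE b (id int4);
INSERT INTO b VALUES (1);
CREATE INDEX b_idx ON b (id);
SELECT reltuples FROM pg_class WHERE relname = 'b';  -- expect 1
```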

I'll bet that if you investigate the contents of pg_class and
pg_statistic in the two databases you were originally working with,
you'll find that they are different. But a VACUUM ANALYZE should
bring everything into sync, assuming that the actual data in the
databases is the same.
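Continuing the sketch above, bringing both statistics sources up to date for a table is a one-liner:

```sql
-- Refreshes relpages/reltuples in pg_class AND the per-attribute
-- rows in pg_statistic for table "a" (hypothetical name):
VACUUM ANALYZE a;

SELECT relpages, reltuples FROM pg_class WHERE relname = 'a';
```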

At any rate, if you still think there's something flaky going on,
please have a look at what's happening in these statistical fields;
that'll give us more info about whether there's really a bug or not.

Also, if the optimizer still wants to use nested loop when it knows
there are a *lot* of tuples to be processed, there might be a bug in
its equations for the cost of these operations --- how large are your
tables?

regards, tom lane
