Re: Explain explained

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Markus Stocker" <markus(at)wilabs(dot)ch>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Explain explained
Date: 2008-03-04 18:26:17
Message-ID: 17042.1204655177@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

"Markus Stocker" <markus(at)wilabs(dot)ch> writes:
> 1/ How does postgresql know that the constrain individual_name.name =
> 'http://www.University0.edu' matches one row (which is in fact true)?

It doesn't *know* that, except in the special case where it sees there's
a unique index on that column. You didn't show us the table definitions
so it's not clear if that's the case or not. Otherwise it's just
estimating based on the pg_stats statistics for the column. There's
a new section in the 8.3 docs that gives some more details about the
estimation process:
http://www.postgresql.org/docs/8.3/static/planner-stats-details.html

> Further, why expects postgresql
> for each operation a row size of 1?

It's not expecting more than one join partner at each step. Again
that's just statistical.

> 2/ Sequential scans seem to me more expensive compared to index scans.
> I'm wondering why the sequential scan on individual_name is the first
> executed in the plan.

I was wondering that too; it looks like it should be a candidate for an
index search. Datatype problem maybe? Again, you've not shown us the
table definitions...

> 3/ There is one condition in the query, i.e. concept_assertion.concept
> = 5 with an empty result set, i.e. selectivity 0. In fact, the last
> index scan on concept_assertion ca_1 in the plan is never executed
> (this is what 'explain analyze' tells me). I'm wondering, why this
> constrain is not executed first. By executing this first, we could
> just never execute everything else.

Postgres never uses a rowcount estimate of less than one row,
so it's not going to try to optimize that way. There are various
rationales behind that choice, but the main one is that we don't
trust the statistics unreservedly. The odds that an estimate of zero
is more accurate than an estimate of one just aren't good enough,
and the likelihood of generating a really awful plan if we did believe
zero rows is too high. (In fact, I've been considering whether it
wouldn't be a good idea to enforce a minimum estimate of two rows
whenever we don't see a unique index proving there can be only one.
This would help discourage the planner from using nestloops in cases
where a nestloop loses badly for more than one matching row.)

regards, tom lane

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Markus Stocker 2008-03-04 20:52:30 Re: Explain explained
Previous Message Rodrigo E. De León Plicet 2008-03-04 15:21:07 Re: Explain explained