Re: [GENERAL] Re: [SQL] bad select performance fixed by forbidding hash joins

From: Leon <leon(at)udmnet(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Re: [SQL] bad select performance fixed by forbidding hash joins
Date: 1999-07-21 18:40:44
Message-ID: 3796142C.B577425D@udmnet.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-sql

Tom Lane wrote:
>
> The right long-term solution is to figure out why the system is
> misestimating the relative costs of the two plans, and fix the cost
> estimates. (The system is estimating that the mergejoin is about 4x
> slower than hash; if it's really 8x faster, there is something pretty
> broken about the estimate...)
>

I am pleased that someone is taking seriously the speed of joins problem,
and here is a word or two of mine to say.

The main feature of SQL as a language is it's generality. It doesn't
tell how to make things. It only expresses the result desired. This is
a strong side, because programmer doesn't have to tell the procedure,
and it's at the same time the weakness. This is weakness because
the problem of deciding on a way is handed over to the server, since
there seems theoretically no way to estimate the cost of every query
without doing that query. It is proven very easily. Suppose you
have a join with qualification 'where a = 10'. Having statistics
of how many there are rows with a == 10 you can tell the right way.
Then suppose you have join with qualification 'table1.a = table2.b'.
This is much much more difficult to be estimated in the right way,
although still possible. If you have qualification of the kind
'SOME_WEIRD_FUNCTION(table1.a) = ANOTHER_WEIRD_FUNCTION(table2.b)'
it becomes impossible to tell the right way.

There is a fundamental flaw in SQL which makes it difficult
to realize in practice with high efficiency. The only way to
solve the problem generally is to commit a sin against purism
and start giving hints to server.

I don't state that there is no way of improving concrete Postgres
optimizer. Definitely there is. As far as I can see, now it's
optimizer is too pessimistic about sizes of the results of joins,
it overestimates them badly. If you make it think that it is
going to get fewer rows from a query, it will behave better. Maybe
you should even get a state variable, setting of which could
regulate it's degree of optimism in joins. But all that is quick
and dirty hack. General way is hints, however.

--
Leon.
---------
"This may seem a bit weird, but that's okay, because it is weird." -
Perl manpage.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Vadim Mikheev 1999-07-22 01:24:56 Re: [GENERAL] OID IS INVALID?
Previous Message Bruce Momjian 1999-07-21 17:57:35 Re: [GENERAL] Re: [SQL] bad select performance fixed by forbidding hash joins

Browse pgsql-sql by date

  From Date Subject
Next Message Don Yury 1999-07-22 04:21:42 how to determine table?
Previous Message Bruce Momjian 1999-07-21 17:57:35 Re: [GENERAL] Re: [SQL] bad select performance fixed by forbidding hash joins