Re: Optimizing a huge_table/tiny_table join

From: "Kynn Jones" <kynnjo(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Optimizing a huge_table/tiny_table join
Date: 2006-05-25 16:21:53
Message-ID: c2350ba40605250921m1d83299ah72ec8b4da31d281@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 5/24/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> <kynn(at)panix(dot)com> writes:
> > Limit (cost=19676.75..21327.99 rows=6000 width=84)
> > -> Hash Join (cost=19676.75..1062244.81 rows=3788315 width=84)
> > Hash Cond: (upper(("outer".id)::text) =
> upper(("inner".id)::text))
> > -> Seq Scan on huge_table h (cost=0.00..51292.43 rows=2525543
> width=46)
> > -> Hash (cost=19676.00..19676.00 rows=300 width=38)
> > -> Seq Scan on tiny_table t (cost=0.00..19676.00rows=300 width=38)
>
> Um, if huge_table is so much bigger than tiny_table, why are the cost
> estimates for seqscanning them only about 2.5x different? There's
> something wacko about your statistics, methinks.

You mean there's a bug in explain? I agree that it makes no sense that the
costs don't differ as much as one would expect, but you can see right there
the numbers of rows for the two tables, which are exactly as I described.
At any rate, how would one go about finding an explanation for these strange
stats?

More bewildering still (and infuriating as hell--because it means that all
of my work for yesterday has been wasted) is that I can no longer reproduce
the best query plan, even though the tables have not changed at all. (Hence
I can't post the explain analyze for the best query plan.) No matter what
value I use for LIMIT, the query planner now insists on sequentially
scanning huge_table and ignoring the available index. (If I turn off
enable_seqscan, I get the second worst query plan I posted yesterday.)

Anyway, I take it that there is no way to bypass the optimizer and instruct
PostgreSQL exactly how one wants the search performed?

Thanks!

kj

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2006-05-25 16:26:43 Re: lowering priority automatically at connection
Previous Message Chris Mair 2006-05-25 16:16:24 lowering priority automatically at connection