Re: Inefficient queryplan for query with intersectable

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
Cc: Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Inefficient queryplan for query with intersectable
Date: 2005-08-26 22:56:16
Message-ID: 26436.1125096976@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Arjen van der Meijden <acmmailing(at)tweakers(dot)net> writes:
> As said, it chooses sequential scans or "the wrong index plans" over a
> perfectly good plan that is just not selected when the parameters are
> "too well tuned" or sequential scanning of the table is allowed.

I think some part of the problem comes from using inconsistent
datatypes. For instance, it seems very odd that the thing is not
using a hash or something to handle

t_0.Cat2 IN (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545)

seeing that it correctly guesses there are only going to be about 8 rows
in the union. Part of the reason is that cat2 is smallint, whereas the
output of the union must be at least int, maybe wider depending on the
datatype of cat.id (which you did not show us); so the comparison isn't
hashable. Even a smallint vs int comparison would be mergejoinable,
though, so I'm really wondering what cat.id is.

Another big part of the problem comes from poor result size estimation.
I'm not sure you can eliminate that entirely given the multiple
conditions on different columns (which'd require cross-column statistics
to really do well, which we do not have). But you could avoid
constructs like

WHERE ... t_1.recordtimestamp >=
(SELECT max_date - 60 FROM last_dates WHERE table_name = 'pricetracker')

The planner is basically going to throw up its hands and make a default
guess on the selectivity of this; it's not smart enough to decide that
the sub-select probably represents a constant. What I'd do with this
is to define a function marked STABLE for the sub-select result, perhaps
something like

create function get_last_date(tabname text, offsetdays int)
returns timestamp as $$
SELECT max_date - $2 FROM last_dates WHERE table_name = $1
$$ language sql strict stable;

(I'm guessing as to datatypes and the amount of parameterization you
need.) Then write the query like

WHERE ... t_1.recordtimestamp >= get_last_date('pricetracker', 60)

In this formulation the planner will be able to make a reasonable guess
about how many rows will match ... at least if your statistics are up
to date ...

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Philip Hallstrom 2005-08-26 23:13:56 Re: Weird performance drop after VACUUM
Previous Message asif ali 2005-08-26 22:52:24 Re: Weird performance drop after VACUUM