Re: Help with tuning this query

From: John Arbash Meinel <john(at)arbash-meinel(dot)com>
To: Ken Egervari <ken(at)upfactor(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Help with tuning this query
Date: 2005-03-02 17:38:24
Message-ID: 4225FA10.6070302@arbash-meinel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers-win32 pgsql-performance

Ken Egervari wrote:

>> First, what version of postgres, and have you run VACUUM ANALYZE
>> recently?
>> Also, please attach the result of running EXPLAIN ANALYZE.
>> (eg, explain analyze select s.* from shipment ...)
>
>
> I'm using postgres 8.0. I wish I could paste explain analyze, but I
> won't be at work for a few days. I was hoping some Postgres/SQL
> experts here would be able to simply look at the query and make
> recommendations because it's not a very difficult or unique query.
>
That's the problem. Without explain analyze, it's hard to say why it is
performing weird, because it *does* look like a straightforward query.

>> It's very possible that you don't have up-to-date statistics, which
>> causes postgres to make a bad estimate of what the fastest plan is.
>
>
> I run VACUUM ANALYZE religiously. I even dumped the production
> database and used it as my test database after a full vacuum analyze.
> It's really as fresh as it can be.
>
Good. Again, this is just the first precaution, as not everyone is as
careful as you. And without the explain analyze, you can't tell what the
planner estimates are.

>> I don't know how to make Hibernate do what you want, but if you change
>> the query to using subselects (not all databases support this, so
>> hibernate might not let you), you can see a performance improvement.
>
>
> Yes, Hibernate supports sub-selects. In fact, I can even drop down to
> JDBC explicitly, so whatever SQL tricks out there I can use will work
> on Hibernate. In what way will sub-selects improve this query?
>
When doing massive joins across multiple tables (as you are doing) it is
frequently faster to do a couple of small joins where you only need a
couple of rows as input to the rest. Something like:

select * from shipment s
where s.carrier_code_id in
(select cc.id from carrier_code cc join carrier c on
cc.carrier_id = c.id)
and s.current_status_id in (select cs.id from shipment_status cs where ...)

Again it's something that you can try. I have found quite a few of my
queries performed much better with subselects.
I'm guessing it's because with big queries it has a harder time figuring
out how to refactor (the decision tree becomes big). But I'm not really
sure. I just know it can work.

>> Also sometimes using explicit joins can be worse than just letting the
>> query manager figure it out. So something like
>> select s.* from shipment s, carrier_code cc, carrier c, ...
>> where s.carrier_code_id = cc.id and c.id = cc.carrier_id and ....
>
>
> I think I can avoid using joins in Hibernate, but it makes the query
> harder to maintain. How much of a performance benefit are we talking
> with this change? Since hibernate is an object language, you don't
> actually have to specify many joins. You can use the "dot" notation.
>
I'm not saying this *will* improve performance. It is just something to
try. It very easily could not be worth the overhead.

> Query query = session.createQuery(
> "select shipment " +
> "from Shipment shipment " +
> " inner join
> shipment.cargoControlNumber.carrierCode.carrier.persons person " +
> " inner join shipment.currentStatus currentStatus " +
> " inner join currentStatus.releaseCode releaseCode " +
> " left join fetch shipment.currentStatus " +
> "where " +
> " person.id = :personId and " +
> " shipment.isPurged = false and " +
> " releaseCode.number = '9' and " +
> " currentStatus is not null and " +
> " currentStatus.date >= current_date - 31 " +
> "order by currentStatus.date desc"
> );
>
> query.setParameter( "personId", personId );
>
> query.setFirstResult( firstResult );
> query.setMaxResults( maxResults );
>
> return query.list();
>
> As you can see, it's fairly elegant language and maps to SQL quite well.
>
>> But again, since this is generated from another program (Hibernate), I
>> really don't know how you tell it how to tune the SQL. Probably the
>> biggest "non-bug" performance improvements are from tuning the SQL.
>
>
> I agree, but the ones I've tried aren't good enough. I have made
> these indexes that apply to this query as well as others in my from
> looking at my SQL scripts. Many of my queries have really sped up to
> 14 milliseconds from these indexes. But I can't make this query run
> any faster.
>
> CREATE INDEX carrier_to_person_person_id_idx ON carrier_to_person
> USING btree (person_id);
> CREATE INDEX carrier_to_person_carrier_id_idx ON carrier_to_person
> USING btree (carrier_id);
> CREATE INDEX carrier_code_carrier_id_idx ON carrier_code USING btree
> (carrier_id);
> CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree
> (carrier_code_id);
> CREATE INDEX current_status_date_idx ON shipment_status USING btree
> (date);
> CREATE INDEX shipment_current_status_id_idx ON shipment USING btree
> (current_status_id);
> CREATE INDEX shipment_status_shipment_id_idx ON shipment_status USING
> btree (shipment_id);
>
> Thanks for your responses everyone. I'll try and get you that explain
> analyze. I'm just not at work at the moment but this is a problem
> that I'm simply puzzled and worried about. I'm getting all of this
> from CVS on my work server.
>
> Ken

There is also the possibility that you are having problems with
cross-column correlation, or poor distribution of a column. Postgres
doesn't keep cross-column statistics, so if 2 columns are correlated,
then it mis-estimates selectivity, and might pick the wrong plan.

In general your query looks decent, we just need to figure out what is
going on.

John
=:->

In response to

Browse pgsql-hackers-win32 by date

  From Date Subject
Next Message massimo.zanforlin 2005-03-02 17:58:36 Problem during postgresql-8.0.msi installation
Previous Message Ken Egervari 2005-03-02 17:23:23 Re: Help with tuning this query

Browse pgsql-performance by date

  From Date Subject
Next Message Shawn Chisholm 2005-03-02 17:52:10 Performance tradeoff
Previous Message Ken Egervari 2005-03-02 17:23:23 Re: Help with tuning this query