Re: Query performance issue

From: "Jonathan Gray" <jgray(at)streamy(dot)com>
To: "'Chris'" <dmagick(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query performance issue
Date: 2007-07-24 09:18:53
Message-ID: 0f3001c7cdd3$a7db7740$f79265c0$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Chris,

Creating indexes on the customerclass table does speed up the queries but
still does not create the plan we are looking for (using the double index
with a backward index scan on the orders table).

The plans we now get, with times on par or slightly better than with the
plpgsql hack, are:

EXPLAIN ANALYZE
SELECT o.orderid,o.orderstamp FROM indextest.orders o
INNER JOIN indextest.customerclass cc ON (cc.classid = 2)
WHERE o.customerid = cc.customerid ORDER BY o.orderstamp DESC LIMIT 5;


QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
--------
Limit (cost=0.00..176.65 rows=5 width=12) (actual time=0.930..3.675 rows=5
loops=1)
-> Nested Loop (cost=0.00..46388.80 rows=1313 width=12) (actual
time=0.927..3.664 rows=5 loops=1)
-> Index Scan Backward using orders_orderstamp_idx on orders o
(cost=0.00..6225.26 rows=141001 width=16) (actual time=0.015..0.957 rows=433
loops=1)
-> Index Scan using customerclass_customerid_idx on customerclass
cc (cost=0.00..0.27 rows=1 width=4) (actual time=0.004..0.004 rows=0
loops=433)
Index Cond: (o.customerid = cc.customerid)
Filter: (classid = 2)

And

EXPLAIN ANALYZE
SELECT o.orderid,o.orderstamp FROM indextest.orders o
INNER JOIN indextest.customerclass cc ON (cc.classid = 2)
WHERE o.customerid = cc.customerid ORDER BY o.orderstamp DESC LIMIT 100;

QUERY
PLAN
----------------------------------------------------------------------------
---------------------------------------------------------------------------
Limit (cost=1978.80..1979.05 rows=100 width=12) (actual time=6.167..6.448
rows=100 loops=1)
-> Sort (cost=1978.80..1982.09 rows=1313 width=12) (actual
time=6.165..6.268 rows=100 loops=1)
Sort Key: o.orderstamp
-> Nested Loop (cost=3.99..1910.80 rows=1313 width=12) (actual
time=0.059..4.576 rows=939 loops=1)
-> Bitmap Heap Scan on customerclass cc (cost=3.99..55.16
rows=95 width=4) (actual time=0.045..0.194 rows=95 loops=1)
Recheck Cond: (classid = 2)
-> Bitmap Index Scan on customerclass_classid_idx
(cost=0.00..3.96 rows=95 width=0) (actual time=0.032..0.032 rows=95 loops=1)
Index Cond: (classid = 2)
-> Index Scan using orders_customerid_idx on orders o
(cost=0.00..19.35 rows=15 width=16) (actual time=0.006..0.025 rows=10
loops=95)
Index Cond: (o.customerid = cc.customerid)

As I said, this is a hypothetical test case we have arrived at that
describes our situation as best as we can given a simple case. We're
interested in potential issues with the approach, why postgres would not
attempt something like it, and how we might go about implementing it
ourselves at a lower level than we currently have (in SPI, libpq, etc).

If it could be generalized then we could use it in cases where we aren't
pulling from just one table (the orders table) but rather trying to merge,
in sorted order, results from different conditions on different tables.
Right now we use something like the plpgsql or plpythonu functions in the
example and they outperform our regular SQL queries by a fairly significant
margin.

An example might be:

SELECT * FROM (
(SELECT orderid,stamp FROM indextest.orders_usa WHERE customerid = <setof
customerids> ORDER BY stamp DESC LIMIT 5) UNION
(SELECT orderid,stamp FROM indextest.orders_can WHERE customerid = <setoff
customerids> ORDER BY stamp DESC LIMIT 5)
) as x ORDER BY x.stamp DESC

Again, that's a general example but some of my queries contain between 5 and
10 different sorted joins of this kind and it would be helpful to have
something internal in postgres to efficiently handle it (do something just
like the above query but not have to do the full LIMIT 5 for each set, some
kind of in order merge/heap join?)

Jonathan Gray

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Chris
Sent: Tuesday, July 24, 2007 1:51 AM
To: Jonathan Gray
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Query performance issue

Chris wrote:
> Jonathan Gray wrote:
>> We're experiencing a query performance problem related to the planner
>> and its ability to perform a specific type of merge.
>>
>>
>>
>> We have created a test case (as attached, or here:
>> http://www3.streamy.com/postgres/indextest.sql) which involves a
>> hypothetical customer ordering system, with customers, orders, and
>> customer groups.
>>
>>
>>
>> If we want to retrieve a single customers 10 most recent orders,
>> sorted by date, we can use a double index on (customer,date);
>> Postgres's query planner will use the double index with a backwards
>> index scan on the second indexed column (date).
>>
>>
>>
>> However, if we want to retrieve a "customer class's" 10 most recent
>> orders, sorted by date, we are not able to get Postgres to use double
>> indexes.
>
> You don't have any indexes on the 'customerclass' table.
>
> Creating a foreign key doesn't create an index, you need to do that
> separately.
>
> Try
>
> create index cc_customerid_class on indextest.customerclass(classid,
> customerid);
>

It could also be that since you don't have very much data (10,000) rows
- postgres is ignoring the indexes because it'll be quicker to scan the
tables.

If you bump it up to say 100k rows, what happens?

--
Postgresql & php tutorials
http://www.designmagick.com/

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Chris 2007-07-24 09:36:25 Re: Query performance issue
Previous Message Chris 2007-07-24 08:50:55 Re: Query performance issue