Query performance issue

From: "Jonathan Gray" <jgray(at)streamy(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Query performance issue
Date: 2007-07-24 07:48:07
Message-ID: 0f1701c7cdc6$f9cc1a30$ed644e90$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We're experiencing a query performance problem related to the planner and
its ability to perform a specific type of merge.

We have created a test case (as attached, or here:
http://www3.streamy.com/postgres/indextest.sql) which involves a
hypothetical customer ordering system, with customers, orders, and customer
groups.

If we want to retrieve a single customers 10 most recent orders, sorted by
date, we can use a double index on (customer,date); Postgres's query planner
will use the double index with a backwards index scan on the second indexed
column (date).

However, if we want to retrieve a "customer class's" 10 most recent orders,
sorted by date, we are not able to get Postgres to use double indexes.

We have come to the conclusion that the fastest way to accomplish this type
of query is to merge, in sorted order, each customers set of orders (for
which we can use the double index). Using a heap to merge these ordered
lists (until we reach the limit) seems the most algorithmically efficient
way we are able to find. This is implemented in the attachment as a
pl/pythonu function.

Another less algorithmically efficient solution, but faster in practice for
many cases, is to fetch the full limit of orders from each customer, sort
these by date, and return up to the limit.

We are no masters of reading query plans, but for straight SQL queries the
planner seems to yield two different types of plan. They are fast in
certain cases but breakdown in our typical use cases, where the number of
orders per customer is sparse compared to the total number of orders across
the date range.

We are interested in whether a mechanism internal to Postgres can accomplish
this type of merging of indexed columns in sorted order.

If this cannot currently be accomplished (or if there is something we are
missing about why it shouldn't be) we would appreciate any pointers to be
able to translate our python heap approach into C functions integrated more
closely with Postgres. The python function incurs large constant costs
because of type conversions and repeated queries to the database.

Thanks for any help or direction.

Jonathan Gray / Miguel Simon

Attachment Content-Type Size
indextest.sql application/octet-stream 20.8 KB

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Chris 2007-07-24 08:44:04 Re: Query performance issue
Previous Message Valentine Gogichashvili 2007-07-24 07:40:43 Re: multicolumn index column order