Re: Assistance with Query Optimisation?

From: "Isak Hansen" <isak(dot)hansen(at)gmail(dot)com>
To: "Shaun Johnston" <shaunj(at)benon(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Assistance with Query Optimisation?
Date: 2007-03-01 12:52:58
Message-ID: 6b9e1eb20703010452m6fa97914he629df06e01a5601@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 3/1/07, Shaun Johnston <shaunj(at)benon(dot)com> wrote:
> Hi
>
> Apologies in advance for the verbosity of my explanation for this
> problem, but I think it's all pertinent.
>
> I have a fairly simple query which postgresql's query planner seems to
> be interpreting / optimising in interesting ways:
>
> Query:
>
> SELECT *
> FROM account_transaction
> WHERE account_id = 10783
> ORDER BY account_transaction_timestamp asc
> OFFSET 0
> LIMIT 100
>
> Table "public.account_transaction"
>
> (I have stripped the default values etc for the sake of formatting)
>
> Column | Type |
> -------------------------------+--------------------------+
> account_transaction_id | bigint |
> account_transaction_timestamp | timestamp with time zone |
> account_id | integer |
> other_account_transaction_id | bigint |
> transaction_reason | text |
> transaction_amount | numeric(15,2) |
> transaction_exchange_rate | numeric(20,10) |
> transaction_base_amount | numeric(15,2) |
> transaction_locked_until | timestamp with time zone |
> transaction_approved | boolean |
> Indexes:
> "account_transaction_pkey" PRIMARY KEY, btree (account_transaction_id),
> tablespace "indexes"
> "account_transaction_ai" btree (account_id), tablespace "indexes"
> "account_transaction_timestamp" btree (account_transaction_timestamp),
> tablespace "indexes"
> "account_transaction_tlu" btree (transaction_locked_until),
> tablespace "indexes"
> Foreign-key constraints:
> "$1" FOREIGN KEY (account_id) REFERENCES
> account(account_id)
> "$2" FOREIGN KEY (other_account_transaction_id) REFERENCES
> account_transaction(account_transaction_id) ON UPDATE CASCADE
>
> Query Plans:
>
> With Sort and Limit
> ---------------------------------------------------------------------------------
> QUERY
> PLAN
> ---------------------------------------------------------------------------------
> Limit (cost=0.00..12297.59 rows=100 width=120)
> (actual time=23.537..275476.496 rows=100 loops=1)
> -> Index Scan Backward using account_transaction_timestamp on
> account_transaction
> (cost=0.00..640704.23 rows=5210 width=120)
> (actual time=23.529..275475.781 rows=100 loops=1)
> Filter: (account_id = 10783)
> Total runtime: 275476.944 ms
>
>
> With Sort but no Limit
> --------------------------------------------------------------------------------
> QUERY
> PLAN
> --------------------------------------------------------------------------------
> Sort (cost=18777.41..18790.43 rows=5210 width=120)
> (actual time=1081.226..1082.170 rows=308 loops=1)
> Sort Key: account_transaction_timestamp
> -> Index Scan using account_transaction_ai on account_transaction
> (cost=0.00..18455.77 rows=5210 width=120)
> (actual time=47.731..1070.788 rows=308 loops=1)
> Index Cond: (account_id = 10783)
> Total runtime: 1083.182 ms
>
> With Limit but no Sort
> ---------------------------------------------------------------------------------
> QUERY
> PLAN
> ---------------------------------------------------------------------------------
> Limit (cost=0.00..354.24 rows=100 width=120)
> (actual time=0.029..1.070 rows=100 loops=1)
> -> Index Scan using account_transaction_ai on account_transaction
> (cost=0.00..18455.77 rows=5210 width=120)
> (actual time=0.022..0.467 rows=100 loops=1)
> Index Cond: (account_id = 10783)
> Total runtime: 1.422 ms
>
> With Limit and Sort, but sorted by transaction_base_amount
> ---------------------------------------------------------------------------------
> QUERY
> PLAN
> ---------------------------------------------------------------------------------
> Limit (cost=18777.41..18777.66 rows=100 width=120)
> (actual time=55.294..56.221 rows=100 loops=1)
> -> Sort (cost=18777.41..18790.43 rows=5210 width=120)
> (actual time=55.285..55.600 rows=100 loops=1)
> Sort Key: transaction_base_amount
> -> Index Scan using account_transaction_ai on
> account_transaction
> (cost=0.00..18455.77 rows=5210 width=120)
> (actual time=0.057..53.187 rows=308 loops=1)
> Index Cond: (account_id = 10783)
> Total runtime: 56.597 ms
>
> The table currently contains about 10 million records.
>
> If I sort by account_transaction_timestamp then limit, the planner
> performs a backward index scan based on account_transaction_timestamp
> then limits.
>
> If I sort but don't limit, it performs a forward scan on the table, then
> a sort on the results.
>
> If I limit but don't sort, it performs the forward scan then limits.
>
> If I limit and sort, but sort by transaction_base_amount instead, it
> performs a forward index scan, then sorts the results, then limits - in
> stark contrast to sorting and limiting using account_transaction_timestamp.
>
> As shown, there is a vast difference in efficiency between sorting and
> limiting by account_transaction_timestamp, and doing the same using
> transaction_base_amount (in favour of transaction_base_amount by a
> factor of about 4860). This seems to be due to the way the planner is
> optimising the query.
>
> Could this optimisation peculiarity be due to the combination of indexed
> columns in the query? Also, Is there a way I can 'force' the planner to
> perform an 'index scan -> sort -> limit' or even better an 'index scan
> -> limit -> sort'?
>
> Any pointers / assistance appreciated.
>

Starting off with a disclaimer, just to get that out of the way; I'm a
programmer; no DBA, have limited experience with Postgres, and my
terminology is probably way off.

First thing I notice is that your query plans seem to only use one index.

Postgres should be able to combine the timestamp and account_id
indexes in that first query, if the optimizer thought there was a
point in doing so?

Did you analyze the db recently? The optimizer won't perform too well
with incorrect statistics.

If that doesn't help, you could perhaps add account_id to your timestamp index.

I have no idea how far the postgres planner goes in restructuring your
queries, but using explicit subqueries, correlated vs. non-correlated,
may give it a good hint about what you're trying to do.

HTH,
Isak

> Kind Regards,
>
> Shaun Johnston
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kenneth Downs 2007-03-01 13:13:05 Re: usage for 'with recursive'?
Previous Message Magnus Hagander 2007-03-01 11:57:34 Re: Please Help ! Error with Access 2003