Re: How to enhance the chance that data is in disk cache

From: Jona <jonanews(at)oismail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to enhance the chance that data is in disk cache
Date: 2005-06-13 17:10:32
Message-ID: 42ADBE08.10700@oismail.com
Lists: pgsql-performance

Thank you for the response, Tom. I bet you get a lot of mails with
"trivial" solutions (mine likely being one of them);
I for one truly appreciate you taking the time to answer them.

>Run the query more often?
>
>
The query is dynamically constructed from user input. Although the total
number of different queries that can be run is limited (around 10k
different combinations, I suspect), it seems rather pointless to run all
of them (or even just the most common ones) more often merely to keep the
data in the disk cache.
Is there a way to make the data more accessible on the disk?
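If it does come down to scheduled warming, I suppose the crude approach is
just to touch the tables the query uses on a timer. A rough sketch of what
I have in mind (table names are the ones from the plan; the database name
and schedule are made up):

-- warm_cache.sql: touch the tables the dynamic queries hit so their
-- pages stay in the OS disk cache.  Sequential scans warm the heap
-- only; the indexes only get pulled in by queries that actually use them.
SELECT count(*) FROM sct2subcattype_tbl;
SELECT count(*) FROM aff2sct2subcattype_tbl;
SELECT count(*) FROM price_tbl;
SELECT count(*) FROM statcon_tbl;

# crontab entry, every 10 minutes (database name is made up):
# */10 * * * *  psql -f warm_cache.sql contentdb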

>Also, that pile of INNER JOINs is forcing a probably-bad join order;
>you need to think carefully about the order you want things joined in,
>or else convert the query to non-JOIN syntax. See the "Performance
>Tips" chapter of the manual.
>
>
You're probably right here, the join order must be bad, though just
flattening the join and letting the planner decide on what would be best
makes the plan change for every execution, with the query cost varying
from 1350 to 4500.
I wager it ends up using GEQO due to the number of possibilities for a
join order that the query has, and thus just settles on a "good" plan out
of those it examined.
In any case, the "right" way to do this is defining a good explicit
join order, no?
Off the top of my head I'm not sure how to re-write it properly; I suppose
trial and error is the only way....
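What I plan to try first is sketched below; the join conditions are only
illustrative, built from the column names visible in the plan, and
join_collapse_limit assumes a 7.4-or-later server (on older versions
explicit JOIN syntax forces the order by itself):

-- 1) Check whether GEQO is choosing the plan (that would explain the cost
--    bouncing between 1350 and 4500 from run to run):
SHOW geqo;
SHOW geqo_threshold;         -- GEQO kicks in above this many FROM items
SET geqo TO off;             -- then re-run EXPLAIN ANALYZE on the real query

-- 2) Pin an explicit join order: write the JOINs in the order wanted and
--    make the planner take the syntax literally:
SET join_collapse_limit TO 1;
SELECT s.sctid
  FROM sct2subcattype_tbl s
  JOIN price_tbl   p ON p.sctid = s.sctid AND p.affid = 8
  JOIN statcon_tbl c ON c.sctid = s.sctid AND c.ctpid = 1
 WHERE s.subcattpid = 50;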
From the plan it appears that the following part is where the cost
dramatically increases (although the time does not??):
->  Nested Loop  (cost=0.00..1207.19 rows=75 width=32) (actual time=0.28..18.47 rows=164 loops=1)
      ->  Nested Loop  (cost=0.00..868.23 rows=58 width=20) (actual time=0.16..13.91 rows=164 loops=1)
            ->  Index Scan using subcat_uq on sct2subcattype_tbl  (cost=0.00..479.90 rows=82 width=8) (actual time=0.11..9.47 rows=164 loops=1)
                  Index Cond: (subcattpid = 50)
                  Filter: (NOT (subplan))
                  SubPlan
                    ->  Seq Scan on aff2sct2subcattype_tbl  (cost=0.00..1.92 rows=1 width=4) (actual time=0.05..0.05 rows=0 loops=164)
                          Filter: ((affid = 8) AND ($0 = sctid))
            ->  Index Scan using aff_price_uq on price_tbl  (cost=0.00..4.72 rows=1 width=12) (actual time=0.02..0.02 rows=1 loops=164)
                  Index Cond: ((price_tbl.affid = 8) AND (price_tbl.sctid = "outer".sctid))
      ->  Index Scan using ctp_statcon on statcon_tbl  (cost=0.00..5.86 rows=1 width=12) (actual time=0.02..0.02 rows=1 loops=164)
            Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))
Especially the index scan on subcat_uq seems rather expensive, yet it is
pretty fast.
Can a relation be drawn between estimated cost and execution time?
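As far as I understand the docs, the cost figures are in abstract planner
units (1 = one sequential page fetch), not milliseconds, so they only track
actual time when the cost settings roughly match reality; with most of the
data already in cache the actual time will come in well under what the cost
suggests. I guess that means double-checking these planner settings, though
I'd appreciate confirmation:

SHOW effective_cache_size;   -- how much of the data the planner assumes is cached
SHOW random_page_cost;       -- default 4; often lowered when data is mostly cached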
Any other pointers in the right direction would be very much appreciated.

For the full query and query plan, please refer to:
http://213.173.234.215:8080/get_content_plan.htm

Cheers
Jona

Tom Lane wrote:

>Jona <jonanews(at)oismail(dot)com> writes:
>
>
>>I have a query (please refer to
>>http://213.173.234.215:8080/get_content_plan.htm for the query as well
>>as query plan) that is slow when it's run the first time and fast(ish)
>>on all successive runs within a reasonable time period.
>>
>>
>
>
>
>>This leads me to suspect that when the query is first run, all used data
>>have to be fetched from the disk where as once it has been run all data
>>is available in the OS's disk cache.
>>
>>
>
>Sounds like that to me too.
>
>
>
>>Is there anway to either enhance the chance that the data can be found
>>in the disk cache or allowing the database to fetch the data faster?
>>
>>
>
>
>

>Run the query more often?
>
>
>Also, that pile of INNER JOINs is forcing a probably-bad join order;
>you need to think carefully about the order you want things joined in,
>or else convert the query to non-JOIN syntax. See the "Performance
>Tips" chapter of the manual.
>
>

> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>
