Re: Why is my view making my disk churn? (iostat)

From: David Johnston <polobo(at)yahoo(dot)com>
To: Wells Oliver <wellsoliver(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Why is my view making my disk churn? (iostat)
Date: 2012-09-12 03:18:49
Message-ID: 4C3A5783-E44D-4A40-9918-94853DD95415@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sep 11, 2012, at 22:41, Wells Oliver <wellsoliver(at)gmail(dot)com> wrote:

> I have this view which combines a few tables. It's wide-ish, 60 rows, almost all columns integer. It joins five tables, all using primary keys. The explain output is as follows:
>
> Nested Loop Left Join (cost=0.01..92.38 rows=4 width=222) (actual time=0.291..0.711 rows=4 loops=1)
> Join Filter: ((mytable.pid = info.pid) AND (mytable.year = info.year))
> -> Nested Loop Left Join (cost=0.01..81.93 rows=4 width=218) (actual time=0.110..0.221 rows=4 loops=1)
> Join Filter: ((mytable.pid = pos.pid) AND (mytable.year = pos.year))
> -> Nested Loop Left Join (cost=0.01..73.55 rows=4 width=214) (actual time=0.092..0.197 rows=4 loops=1)
> Join Filter: ((mytable.year = ipf.year) AND (mytable.pid = ipf.pid) AND ((mytable.sport_code)::text = (ipf.sport_code)::text))
> -> Nested Loop (cost=0.01..63.44 rows=4 width=202) (actual time=0.063..0.143 rows=4 loops=1)
> -> Index Scan using idx_persons_id on _persons player (cost=0.00..8.28 rows=1 width=23) (actual time=0.016..0.017 rows=1 loops=1)
> Index Cond: (mlbam_id = 461416)
> -> Nested Loop Left Join (cost=0.01..55.12 rows=4 width=183) (actual time=0.045..0.122 rows=4 loops=1)
> -> Index Scan using mytable.pkey on mytable. (cost=0.00..21.92 rows=4 width=172) (actual time=0.027..0.071 rows=4 loops=1)
> Index Cond: ((year = 2012) AND .pid = 461416) AND (gtype = 'R'::bpchar) AND (split = 'all'::text))
> -> Index Scan using teams_history_pkey on teams_history ts (cost=0.01..8.28 rows=1 width=20) (actual time=0.007..0.007 rows=0 loops=4)
> Index Cond: ((team_id = mytable.team_id) AND (year = textin(int4out(mytable.year))))
> -> Materialize (cost=0.00..10.03 rows=1 width=25) (actual time=0.006..0.009 rows=3 loops=4)
> -> Index Scan using ipf_pkey on ipf ipf (cost=0.00..10.03 rows=1 width=25) (actual time=0.019..0.027 rows=3 loops=1)
> Index Cond: ((year = 2012) AND .pid = 461416) AND ((factor_type)::text = 'run'::text) AND ((player_type)::text = 'hitter'::text))
> -> Materialize (cost=0.00..8.32 rows=1 width=12) (actual time=0.004..0.005 rows=1 loops=4)
> -> Index Scan using primary_pos_pkey1 on pos (cost=0.00..8.31 rows=1 width=12) (actual time=0.012..0.012 rows=1 loops=1)
> Index Cond: (.pid = 461416) AND (year = 2012))
> -> Materialize (cost=0.00..8.30 rows=1 width=12) (actual time=0.003..0.004 rows=1 loops=4)
> -> Index Scan using info_pkey on info (cost=0.00..8.30 rows=1 width=12) (actual time=0.011..0.013 rows=1 loops=1)
> Index Cond: (.pid = 461416) AND (year = 2012))
> Total runtime: 1.350 ms
>
> When I run this query, the disk thrashes. It's on a 4-disk RAID5 array. I call it for some 6,000 values of pid, like: SELECT * FROM myview WHERE pid = 1, then 2, etc. The iostat outputs shows util% getting close to 100% quickly:
>
> cciss/c0d1 0.00 0.00 152.00 0.00 2784.00 0.00 18.32 0.97 6.38 6.18 94.00
> cciss/c0d1 0.00 0.00 135.00 0.00 2688.00 0.00 19.91 0.92 6.96 6.22 84.00
> cciss/c0d1 0.00 0.00 131.00 0.00 2928.00 0.00 22.35 0.75 5.80 5.42 71.00
>
> It hits 100% easily if other things are going on.
>
> Is there anything I can do here? I suppose I could put my DB on a RAID0 array and that might help, but it seems drastic. This is not a high load environment and given that the view combines all of the tables via primary keys, I feel like this should be quicker. Is there some clue in the EXPLAIN output I am missing?
>
> The throttling of the disk causes other processes to queue up.
>

Not sure on the IO piece but...the query itself seems to run fast (1.350 ms), it's the "running it 6000 times" that is likely the issue.

You do not indicate how you are running the query/queries

You do not provide the query itself.

The plan seems sub-optimal (nested loop left join) but a plan without a query is difficult to comment on. It only returns 4 rows so this probably doesn't matter...and as I said above the query itself is fast.

In short the how you are running the query is the killer. You could try putting all your ".pid" values into a table (or array but that would be a large array) and fire off the query once.

Another option is to at least "PREPARE" the query so it is only planned once then that one plan can be executed 6000 times.

David J.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Edson Richter 2012-09-12 03:37:18 Re: Compressed binary field
Previous Message Tom Lane 2012-09-12 03:00:17 Re: Bad pg_dump error message