Re: extremely low memory usage

From: Jeremiah Jahn <jeremiah(at)cs(dot)earlham(dot)edu>
To: John Arbash Meinel <john(at)arbash-meinel(dot)com>
Cc: Jeff Trout <threshar(at)torgo(dot)978(dot)org>, postgres performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: extremely low memory usage
Date: 2005-08-19 16:48:29
Message-ID: 1124470109.27881.152.camel@bluejay.goodinassociates.com
Lists: pgsql-performance

Sorry about the formatting.

On Thu, 2005-08-18 at 12:55 -0500, John Arbash Meinel wrote:
> Jeremiah Jahn wrote:
>
> >here's an example standard query. I really have to make the first hit go
> >faster. The table is clustered on full_name as well. 'Smith%'
> >took 87 seconds on the first hit. I wonder if I set up my array wrong.
> >I remember seeing something about DMA access versus something else, and
> >chose DMA access. LVM maybe?
> >
> >
> It would be nice if you would format your queries to be a little bit
> easier to read before posting them.
> However, I believe I am reading it correctly, to say that the index scan
> on identity is not your slow point. In fact, as near as I can tell, it
> only takes 52ms to complete.
>
> The expensive parts are the 4915 lookups into the litigant_details (each
> one takes approx 4ms for a total of ~20s).
> And then you do it again on case_data (average 3ms each * 4906 loops =
> ~15s).
Is there some way to avoid this?
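To pin down where that time goes, something along these lines should time just the identity to litigant_details lookups on their own, with case_data and court left out of it (a throwaway diagnostic using the same predicates as the full query below):

EXPLAIN ANALYZE
SELECT count(*)
FROM identity
JOIN litigant_details ON identity.actor_id = litigant_details.actor_id
WHERE identity.court_ori = 'IL081025J'
  AND full_name LIKE 'JONES%';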

>
> So there is no need for preloading your indexes on the identity table.
> It is definitely not the bottleneck.
>
> So a few design bits, which may help your database.
> Why is "actor_id" a text field instead of a number?
This is simply due to the nature of the data.

> You could try creating an index on "litigant_details (actor_id,
> count_ori)" so that it can do just an index lookup, rather than an index
> + filter.
I have one, but the planner doesn't seem to want to use it. I don't really
need it though; I can just drop the court_id out of the query, since it's
redundant: each actor_id is already unique in litigant_details. I had run
VACUUM FULL and ANALYZE before, but I ran them again anyway and the planning
improved. However, my 14-disk RAID 10 array is still slower than the 3-disk
RAID 5 on my production box: 46 sec vs. 30 sec (with live traffic on
production). One of the strange things is that when I run the cat command on
my "hot" indexes and tables, it has no effect on memory usage. Right now I'm
running ext3 on LVM, and I'm still in a position to redo the file system and
everything. Is this a good way to do it, or should I switch to something
else? What about stripe and extent sizes...? Kernel parameters to change?
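For reference, the composite index suggested above would be something like this (the index name is made up, and I'm assuming "count_ori" was meant to be court_ori, since that's the column the query actually joins on), along with a quick check that actor_id really is unique in litigant_details:

-- hypothetical index name; assumes court_ori is the column meant by "count_ori"
CREATE INDEX lit_actor_court_speed ON litigant_details (actor_id, court_ori);
ANALYZE litigant_details;

-- sanity check: any actor_id that appears more than once?
SELECT actor_id FROM litigant_details GROUP BY actor_id HAVING count(*) > 1;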

---------------devel box:-----------------------

copa=# EXPLAIN ANALYZE select full_name,identity_id,identity.case_id,court.id,date_of_birth,assigned_case_role,litigant_details.impound_litigant_data
copa-# from identity
copa-# join litigant_details on identity.actor_id = litigant_details.actor_id
copa-# join case_data on litigant_details.case_id = case_data.case_id and litigant_details.court_ori = case_data.court_ori
copa-# join court on identity.court_ori = court.id
copa-# where identity.court_ori = 'IL081025J' and full_name like 'JONES%' order by full_name;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3.29..29482.22 rows=3930 width=86) (actual time=114.060..46001.480 rows=5052 loops=1)
   ->  Nested Loop  (cost=3.29..16193.27 rows=3820 width=112) (actual time=93.038..24584.275 rows=5052 loops=1)
         ->  Nested Loop  (cost=0.00..16113.58 rows=3820 width=113) (actual time=85.778..24536.489 rows=5052 loops=1)
               ->  Index Scan using name_speed on identity  (cost=0.00..824.72 rows=3849 width=82) (actual time=50.284..150.133 rows=5057 loops=1)
                     Index Cond: (((full_name)::text >= 'JONES'::character varying) AND ((full_name)::text < 'JONET'::character varying))
                     Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'JONES%'::text))
               ->  Index Scan using lit_actor_speed on litigant_details  (cost=0.00..3.96 rows=1 width=81) (actual time=4.788..4.812 rows=1 loops=5057)
                     Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
         ->  Materialize  (cost=3.29..3.30 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=5052)
               ->  Seq Scan on court  (cost=0.00..3.29 rows=1 width=12) (actual time=7.248..7.257 rows=1 loops=1)
                     Filter: ('IL081025J'::text = (id)::text)
   ->  Index Scan using case_speed on case_data  (cost=0.00..3.46 rows=1 width=26) (actual time=4.222..4.230 rows=1 loops=5052)
         Index Cond: ((("outer".court_ori)::text = (case_data.court_ori)::text) AND (("outer".case_id)::text = (case_data.case_id)::text))
 Total runtime: 46005.994 ms

>
> More importantly, though, the planner seems to think the join of
> identity to litigant_details will only return 1 row, not 5000.
> Do you regularly vacuum analyze your tables?
> Just as a test, try running:
> set enable_nestloop to off;
Not quite acceptable:
Total runtime: 221486.149 ms

> And then run EXPLAIN ANALYZE again, just to see if it is faster.
>
> You probably need to increase some statistics targets, so that the
> planner can design better plans.
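If it comes to that, bumping the targets on the columns whose estimates look off would be something like the following (the column choices and the value 200 are guesses on my part; the default target is much lower):

ALTER TABLE identity ALTER COLUMN actor_id SET STATISTICS 200;
ALTER TABLE litigant_details ALTER COLUMN actor_id SET STATISTICS 200;
ANALYZE identity;
ANALYZE litigant_details;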

---------------------this is the output from the production box------------------
LOG: duration: 27213.068 ms statement: EXPLAIN ANALYZE select full_name,identity_id,identity.case_id,court.id,date_of_birth,assigned_case_role,litigant_details.impound_litigant_data
from identity
join litigant_details on identity.actor_id = litigant_details.actor_id
join case_data on litigant_details.case_id = case_data.case_id and litigant_details.court_ori = case_data.court_ori
join court on identity.court_ori = court.id
where identity.court_ori = 'IL081025J' and full_name like 'JONES%' order by full_name;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3.29..43498.76 rows=2648 width=86) (actual time=17.106..27192.000 rows=5052 loops=1)
   ->  Nested Loop  (cost=0.00..43442.53 rows=2647 width=87) (actual time=16.947..27120.619 rows=5052 loops=1)
         ->  Nested Loop  (cost=0.00..23061.79 rows=3827 width=113) (actual time=16.801..17390.682 rows=5052 loops=1)
               ->  Index Scan using name_speed on identity  (cost=0.00..1277.39 rows=3858 width=82) (actual time=9.842..213.424 rows=5057 loops=1)
                     Index Cond: (((full_name)::text >= 'JONES'::character varying) AND ((full_name)::text < 'JONET'::character varying))
                     Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'JONES%'::text))
               ->  Index Scan using lit_actor_speed on litigant_details  (cost=0.00..5.63 rows=1 width=81) (actual time=3.355..3.364 rows=1 loops=5057)
                     Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
         ->  Index Scan using case_data_pkey on case_data  (cost=0.00..5.31 rows=1 width=26) (actual time=1.897..1.904 rows=1 loops=5052)
               Index Cond: ((("outer".court_ori)::text = (case_data.court_ori)::text) AND (("outer".case_id)::text = (case_data.case_id)::text))
   ->  Materialize  (cost=3.29..3.30 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=5052)
         ->  Seq Scan on court  (cost=0.00..3.29 rows=1 width=12) (actual time=0.142..0.165 rows=1 loops=1)
               Filter: ('IL081025J'::text = (id)::text)
 Total runtime: 27205.060 ms

>
>
> John
> =:->
>
--
"I didn't know it was impossible when I did it."
