Re: extremely low memory usage

From: Ron <rjpeace(at)earthlink(dot)net>
To: Jeremiah Jahn <jeremiah(at)cs(dot)earlham(dot)edu>
Cc: postgres performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: extremely low memory usage
Date: 2005-08-18 19:56:53
Message-ID: 6.2.3.4.0.20050818141059.05c2fac0@pop.earthlink.net
Lists: pgsql-performance

At 01:55 PM 8/18/2005, John Arbash Meinel wrote:
>Jeremiah Jahn wrote:
>
> >here's an example standard query. I really have to make the first hit go
> >faster. The table is clustered on full_name as well. 'Smith%'
> >took 87 seconds on the first hit. I wonder if I set up my array wrong.
> >I remember seeing something about DMA access versus something else, and
> >chose DMA access. LVM maybe?
> >
> >
>It would be nice if you would format your queries to be a little bit
>easier to read before posting them.
>However, I believe I am reading it correctly, to say that the index scan
>on identity is not your slow point. In fact, as near as I can tell, it
>only takes 52ms to complete.
>
>The expensive parts are the 4915 lookups into the litigant_details (each
>one takes approx 4ms for a total of ~20s).
>And then you do it again on case_data (average 3ms each * 4906 loops =
>~15s).

How big are litigant_details and case_data? If they can fit in RAM,
preload them (e.g. with the "cat to /dev/null" trick; sketch below) and
those table lookups will be ~100-1000x faster. If they won't fit but
the machine can be expanded to hold enough RAM for them, it's well
worth the ~$75-$150/GB to upgrade the server so that the tables fit
into RAM.
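
For what it's worth, a minimal sketch of that preload, assuming a
stock on-disk layout (the table name comes from this thread; run the
SELECTs in psql and the cat from a shell):

  -- find the data file(s) backing the table, then read them so the
  -- OS page cache holds them
  SELECT oid FROM pg_database WHERE datname = current_database();
  SELECT relfilenode FROM pg_class WHERE relname = 'litigant_details';
  -- then, from a shell:
  --   cat $PGDATA/base/<database_oid>/<relfilenode>* > /dev/null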

If they can't be made to fit into RAM as atomic entities, you have a
few choices:
A= Put the data tables and indexes on separate dedicated spindles, and
give litigant_details and case_data each their own dedicated
spindles. This will lower seek conflicts. Again, if this requires
buying more HDs, it's well worth it.

B= Break litigant_details and case_data into a set of smaller tables
(split on something sane, like the first n characters of the primary
key) such that the smaller tables easily fit into RAM. Given that
you've said only 10GB of the 60GB is "hot", this could work very well
(see the sketch after this list). Combine it with "A" above (put all
the litigant_details sub-tables on one dedicated spindle set and all
the case_data sub-tables on another spindle set) for added oomph.

C= Buy an SSD big enough to hold litigant_details and case_data and
put them there. Again, this can be combined with "A" and "B" above
to reduce the size of the SSD needed.
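
As a sketch of "B" (combined with "A" via a tablespace) against an
8.0 server: the split ranges, the sub-table/tablespace names, and the
path are hypothetical; the table and column names come from this
thread. Without automatic routing, the application inserts into and
queries whichever sub-table matches the key range.

  -- a dedicated spindle set for the sub-tables ("A")
  CREATE TABLESPACE lit_split LOCATION '/mnt/array1/pgdata';
  -- one sub-table per key range ("B"); repeat for the other ranges
  CREATE TABLE litigant_details_a_c (
      CHECK (actor_id >= 'A' AND actor_id < 'C')
  ) INHERITS (litigant_details) TABLESPACE lit_split;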

>So there is no need for preloading your indexes on the identity
>table. It is definitely not the bottleneck.
>
>So a few design bits, which may help your database. Why is
>"actor_id" a text field instead of a number?
>You could try creating an index on "litigant_details (actor_id,
>court_ori)" so that it can do just an index lookup, rather than an
>index + filter.

Yes, that certainly sounds like it would be more efficient.
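
Something along these lines, assuming the filter column is court_ori
as in the plan below (the index name is mine):

  CREATE INDEX lit_actor_court ON litigant_details (actor_id, court_ori);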

>More importantly, though, the planner seems to think the join of
>identity to litigant_details will only return 1 row, not 5000.
>Do you regularly vacuum analyze your tables?
>Just as a test, try running:
>set enable_nestloop to off;
>And then run EXPLAIN ANALYZE again, just to see if it is faster.
>
>You probably need to increase some statistics targets, so that the
>planner can design better plans.
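
For example (a sketch; the column choice and the target value are
only illustrative):

  -- sample the join column more heavily, then refresh the stats
  ALTER TABLE litigant_details ALTER COLUMN actor_id SET STATISTICS 100;
  ANALYZE litigant_details;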
>
> >  ->  Nested Loop  (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
> >        ->  Nested Loop  (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
> >              ->  Nested Loop  (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
> >                    Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
> >                    ->  Index Scan using name_speed on identity  (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
> >                          Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
> >                          Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
> >                    ->  Index Scan using lit_actor_speed on litigant_details  (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
> >                          Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
> >                          Filter: ('IL081025J'::text = (court_ori)::text)
> >              ->  Seq Scan on court  (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
> >                    Filter: ('IL081025J'::text = (id)::text)
> >        ->  Index Scan using case_speed on case_data  (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
> >              Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))
