Skip site navigation (1) Skip section navigation (2)

Re: selects from large tables

From: Nikk Anderson <Nikk(dot)Anderson(at)parallel(dot)ltd(dot)uk>
To: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: selects from large tables
Date: 2002-11-18 16:36:44
Message-ID: DA1274E682D3734B8802904A9B36124C298A9A@nic-nts1.nic.parallel.ltd.uk (view raw or flat)
Thread:
Lists: pgsql-performance
Hi Tom, 

Yes, we should upgrade to 7.2 soon, its just that as it is a live system
running 24x7 we are careful about upgrading core components so we do not
disrupt our data collection agents too much.

Here is some table info, we currently index by time then ID.  Generally,
data will be selected by ID, then time range.  Clustering may help on this.



    Attribute    |           Type           | Modifier
-----------------+--------------------------+----------
 job_id          | integer                  | not null
 server_id       | integer                  | not null
 time            | timestamp with time zone | not null
 availability    | boolean                  |
 connection_time | integer                  |
 dns_setup       | integer                  |
 server_response | integer                  |
 frontpage_size  | integer                  |
 frontpage_time  | integer                  |
 transfer_size   | integer                  |
 transfer_time   | integer                  |
 error_id        | integer                  |
 redirect_time   | integer                  |
 polling_id      | integer                  | not null
Indices: http_result_pk,
         http_timejobid

Thanks

Nikk


-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: 18 November 2002 16:25
To: Charles H. Woloszynski
Cc: Nikk Anderson; 'Stephan Szabo'; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] selects from large tables 


"Charles H. Woloszynski" <chw(at)clearmetrix(dot)com> writes:
> Are you doing vaccums on these tables?  I was under the understanding 
> that the estimated row count should be close to the real row count 
> returned, and when it is not (as it looks in your case), the primary 
> reason for the disconnect is that the stats for the tables are 
> out-of-date.  

The fact that he's using 7.1 doesn't help any; the statistics mechanisms
in 7.1 are pretty weak compared to 7.2.

> Also, do you do any clustering of the data (since the queries are mostly 
> time limited)?  I am wondering if the system is doing lots of seeks to 
> get the data (implying that the data is all over the disk and not 
> clustered).

It would also be interesting to try a two-column index ordered the other
way (timestamp as the major sort key instead of ID).  Can't tell if that
will be a win without more info about the data properties, but it's
worth looking at.

			regards, tom lane

Responses

pgsql-performance by date

Next:From: Robert TreatDate: 2002-11-18 18:48:32
Subject: Re: selects from large tables
Previous:From: Nikk AndersonDate: 2002-11-18 16:32:49
Subject: Re: selects from large tables

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group