very large tables

From: Ramiro Diaz Trepat <ramiro(at)diaztrepat(dot)name>
To: pgsql-general(at)postgresql(dot)org
Subject: very large tables
Date: 2009-05-25 08:58:54
Message-ID: 313dbe090905250158i7e0284b6r71f0f9275d025246@mail.gmail.com
Lists: pgsql-general

Hello list, I will try to make this as brief as possible.
I have a brother who is a scientist studying atmospheric problems. He was
trying to handle all of his data with flat files and MatLab, when I stepped
in and said: wait, you need an RDBMS to handle all this data.
So, he basically has two very simple tables: one describing an atmosphere
pixel (latitude, longitude and height) plus a couple of other simple values,
and another holding the values of different variables measured at each pixel.
The table with the atmosphere pixels currently has about 140MM records, and
the one with the values about 1000MM records. They should both grow to about
twice this size.
I have not yet started to deal with the largest table; I am only doing tests
with the smaller one (140MM rows), and they are much slower than what we were
expecting.
A simple query with no joins like this:

select trajectory
  from atmospherepoint
 where moment = '1979-05-02 11:45:00'
   and height >= 12000 and height <= 14000
   and longitude >= 180 and longitude <= 190
   and latitude >= 0 and latitude <= 10;

is taking almost 4 minutes on a decent multi-core server. Also, the moment
equality test alone should yield no more than 2MM rows, so I thought that
would make things a lot faster.
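As a sanity check on that estimate, the selectivity of the equality predicate alone can be measured with something like this (the ~2MM figure comes from our knowledge of the data, not from having run this):

```sql
-- rough selectivity check: how many rows match the timestamp alone
select count(*)
  from atmospherepoint
 where moment = '1979-05-02 11:45:00';
```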

The explain returns the suspicious "Seq Scan" up front:

Seq Scan on atmospherepoint  (cost=0.00..5461788.08 rows=134 width=8)
  Filter: ((moment = '1979-05-02 11:45:00'::timestamp without time zone)
           AND (height >= 12000::double precision)
           AND (height <= 14000::double precision)
           AND (longitude >= 180::double precision)
           AND (longitude <= 190::double precision)
           AND (latitude >= 0::double precision)
           AND (latitude <= 10::double precision))

but I have created indexes on every column in the table, and I have also
run ANALYZE and VACUUM:

         Table "public.atmospherepoint2"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 id         | integer                     | not null
 trajectory | integer                     | not null
 moment     | timestamp without time zone | not null
 longitude  | real                        | not null
 latitude   | real                        | not null
 height     | real                        | not null
Indexes:
    "atmospherepoint2_pkey" PRIMARY KEY, btree (id)
    "atm_height_idx" btree (height)
    "atm_latitude_idx" btree (latitude)
    "atm_longitude_idx" btree (longitude)
    "atm_moment_idx" btree (moment)
    "atm_trajectory_idx" btree (trajectory)
Foreign-key constraints:
    "atmospherepoint2_trajectory_fkey" FOREIGN KEY (trajectory)
        REFERENCES trajectory2(id)
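For reference, the indexes above were created with single-column statements along these lines (paraphrased from what I actually ran), followed by a vacuum/analyze pass:

```sql
-- one single-column btree index per filtered column, as listed above
CREATE INDEX atm_moment_idx    ON atmospherepoint2 (moment);
CREATE INDEX atm_height_idx    ON atmospherepoint2 (height);
CREATE INDEX atm_longitude_idx ON atmospherepoint2 (longitude);
CREATE INDEX atm_latitude_idx  ON atmospherepoint2 (latitude);

-- reclaim space and refresh planner statistics
VACUUM ANALYZE atmospherepoint2;
```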

I would greatly welcome any advice or pointers on how to tune these tables
to work faster.
The tables do not change at all once the data has been loaded; they are only
used for queries.
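One thing I have not tried yet, in case it matters: a single multicolumn index leading with the equality-tested column, something like the following (the index name is just an example):

```sql
-- hypothetical composite index; moment comes first because it is the
-- only column compared with equality in the query above
CREATE INDEX atm_moment_pos_idx
    ON atmospherepoint2 (moment, height, longitude, latitude);
```

If the planner can use this, it would narrow the scan to the ~2MM rows matching the timestamp before applying the range conditions.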
Thank you very much.

r.
