Re: Atomic access to large arrays

From: Victor de Buen <vdebuen(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Atomic access to large arrays
Date: 2009-07-22 18:28:23
Message-ID: f432ce640907221128u7310da7eqb1076db56e0acf14@mail.gmail.com
Lists: pgsql-performance

Thanks a lot, Merlin.

I will try filling a sample of grids into a new table with different subgrid
sizes in order to find the best trade-off between space and speed.
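
For example, something along these lines (only a sketch; the table name is
hypothetical and the 4x4 split into tiles of up to 91x180 cells is just one
of the sizes I will test):

CREATE TABLE grid_f_data_i2_sub (
  -- Specifies the variable and other features of the data
  id_inventory integer REFERENCES grid_d_inventory(id_inventory),
  dh_date timestamp,
  -- Tile coordinates within the 4x4 split: ty, tx in 0..3
  ty smallint,
  tx smallint,
  -- One tile of the original 361x720 grid; the last row of tiles is
  -- smaller because 361 does not divide evenly by 4
  vl_grid smallint[91][180],
  CONSTRAINT grid_f_data_i2_sub_pk
    PRIMARY KEY (id_inventory, dh_date, ty, tx)
);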

Regards

2009/7/22 Merlin Moncure <mmoncure(at)gmail(dot)com>

> On Tue, Jul 21, 2009 at 7:43 PM, Victor de Buen
> (Bayes) <vdebuen(at)bayesinf(dot)com> wrote:
> > Hi
> >
> > I'm storing historical meteorological gridded data from GFS
> > (http://www.nco.ncep.noaa.gov/pmb/products/gfs/) into an array field
> > in a table like this:
> >
> > CREATE TABLE grid_f_data_i2 (
> >   -- Specifies the variable and other features of the data
> >   id_inventory integer REFERENCES grid_d_inventory(id_inventory),
> >   -- A new grid has been available every 3 hours for the last 5 years
> >   dh_date timestamp,
> >   -- Values are scaled to be stored as signed 2-byte integers
> >   vl_grid smallint[361][720],
> >   CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY (id_inventory, dh_date)
> > );
> >
> > The dimensions of each vl_grid value are lat:361 x lon:720 = 259,920
> > cells for a 0.5-degree grid (about 55 km) covering the whole world. So
> > vl_grid[y][x] stores the value at dh_date of the meteorological variable
> > specified by id_inventory at the geodesic point
> >
> > latitude = -90 + y*0.5
> > longitude = x*0.5
> >
> > The reverse formula giving the closest grid point for an arbitrary
> > geodesic point is
> >
> > y = Round((latitude + 90) * 2)
> > x = Round(longitude * 2)
> >
> > Field vl_grid is stored in the TOAST table and compresses well.
> > PostgreSQL is the only database that has been able to store this huge
> > amount of data in only 34 GB of disk. It's a really great system.
> > Queries returning big rectangular areas are very fast, but the target
> > of almost all queries is to get the historical series for a single
> > geodesic point:
> >
> > SELECT dh_date, vl_grid[123][152]
> > FROM grid_f_data_i2
> > WHERE id_inventory = 6
> > ORDER BY dh_date
> >
> > In this case, atomic access to just one cell in each of only a few
> > thousand rows becomes too slow.
>
> That's a side effect of your use of arrays. Arrays are very compact,
> and ideal if you always want the whole block of data at once, but
> asking for a particular point is the downside of your trade-off. I
> would suggest experimenting with smaller grid sizes...maybe divide
> your big grid into approximately 16 (4x4) separate subgrids. This
> should still 'toast' and give decent compression, but mitigate the
> impact of single-point lookups somewhat.
>
> merlin
>
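
As a concrete example of how the single-point lookup could work against
such subgrids (only a sketch, using the hypothetical table above with tiles
of up to 91x180 cells, and remembering that PostgreSQL arrays are 1-based
by default): for latitude 40.5 and longitude 356.5,

y = Round((40.5 + 90) * 2) = 261
x = Round(356.5 * 2) = 713

which falls in tile ty = 261 / 91 = 2, tx = 713 / 180 = 3 (integer
division), at local cell (261 % 91, 713 % 180) = (79, 173), so the query
becomes:

-- Fetch the series of one cell, detoasting only ~1/16 of each grid
SELECT dh_date, vl_grid[80][174]
FROM grid_f_data_i2_sub
WHERE id_inventory = 6
  AND ty = 2 AND tx = 3
ORDER BY dh_date;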

--
Víctor de Buen Remiro
Statistical consultant
Bayes Forecast
www.bayesforecast.com
Tol Development Team member
www.tol-project.org
