Re: Atomic access to large arrays

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "Victor de Buen (Bayes)" <vdebuen(at)bayesinf(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Atomic access to large arrays
Date: 2009-07-22 05:11:15
Message-ID: Pine.LNX.4.64.0907220910220.8065@sn.sai.msu.ru
Lists: pgsql-performance

Victor,

Just wondering, why do you use an array?

Oleg
On Wed, 22 Jul 2009, Victor de Buen (Bayes) wrote:

> Hi
>
> I'm storing historical meteorological gridded data from GFS (
> http://www.nco.ncep.noaa.gov/pmb/products/gfs/) into an array field in a
> table like this:
>
> CREATE TABLE grid_f_data_i2 (
> -- Specifies the variable and other features of the data
> id_inventory integer REFERENCES grid_d_inventory(id_inventory),
> -- A new grid has been available every 3 hours for the last 5 years
> dh_date timestamp,
> -- Values are scaled to be stored as 2-byte signed integers
> vl_grid smallint[361][720],
> CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY
> (id_inventory, dh_date)
> );
>
> Each value of vl_grid has dimensions (lat: 361 x lon: 720 = 259,920
> cells) for a 0.5-degree grid (about 55 km between points) covering the
> whole world. So vl_grid[y][x] stores the value at dh_date of the
> meteorological variable specified by id_inventory at the geodesic point
>
> latitude = -90 + y*0.5
> longitude = x*0.5
>
> The reverse formula giving the closest grid point for an arbitrary
> geodesic point is
>
> y = Round((latitude + 90) * 2)
> x = Round(longitude * 2)
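>
> For example (coordinates chosen purely for illustration), a point at
> latitude 40.5 and longitude 3.5 maps, by the formulas above, to
> y = Round((40.5 + 90) * 2) = 261 and x = Round(3.5 * 2) = 7.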
>
> Field vl_grid is stored in the TOAST table and compresses well.
> PostgreSQL is the only database that has been able to store this huge
> amount of data in only 34 GB of disk; it's a really great system.
> Queries returning big rectangular areas are very fast, but the target
> of almost all queries is to get the historical series of a single
> geodesic point:
>
> SELECT dh_date, vl_grid[123][152]
> FROM grid_f_data_i2
> WHERE id_inventory = 6
> ORDER BY dh_date
>
> In this case, atomic access to just one cell in each of only a few
> thousand rows becomes too slow.
>
> Could somebody please answer some of these questions?
>
> - Is it possible to tune some TOAST parameters to get faster atomic
> access to large arrays?
>
>
> - Could using the "EXTERNAL" strategy for storing TOAST-able columns
> solve the problem? (A sketch follows this list.)
>
>
> - Would atomic access be faster if the table stored a vector of data
> for each parallel (line of constant latitude) instead of a matrix of
> global data, as below? (An example query follows this list.)
> CREATE TABLE grid_f_data_i2 (
> -- Specifies the variable and other features of the data
> id_inventory integer REFERENCES grid_d_inventory(id_inventory),
> -- A new grid has been available every 3 hours for the last 5 years
> dh_date timestamp,
> -- nu_parallel = y = Round((latitude + 90) * 2)
> nu_parallel smallint,
> -- Values are scaled to be stored as 2-byte signed integers
> vl_parallel smallint[],
> CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY
> (id_inventory, nu_parallel, dh_date)
> );
>
> - Is there another, faster solution?
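>
> A minimal sketch of the "EXTERNAL" idea from the second question;
> whether it actually speeds up single-element access would have to be
> measured:
>
> -- EXTERNAL keeps vl_grid out of line but uncompressed, so reading a
> -- single cell at least avoids decompressing the whole 259,920-cell
> -- array. Only values stored after this command use the new strategy;
> -- existing rows keep their current storage until they are rewritten.
> ALTER TABLE grid_f_data_i2
> ALTER COLUMN vl_grid SET STORAGE EXTERNAL;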
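>
> And the per-parallel layout from the third question would be queried
> roughly like this (same cell as the query above, y = 123, x = 152):
>
> SELECT dh_date, vl_parallel[152]
> FROM grid_f_data_i2
> WHERE id_inventory = 6
> AND nu_parallel = 123
> ORDER BY dh_date;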
>
> Thanks in advance and best regards
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
