Skip site navigation (1) Skip section navigation (2)

Re: Atomic access to large arrays

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "Victor de Buen (Bayes)" <vdebuen(at)bayesinf(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Atomic access to large arrays
Date: 2009-07-22 05:11:15
Message-ID: Pine.LNX.4.64.0907220910220.8065@sn.sai.msu.ru (view raw or flat)
Thread:
Lists: pgsql-performance
Victor,

Just wondering why do you use array ?

Oleg
On Wed, 22 Jul 2009, Victor de Buen (Bayes) wrote:

> Hi
>
> I'm storing historical meteorological gridded data from GFS (
> http://www.nco.ncep.noaa.gov/pmb/products/gfs/) into an array field in a
> table like this:
>
> CREATE TABLE grid_f_data_i2 (
>  //Specifies the variable and other features of data
>  id_inventory integer REFERENCES grid_d_inventory(id_inventory),
>  //A new grid is available each 3 hours since 5 years ago
>  dh_date timestamp,
>  //Values are scaled to be stored as signed integers of 2 bytes
>  vl_grid smallint[361][720],
> CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY
>  (co_inventory, dh_date)
> );
>
> Dimensions of each value of field vl_grid are (lat:361 x lon:720 = 259920
> cells} for a grid of 0.5 degrees (about each 55 Km) around the world. So,
> vl_grid[y][x] stores the value at dh_date of a meteorological variable
> specified by id_inventory in the geodesic point
>
> latitude  = -90 + y*0.5
> longitude = x*0.5
>
> The reverse formula for the closest point in the grid of an arbitary
> geodesic point will be
>
> y = Round((latitude+90) * 2
> x = Round(longitude*2)
>
> Field vl_grid is stored in the TOAST table and has a good compression level.
> PostgreSql is the only one database that is able to store this huge amount
> of data in only 34 GB of disk. It's really great system. Queries returning
> big rectangular areas are very fast, but the target of almost all queries is
> to get historical series for a geodesic point
>
> SELECT  dh_date, vl_grid[123][152]
> FROM  grid_f_data_i2
> WHERE  id_inventory = 6
> ORDER BY dh_date
>
> In this case, atomic access to just a cell of each one of a only few
> thousands of rows becomes too slow.
>
> Please, could somebody answer some of these questions?
>
>   - It's posible to tune some TOAST parameters to get faster atomic access
>   to large arrays?
>
>
>   - Using "EXTERNAL" strategy for storing TOAST-able columns could solve
>   the problem?
>
>
>   - Atomic access will be faster if table stores vectors for data in the
>   same parallel instead of matrices of global data?
>   CREATE TABLE grid_f_data_i2 (
>     //Specifies the variable and other features of data
>     id_inventory integer REFERENCES grid_d_inventory(id_inventory),
>     //A new grid is available each 3 hours since 5 years ago
>     dh_date timestamp,
>     // nu_parallel = y = Round((latitude+90) * 2
>     smallint nu_parallel,
>     //Values are scaled to be stored as signed integers of 2 bytes
>     vl_parallel smallint[],
>   CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY
>     (co_inventory, nu_parallel, dh_date)
>   );
>
>    - There is another faster solution?
>
> Thanks in advance and best regards
>
>

 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

pgsql-performance by date

Next:From: Scott MarloweDate: 2009-07-22 05:42:20
Subject: Re: Master/Slave, DB separation or just spend $$$?
Previous:From: Kelvin QueeDate: 2009-07-22 03:47:43
Subject: Master/Slave, DB separation or just spend $$$?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group