
Re: Strategy for doing number-crunching

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Foster <matthew(dot)foster(at)noaa(dot)gov>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Strategy for doing number-crunching
Date: 2012-01-04 21:11:21
Message-ID:
Lists: pgsql-novice
Matthew Foster <matthew(dot)foster(at)noaa(dot)gov> writes:
> On Wed, Jan 4, 2012 at 10:48 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Matthew Foster <matthew(dot)foster(at)noaa(dot)gov> writes:
>>> We have a database with approximately 130M rows, and we need to produce
>>> statistics (e.g. mean, standard deviation, etc.) on the data.  Right now,
>>> we're generating these stats via a single SELECT, and it is extremely
>>> slow; it can take hours to return results.

>> What datatype are the columns being averaged?  If "numeric", consider
>> casting to float8 before applying the aggregates.  You'll lose some
>> precision but it'll likely be orders of magnitude faster.

> The data are type double.

Hmm.  In that case I think you have some other problem that's hidden in
details you didn't show us.  It should not take "hours" to process only
130M rows.  This would best be taken up on pgsql-performance; please see
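For readers following the thread, the casting advice quoted above can be sketched as follows. This is a minimal illustration, not the poster's actual query; the table name "obs" and column name "val" are hypothetical stand-ins:

```sql
-- Hypothetical table "obs" with a numeric column "val".
-- Casting to float8 before aggregating trades a little precision
-- for much cheaper per-row arithmetic than numeric provides.
SELECT avg(val::float8)    AS mean,
       stddev(val::float8) AS std_dev
FROM obs;
```

Both aggregates are computed in a single sequential scan; the cast only matters when the stored type is numeric, which, per the follow-up, was not the case here.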

			regards, tom lane


