Re: Performance issues with large amounts of time-series data

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hrishikesh (हृषीकेश मेहेंदळे) <hashinclude(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance issues with large amounts of time-series data
Date: 2009-08-26 18:01:43
Message-ID: 18070.1251309703@sss.pgh.pa.us
Lists: pgsql-performance

Hrishikesh (हृषीकेश मेहेंदळे) <hashinclude(at)gmail(dot)com> writes:
> In my timing tests, the performance of PG is quite a lot worse than the
> equivalent BerkeleyDB implementation.

Are you actually comparing apples to apples? I don't recall that BDB
has any built-in aggregation functionality. It looks to me like you've
moved some work out of the client into the database.

> 1. Is there anything I can do to speed up performance for the queries?

Do the data columns have to be bigint, or would int be enough to hold
the expected range? SUM(bigint) is a *lot* slower than SUM(int),
because the former has to use "numeric" arithmetic whereas the latter
can sum in bigint. If you want to keep the data on-disk as bigint,
but you know the particular values being summed here are not that
big, you could cast in the query (SUM(data_1::int) etc).
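As a sketch of what I mean (table and column names are made up, not from your schema; the cast is only safe if every summed value fits in int):

```sql
-- SUM(bigint) accumulates in "numeric", which is slow; casting the
-- inputs to int lets the aggregate accumulate in plain bigint instead.
SELECT period_id,
       SUM(data_1::int) AS total_1,   -- OK only if data_1 values fit in int
       SUM(data_2::int) AS total_2
FROM   stats_part_hypothetical
GROUP  BY period_id;
```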

I'm also wondering if you've done something to force indexscans to be
used. If I'm interpreting things correctly, some of these scans are
traversing all/most of a partition and would be better off as seqscans.
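You could check that with something like this (query and table name hypothetical); enable_indexscan and enable_bitmapscan are standard planner settings:

```sql
-- Compare the plan the optimizer picks with the plan it picks when
-- indexscans are discouraged.  If the seqscan plan wins for a
-- whole-partition aggregate, something is forcing indexscans.
EXPLAIN ANALYZE SELECT SUM(data_1) FROM stats_part_hypothetical;

SET enable_indexscan = off;
SET enable_bitmapscan = off;
EXPLAIN ANALYZE SELECT SUM(data_1) FROM stats_part_hypothetical;

RESET enable_indexscan;
RESET enable_bitmapscan;
```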

> shared_buffers = 128MB

This is really quite lame for the size of machine and database you've
got. Consider knocking it up to 1GB or so.
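That is, something along these lines in postgresql.conf (note this setting only takes effect after a server restart):

```
# postgresql.conf -- sketch; tune to your actual RAM
shared_buffers = 1GB        # was 128MB
```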

regards, tom lane
