Re: Using Postgres to store high volume streams of sensor readings

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ciprian Dorin Craciun <ciprian(dot)craciun(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Using Postgres to store high volume streams of sensor readings
Date: 2008-11-21 17:45:52
Message-ID: Pine.GSO.4.64.0811211224290.23150@westnet.com
Lists: pgsql-general

On Fri, 21 Nov 2008, Tom Lane wrote:

> Not sure if it applies to your real use-case, but if you can try doing
> the COPY from a local file instead of across the network link, it
> might go faster.

The fact that the inserts are reported as fast initially but slow as the
table and index size grow means it's probably a disk bottleneck rather
than anything related to the client itself. If the network overhead were
the issue, I wouldn't expect it to start fast like that. Ditto for
concerns about the random function being slow. Either of those might
speed up the initial, fast period a bit, but they're not going to impact
the later slowdown.

Ciprian, the main interesting piece of data to collect here is a few
sample lines of output from "vmstat 1" during the initial,
fast loading section versus the slower period. I think what's happening
to you is that maintaining the index blocks on the disk is turning into
increasingly random I/O as the size of the table grows, and your disks
just can't keep up with that. What I'd expect is that initially the
waiting for I/O "wa" figure will be low, but it will creep up constantly
and at some point spike up hard after the working set of data operated on
exceeds memory.
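To make that concrete, here's a rough sketch of how you might capture and
watch the "wa" column during the load; the column position assumed below
(field 16) matches recent Linux vmstat layouts, so check your header line
first:

```shell
# Sample vmstat once a second for 60 samples during the COPY/INSERT run
# and pull out the I/O-wait percentage ("wa" column, field 16 on recent
# Linux vmstat; verify against your own header line before trusting it).
# Expect low values during the fast initial phase, then a steady climb
# once the index working set no longer fits in memory.
vmstat 1 60 | awk 'NR > 2 { print "wa=" $16 }'
```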

The fact that PostgreSQL performs badly here compared to the more
lightweight databases you've used isn't that much of a surprise. There's
a fair amount of overhead for the write-ahead log and the MVCC
implementation in the database, and your application is suffering from all
that robustness overhead but not really gaining much of a benefit from it.
The main things that help in this sort of situation are increases in
shared_buffers and checkpoint_segments, so that more database information
is stored in RAM for longer rather than being pushed to disk too quickly,
but what Rafael suggested already got you most of the possible improvement
here. You might get an extra bit of boost by adjusting the index
FILLFACTOR upwards (default is 90, if you're never updating you could try
100). I doubt that will be anything but a minor incremental improvement
though.
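For reference, a sketch of the knobs discussed above; the values, database
name, and index name are illustrative placeholders, not recommendations:

```shell
# postgresql.conf settings (reload/restart required); sizes are examples:
#   shared_buffers = 1GB          # keep more index pages in RAM longer
#   checkpoint_segments = 32      # spread checkpoints out (pre-9.5 name)
#
# Raising the index FILLFACTOR to 100 for insert-only data; the database
# and index names here are hypothetical. The new fillfactor only applies
# to pages written after the change, hence the REINDEX.
psql -d sensordb -c "ALTER INDEX readings_ts_idx SET (fillfactor = 100);"
psql -d sensordb -c "REINDEX INDEX readings_ts_idx;"
```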

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
