Re: PostgreSQL Write Performance

From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
Cc: Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 11:04:05
Message-ID: 8A4E4050-34DE-4459-979C-3752E9C02F21@solfertje.student.utwente.nl
Lists: pgsql-general

On 5 Jan 2010, at 8:30, Yan Cheng Cheok wrote:

>>> What is the actual problem you are trying to solve?
>
> I am currently developing a database system for a high speed measurement machine.
>
> The time taken to perform a measurement is around 30 milliseconds per unit. We need to record the measurement result for every single unit, so the time taken to record it must be far less than a millisecond, so that it has nearly zero impact on machine speed (otherwise the machine has to wait for the database to finish writing before measuring the next unit).
>
> Previously, we were using a flat file. However, flat files become quite a mess when it comes to generating reports for customers.

That flat file can help you cluster INSERTs together and also means you'll have all your measurements ready for a single INSERT. The latter is useful if you're planning on using arrays to store your measurements, since updating array values requires rewriting the entire array to the database. I don't know how your measurements would arrive without the flat file, but I wouldn't be surprised if the measurements for a single unit arrived at different points in time, which would be a bit painful with arrays (not quite as much with a measurements table though).
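To illustrate the measurements-table alternative mentioned above: one row per measured value, so a late-arriving measurement is just another INSERT rather than an array rewrite. This is only a sketch with hypothetical names; adjust types and constraints to your data:

```sql
-- Hypothetical normalized layout: one row per measurement per unit.
-- Late-arriving values become plain INSERTs instead of array rewrites.
CREATE TABLE measurement (
    unit_id bigint NOT NULL,      -- references unit(id)
    name    text NOT NULL,        -- e.g. 'Width', 'Height'
    value   numeric(4,3) NOT NULL
);
```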

A safe approach (with minimal risk of data loss) would be to split your flat file every n units (earlier in this thread a number of n=1000 was mentioned) and store that data using COPY in the format COPY expects. You will probably also want to keep a queue-table (which is just a normal table, but it's used like a queue) with the names of the flat files that need processing.

I haven't done this kind of thing before, but I envision it something like this:

CREATE TABLE unit (
    id bigserial NOT NULL,
    date date NOT NULL DEFAULT CURRENT_DATE,
    measured text[],
    measurements numeric(4,3)[]
);

CREATE TABLE queue (
    file text NOT NULL,
    definitive boolean DEFAULT False
);

---file-2010-01-05-00000001---
/* Update in its own transaction so that we know we tried to process this file
 * even if the transaction rolls back.
 */
UPDATE queue SET definitive = True
WHERE file = 'file-' || to_char(CURRENT_DATE, 'YYYY-MM-DD') || '-00000001';

/* Start work */
BEGIN;
COPY unit FROM STDIN;
1 {Width,Height} {0.001,0.021}
2 {Width,Height} {0.002,0.019}
...
999 {Width,Height} {0.000,0.018}
\.

/* This file was processed and can be removed from the queue */
DELETE FROM queue WHERE file='file-2010-01-05-00000001';
COMMIT;

/* This will probably be the name of the next flat file, but we don't know that
* for sure yet. It needs to be outside the transaction as otherwise CURRENT_DATE
* will have the date of the start of the transaction and we need to know what the
* next batch will be regardless of whether this one succeeded.
*/
INSERT INTO queue (file, definitive)
VALUES ('file-' || to_char(CURRENT_DATE, 'YYYY-MM-DD') || '-00001000', False);
---end of file---

You'd need a little program (a script will probably work) to read that queue table and send the commands in those files to the database. Don't forget that at the start the queue table will be empty ;) I recall some of this list's members wrote up a webpage about how to implement queue tables reliably.
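That queue-reading script could be sketched roughly as follows. This is a hypothetical Python sketch, not a tested implementation: it assumes `psql` is on the PATH, a database named `measurements`, and the filename scheme from the example above. The `batch_filename` helper just reproduces that naming convention:

```python
import subprocess

def batch_filename(date_str: str, batch: int) -> str:
    """Build a flat-file name like 'file-2010-01-05-00000001',
    matching the naming scheme used in the example files above."""
    return f"file-{date_str}-{batch:08d}"

def process_queue(files):
    """Feed each queued batch file to psql.

    Because the COPY and the queue DELETE are inside one transaction
    in the file itself, a failure rolls both back and leaves the
    queue row in place for a retry.
    """
    for f in files:
        subprocess.run(["psql", "-d", "measurements", "-f", f], check=True)
```

The list of files to process would come from a `SELECT file FROM queue` issued by the same script (or a small wrapper around it).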

Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.

