Re: insert vs select into performance

From: Thomas Finneid <tfinneid(at)student(dot)matnat(dot)uio(dot)no>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: insert vs select into performance
Date: 2007-07-18 19:08:08
Message-ID: 469E6518.5000203@ifi.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


PFC wrote:
>> Unfortunately its not fast enough, it needs to be done in no more than
>> 1-2 seconds, ( and in production it will be maybe 20-50 columns of
>> data, perhaps divided over 5-10 tables.)
>> Additionally it needs to scale to perhaps three times as many columns
>> and perhaps 2 - 3 times as many rows in some situation within 1 seconds.
>> Further on it needs to allow for about 20 - 50 clients reading much of
>> that data before the next batch of data arrives.
>
> Wow. What is the application ?

Geological surveys, where they perform realtime geo/hydro-phone shots of
areas of the size of 10x10km every 3-15 seconds.

> test=> CREATE OR REPLACE FUNCTION test_insert( )
> RETURNS VOID
> LANGUAGE plpgsql
> AS
> $$
> DECLARE
> _i INTEGER;
> BEGIN
> FOR _i IN 0..100000 LOOP
> INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
> END LOOP;
> END;
> $$;
> CREATE FUNCTION
> Temps : 51,948 ms
>
> test=> SELECT test_insert();
> test_insert
> -------------
>
> (1 ligne)
>
> Temps : 1885,382 ms

I tested this one and it took 4 seconds, compared to the jdbc insert
which took 14 seconds, so its a lot faster. but not as fast as the
SELECT INTO.

I also tested an INSERT INTO FROM SELECT, which took 1.8 seconds, now we
are starting to talk about real performance.

> However COPY is much faster because the parsing overhead and
> de-escaping of data is faster. COPY is optimized for throughput.
>
> So, advice :
>
> For optimum throughput, have your application build chunks of data
> into text files and use COPY. Or if your client lib supports the copy
> interface, use it.

I did test COPY, i.e. the jdbc COPY patch for pg 8.1, it performs at
approx 1.8 seconds :) The test was done with text input, I am going to
test it with binary input, which I expect will increase the performance
with 20-50%.

All these test have ben performed on a laptop with a Kubuntu 6.10
version of pg 8.1 without any special pg performance tuning. So I expect
that compiling lates pg and doing some tuning on it and testing it on
the a representative server will give it an additional boost in performance.

The key here is that with abundance in performance, I can experiment
with the solution in a completely different way than if I had any
"artificial" restrictions.

> You will need a fast disk system with xlog and data on separate
> disks, several CPU cores (1 insert thread will max out 1 core, use the
> others for selects), lots of RAM so index updates don't need to seek,
> and tuning of bgwriter and checkpoints to avoid load spikes.

will have a look at it.

regards

thomas

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Thomas Finneid 2007-07-18 19:11:20 Re: insert vs select into performance
Previous Message Jim C. Nasby 2007-07-18 18:52:51 Re: When/if to Reindex