Re: Postgres insert performance and storage requirement compared to Oracle

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 17:48:35
Message-ID: E2FA14BE-A47B-4673-9DE1-851EB74009F2@richrelevance.com
Lists: pgsql-hackers pgsql-performance


On Oct 26, 2010, at 2:54 PM, Mladen Gogala wrote:

> On 10/26/2010 5:27 PM, Jon Nelson wrote:
>> start loop:
>> populate rows in temporary table
>> insert from temporary table into permanent table
>> truncate temporary table
>> loop
>>
>> I do something similar, where I COPY data to a temporary table, do
>> lots of manipulations, and then perform a series of INSERTS from the
>> temporary table into a permanent table.
>>
>
> 1) It's definitely not faster because you have to insert into the
> temporary table, in addition to inserting into the permanent table.

It is almost always significantly faster than a direct bulk load into a table.
* The temp table has no indexes; the final table usually does, and bulk operations on indexes are faster than per-row operations.
* The final table might require both updates and inserts; doing these in bulk from a temporary staging table is far faster than doing them row by row.
* You don't even have to commit after the merge from the temp table; you can loop until it's all done and then commit -- though this can have table/index bloat implications if you are doing updates.
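
For concreteness, here is a minimal sketch of that staged-load pattern in a single transaction. The table names ("stage", "final"), the id/payload columns, and the file path are made up for illustration; the separate set-based UPDATE and INSERT do the bulk merge:

BEGIN;

-- Unindexed staging table; dropped automatically at commit.
CREATE TEMPORARY TABLE stage (LIKE final) ON COMMIT DROP;

-- Bulk load into the temp table: no index maintenance per row.
COPY stage FROM '/path/to/data.csv' CSV;

-- One set-based UPDATE for rows that already exist in the final table...
UPDATE final f
   SET payload = s.payload
  FROM stage s
 WHERE f.id = s.id;

-- ...and one set-based INSERT for the rows that do not.
INSERT INTO final
SELECT s.*
  FROM stage s
 WHERE NOT EXISTS (SELECT 1 FROM final f WHERE f.id = s.id);

COMMIT;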

> 2) This is what I had in mind:
>
> mgogala=# create table a(c1 int);
> CREATE TABLE
> mgogala=# create temporary table t1(c1 int) on commit delete rows;
> CREATE TABLE
> mgogala=# begin;
> BEGIN
> mgogala=# insert into t1 select generate_series(1,1000);
> INSERT 0 1000
> mgogala=# insert into a select * from t1;
> INSERT 0 1000
> mgogala=# commit;
> COMMIT
> mgogala=# select count(*) from a;
> count
> -------
> 1000
> (1 row)
>
> mgogala=# select count(*) from t1;
> count
> -------
> 0
> (1 row)
>
> The table is created with the "on commit obliterate rows" option, which
> means that there is no need to do "truncate". The "truncate" command is
> heavy artillery. Truncating a temporary table is like shooting ducks in
> a duck pond with a howitzer.

??? Test it. DELETE is slow; TRUNCATE is nearly instantaneous for normal tables, and for temp tables it's the same thing. Maybe in Oracle TRUNCATE is a howitzer; in Postgres it's lightweight.

Your loop above requires a commit after every 1000 rows. What if you require that all rows become visible at once or not at all? What if you fail partway through? One big transaction is often a better idea and/or required. Especially in Postgres, with no undo log, bulk inserts in one large transaction work out very well -- usually better than multiple smaller transactions.
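
For what it's worth, the claim is easy to check from psql; a throwaway sketch (the table name and row count are arbitrary):

\timing on
CREATE TEMPORARY TABLE t_bench (c1 int);
INSERT INTO t_bench SELECT generate_series(1, 1000000);
-- DELETE visits and marks every row dead, one tuple at a time.
DELETE FROM t_bench;
INSERT INTO t_bench SELECT generate_series(1, 1000000);
-- TRUNCATE just swaps in an empty relation file; its cost does not
-- depend on the number of rows.
TRUNCATE t_bench;

The DELETE time scales with the row count; the TRUNCATE should come back nearly instantaneously regardless of table size.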
>
> --
>
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> http://www.vmsinfo.com
> The Leader in Integrated Media Intelligence Solutions
