Re: Postgres insert performance and storage requirement compared to Oracle

From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 18:06:53
Message-ID: 4CC86A3D.8000507@vmsinfo.com
Lists: pgsql-hackers pgsql-performance

On 10/27/2010 1:48 PM, Scott Carey wrote:
>
> It is almost always significantly faster than a direct bulk load into a table.
> * The temp table has no indexes; the final table usually does, and bulk operations on indexes are faster than per-row operations.
> * The final table might require both updates and inserts; doing these in bulk from a temp stage table is far faster than per row.
> * You don't even have to commit after the merge from the temp table, and can loop until it's all done, then commit -- though this can have table/index bloat implications if doing updates.

Scott, I find this very hard to believe. If you are inserting into a
temporary table and then into the target table, you will do 2 inserts
instead of just one. What you are telling me is that it is faster for me
to drive from NYC to Washington DC by driving first to Miami and then
from Miami to DC.
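
(For readers following along, the pattern Scott is describing is roughly
this; a sketch only, where "stage", "target" and the columns are
hypothetical names:)

create temporary table stage (id int, val text) on commit drop;
-- bulk load the batch into the indexless temp table,
-- e.g. with COPY or INSERT ... SELECT
copy stage from stdin;
-- one bulk UPDATE for the rows that already exist in the target ...
update target t
   set val = s.val
  from stage s
 where t.id = s.id;
-- ... and one bulk INSERT for the rows that do not
insert into target (id, val)
select s.id, s.val
  from stage s
 where not exists (select 1 from target t where t.id = s.id);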

>> 2) This is what I had in mind:
>>
>> mgogala=# create table a(c1 int);
>> CREATE TABLE
>> mgogala=# create temporary table t1(c1 int) on commit delete rows;
>> CREATE TABLE
>> mgogala=# begin;
>> BEGIN
>> mgogala=# insert into t1 select generate_series(1,1000);
>> INSERT 0 1000
>> mgogala=# insert into a select * from t1;
>> INSERT 0 1000
>> mgogala=# commit;
>> COMMIT
>> mgogala=# select count(*) from a;
>> count
>> -------
>> 1000
>> (1 row)
>>
>> mgogala=# select count(*) from t1;
>> count
>> -------
>> 0
>> (1 row)
>>
>> The table is created with the "ON COMMIT DELETE ROWS" option, which means
>> that there is no need to do "truncate". The "truncate" command is heavy
>> artillery. Truncating a temporary table is like shooting ducks in a duck
>> pond with a howitzer.
> ??? Test it. DELETE is slow; TRUNCATE is nearly instantaneous for normal tables, and for temp tables it's the same thing. Maybe in Oracle TRUNCATE is a howitzer; in Postgres it's lightweight.
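
(For the record, the comparison Scott suggests is easy to run; something
like this, with \timing on, will show the difference, though the exact
numbers will vary with hardware:)

mgogala=# create table big(c1 int);
mgogala=# insert into big select generate_series(1,1000000);
mgogala=# \timing
mgogala=# delete from big;    -- time grows with the row count
mgogala=# insert into big select generate_series(1,1000000);
mgogala=# truncate big;       -- near-instant, regardless of row count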

Truncate has a specific list of tasks to do:
1) Lock the table in exclusive mode to prevent concurrent transactions on
the table.
2) Release the file space and update the table headers.
3) Flush any buffers possibly residing in shared memory.
4) Repeat the procedure on the indexes.

Of course, in the case of a normal table, all of these changes are logged,
possibly producing WAL archives. That is still much faster than DELETE,
whose duration grows with the number of rows to be deleted, but it is not
exactly lightweight either. In Postgres, truncate recognizes that the
table is a temporary table and takes a few shortcuts, which make it
faster:

1) No need to flush buffers.
2) Locking requirements are much less stringent.
3) No WAL archives are produced.
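
In a batch loop, the two variants look like this (a sketch, reusing the
tables "a" and "t1" from the session above):

-- variant 1: explicit truncate between batches
begin;
insert into t1 select generate_series(1,1000);
insert into a select * from t1;
truncate t1;  -- cheap for a temp table, but one more statement per batch
commit;

-- variant 2: rely on ON COMMIT DELETE ROWS
begin;
insert into t1 select generate_series(1,1000);
insert into a select * from t1;
commit;       -- t1 is emptied automatically at commit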

Temporary tables are completely different beasts in Oracle and Postgres.
Yes, you are right: truncating a temporary table is a big no-no in the
Oracle world, especially in a RAC environment. However, I find the "ON
COMMIT DELETE ROWS" trick more elegant than truncate. Here is the classic
Tom Kyte on the topic of truncating temporary tables:
http://tinyurl.com/29kph3p

"*NO. truncate is DDL. DDL is expensive. Truncation is something that
should be done very infrequently.
Now, I don't mean "turn your truncates into DELETE's" -- that would
be even worse. I mean -- avoid having
to truncate or delete every row in the first place. Use a transaction
based temporary table and upon commit, it'll empty itself."

> Your loop above requires a commit after every 1000 rows. What if you require that all rows are seen at once or not at all? What if you fail part way through? One big transaction is often a better idea and/or required. Especially in postgres, with no undo-log, bulk inserts in one large transaction work out very well -- usually better than multiple smaller transactions.

I don't contest that. I also prefer to do things in one big transaction,
if possible.
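
One caveat for the single-transaction approach: ON COMMIT DELETE ROWS
fires only at commit, so if several batches are staged inside one big
transaction, the temp table has to be emptied explicitly in between (a
sketch, again with the tables from the session above):

begin;
insert into t1 select generate_series(1,1000);
insert into a select * from t1;
truncate t1;  -- ON COMMIT DELETE ROWS has not fired yet
insert into t1 select generate_series(1001,2000);
insert into a select * from t1;
commit;       -- all 2000 rows in "a" become visible at once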

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions
