Re: INSERT performance deteriorates quickly during a large import

From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: "Krasimir Hristozov \(InterMedia Ltd\)" <krasi(at)imedia-dev(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: INSERT performance deteriorates quickly during a large import
Date: 2007-11-08 18:37:27
Message-ID: 20071108133727.63c54dce.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

In response to "Krasimir Hristozov \(InterMedia Ltd\)" <krasi(at)imedia-dev(dot)com>:

> We need to import data from a relatively large MySQL database into an
> existing PostgreSQL database, using a PHP5 script that SELECTs from MySQL
> and INSERTs in PostgreSQL. A part of the import involves moving about
> 1,300,000 records from one MySQL table to one of our PostgreSQL tables. The
> problem is that the insert performance inevitably deteriorates as the number
> of inserts increases.
>
> We tried different approaches:
>
> * selecting only parts of the source table data based on a certain
> condition
> * selecting all of the source data
>
> coupled with either of these:
>
> * inserting without explicit transactions
> * inserting all the data in a single transaction
> * inserting the data in partial transactions of about 100, 1000, 10000,
> 100000 inserts each
>
> While there were performance benefits in some of the cases (selecting all
> the data and inserting in transaction packets of about 1000 each being the
> fastest), the problem was that it still deteriorated as the import
> progressed.
>
> We tried removing all foreign keys and indices from the postgres table,
> still gained performance, but it deteriorated as well.
>
> The latest (and best performing) test we did was under the following
> conditions:
>
> * 11851 pre-existing records in the destination table
> * the table was vacuumed just before the import
> * all foreign keys and indices were removed from the destination table
> * selected all of the data from the source table at once
> * inserted in transactions of 1000 inserts each
>
> We displayed time statistics on each 100 inserts. The process started at
> about 1 second per 100 inserts. This estimated to about 4 hours for the
> entire process. 14 hours later it had imported about a quarter of the data
> (a bit more than 330000 records), and 100 inserts now took nearly 40
> seconds.
>
> We tested reading from MySQL alone, without inserting the data in Postgres.
> All records were read in about a minute and a half, so MySQL performance is
> not a part of the issue. The PHP script selects the MySQL data, fetches rows
> sequentially, occasionally performs a couple of selects against PostgreSQL
> data (which is cached in a PHP array to reduce the DB operations; no more
> than 80000 array elements, integer keys, integer data), and inserts into
> PostgreSQL. The algorithm seems to be linear in nature and perfomance
> deterioration most probably doesn't have to do with the PHP code.
>
> Has anyone had an issue like this, and perhaps a suggestion for a possible
> cause and solution? Is it common for PostgreSQL to grow so slow as the
> amount of data in the tables increases? If so, is it just the insert
> operation or all kinds of queries? Isn't 300000 records too low a threshold
> for such performance deterioration?
>
> Here are some technical details, that might be helpful:
>
> * PHP, MySQL and PostgreSQL all work on the same server, sharing the same
> memory and hard drive.

This makes it very difficult to blame PostgreSQL. If the insert process
is CPU bound, and PHP is using a ton of CPU, then PG will be starved.
You kinda contradict yourself, saying PG is not starved, then saying
that the CPU is maxed out. In any event, having all three on one machine
will make it more fun to isolate where the actual bottleneck is.

> * the server runs FreeBSD 5.3-RELEASE on an AMD Athlon(tm) 64 Processor
> 3000+ (2GHz K8 class CPU) with 1GB RAM

This is another problem. 5.3 has the worst performance of any version
of FreeBSD I've ever used. Even downgrading to 4.11 (not recommended)
would produce a performance improvement, but you you should get this
system to 5.5 (at least) or 6.2 (preferable).

> * the software versions installed are Apache 1.3.37, PHP 5.2.1, MySQL
> 4.1.22, PostgreSQL 8.1.8
> * postgresql.conf variables other than defaults are: max_connections = 40,
> shared_buffers = 1000 (this is the default)

1000 shared_buffers is pretty low for any real work. While your tables
aren't huge, they're big enough to warrant more shared_buffers. However,
you've only got 1G of RAM on this system to share between two DB servers,
which is going to constrain you a good bit.

There are other settings important to insert performance that you haven't
mentioned. checkpoint_segments and the like, for example. You also don't
describe your disk subsystem, it's entirely possible you've filled up the
cache on the disk controllers (which is why it looked initially fast) and
now are hitting up against the max speed the disks can do. With only 1G
of RAM, it's possible that MySQL is reading, PostgreSQL is writing, and
PHP is swapping. It doesn't take a lot of disk contention to flush
performance down the toilet. See what iostat says. Even better, use
top in "m" mode (hit m after top start) to see how much IO each process
is using (I believe that was added in 5.X, but it may only be available
in 6.X versions of FreeBSD's top)

> * we have also tried these on another server with Red Hat Enterprise Linux
> ES release 4 (Linux 2.6.9-42.0.3.ELsmp) on 2xDual Core AMD Opteron(tm)
> Processor 270 (4x2GHz logical CPUs) with 2GB RAM

While 2G is better, that's still not a lot of RAM if you're trying to run
2 DB servers and a web server on a single system. If you haven't tuned
PG to take advantage of it, then it won't help much anyway. Additionally,
you've neglected to mention the disk subsystem on this machine as well.
Is it running cheapo SATA drives because the price/gig is right?

> * both servers run in x86_64 mode, PostgreSQL footprint in memory stays
> relatively small,

Of course it does, because you've told it not to use more than 8M of RAM.

> CPU usage maxes out on import, there is no resource
> starvation in any way

You do realize that you're contradicting yourself here, right?

The advice provided by others is good as well, so I won't repeat it.

--
Bill Moran
http://www.potentialtech.com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2007-11-08 19:28:40 Re: System V IPC on Windows
Previous Message Chris Browne 2007-11-08 18:34:56 Re: INSERT performance deteriorates quickly during a large import