Re: Re: Slowdown problem when writing 1.7million records

From: Marc SCHAEFER <schaefer(at)alphanet(dot)ch>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Re: Slowdown problem when writing 1.7million records
Date: 2001-02-27 14:13:17
Message-ID: Pine.LNX.3.96.1010227150911.2241C-100000@defian.alphanet.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> Server (v6.2), I am accessing the files using JDBC from a Windows 2000 PC.

I don't use Microsoft software, nor Java, but a few general suggestions
for you:

- you may want to create the INDEX at the end (using CREATE INDEX)
in one operation

- you may want to run your requests in transactions: e.g. a transaction
which is committed every N insertion.

- in some rare case you may want to disable the fsync() of the
PostgreSQL

- you may want to use a more efficient interface, such as the
COPY command.

Example (using transactions)

#! /usr/bin/perl -wI /home/schaefer/perl-libs
# $Id$

use DBI;
use test_db;
use strict;

my $debug = 0;

my $result = 0; # success
my $reason = "of an unknown error";

$| = 1;

my $dbh = &open_database();
if (defined($dbh)) {
my $amount_entries = 4000000;
my $commit_every = 10000;

# Sometimes, large data entries are done better with a COPY.

$dbh->{AutoCommit} = 0; # Use transactions.

foreach (('sol_f', 'sol_i', 'sol_d')) {
# Using transactions should make insertion faster, since fsync()
# are probably not required. However, when changes are very big,
# it might actually make it slower or using much space, this
# is why we have this $commit_every above and below.
# was expecting a BEGIN WORK; but that
# seem to be implicit.

eval {
my $i;
my $failed_reason = "unknown db error";

print "Populating " . $_ . " ...";
for ($i = 0; ($i < $amount_entries) && ($result == 0); $i++) {
my @titles = ('id', 'ref', 'sentpos', 'wordpos');
if (!&do_query($dbh,
"INSERT INTO " . $_ . "("
. join(", ", @titles)
. ") VALUES ("
. join(", ", ('?') x @titles)
. ")",
\(at)titles,
undef,
[ int(rand(32768)),
'truc',
int(rand(32768)),
'temp'
],
undef,
\$failed_reason)) {
$result = 1;
$reason = "can't insert " . $i . ": " . $failed_reason;
}
else {
if (($i % $commit_every) == 0) {
if ($dbh->commit) {
print "C ";
}
else {
$result = 1;
$reason = "can't commit: " . $dbh->errstr;
}
}
}
}

if ($result == 0) {
print " POPULATED.\n";
}
else {
print " FAILED.\n";
}
}; # DB is not set to die mode, so we will catch only our bugs.
if ($@) {
print;
$result = 1;
$reason = "transaction failed: " . $@; # Not always right.
$dbh->rollback; # res. ign. (in failure mode anyway)
}
elsif ($result) {
$dbh->rollback; # res. ign. (in failure mode anyway)
}
else {
if ($dbh->commit) {
$result = 1;
$reason = "can't commit: " . $dbh->errstr;
}
print "COMMITTED.\n";
}
}

$dbh->{AutoCommit} = 1; # No transactions

if (!$dbh->disconnect) {
$result = 1;
$reason = "disconnect error: " . $dbh->errstr;
}
undef $dbh;
}
else {
# Obviously, can't use $dbh->errstr here.
$reason = "can't database connect: " . $DBI::errstr;
$result = 1;
}

if ($result) {
print $0 . ": failed " . $result . " because " . $reason . "\n";
}
else {
print "SUCCESSFUL.\n";
}
exit $result;

sub create_table {
my($dbh, $name, $val) = @_;
my $result = 0;

my $sth = $dbh->prepare("CREATE TABLE $name ($val)");
if (defined($sth)) {
my $rv = $sth->execute;
if (defined($rv)) {
if ($debug) {
print "$name: succeeded.\n";
}

$result = 1;
}

$sth->finish;
undef $sth;
}

return $result;
}

And using COPY:

#! /usr/bin/perl -wI /home/schaefer/perl-libs
# USAGE
# ./copy.pl | psql test_db
# $Id$

my $amount_entries = 4000000;
my $tell_every = 100000;

print "COPY sol_f FROM stdin;\n";
my $i;
for ($i = 1; $i <= $amount_entries; $i++) {
print int(rand(32768)) . "\t" . 'truc' . "\t" . int(rand(32768)) . "\t"
. 'temp' . "\n";
if (($i % $tell_every) == 0) {
print STDERR $i . "\n";
}
}
print ".\n";

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Nico 2001-02-27 14:13:20 inheritance and partial index: how to override constraints or default_values
Previous Message Gunnar R|nning 2001-02-27 14:05:53 Re: Can PostgreSQL be a mail backend?