Re: New to the list; would this be an okay question?

From: Madison Kelly <linux(at)alteeve(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: New to the list; would this be an okay question?
Date: 2004-06-21 15:35:02
Message-ID: 40D70026.1020506@alteeve.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Richard Huxton wrote:
> Madison Kelly wrote:
>
>> Hi all,
>>
>> I am new to the list and I didn't want to seem rude at all so I
>> wanted to ask if this was okay first.
>
>
> No problem. Reading your message below, you might want to try the
> performance list, but general is a good place to start.
>
>> I have a program I have written in perl which uses a postgresSQL
>> database as the backend. The program works but the performance is
>> really bad. I have been reading as much as I can on optimizing
>> performance but still it isn't very reasonable. At one point I had my
>> program able to process 175,000 records in 16min 10sec on a Pentium3
>> 650MHz, 448MB RAM test machine. Since then I got a Pentium3 1GHz,
>> 512MB system and I have tried a lot of things to get the performance
>> up but now it is substantially slower and I can't seem to figure out
>> what I am doing wrong.
>
>
> A few places to start:
> 1. VACUUM FULL
> This will make sure any unused space is reclaimed
> 2. ANALYZE
> This will recalculate stats for the tables
> 3. Basic performce tuning:
> http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
> There's also a good guide to the postgresql.conf file on varlena.com
>
>> Would it be appropriate to ask for help on my program on this list?
>> Full disclosure: The program won't be initially GPL'ed because it is
>> for my company but it will be released for free to home users and the
>> source code will be made available (similar to other split-license
>> programs) though once my company makes it's money back I think they
>> will fully GPL it (I am on my boss's case about it :p ).
>
>
> No problem - what you licence your software under is your concern. Once
> you've taken the basic steps described above, try to pick out a specific
> query that you think is too slow and provide:
>
> 1. PostgreSQL version
> 2. Basic hardware info (as you have)
> 3. Sizes of tables.
> 4. Output of EXPLAIN ANALYZE <query here>
>
> The EXPLAIN ANALYZE runs the query and shows how much work PG thought it
> would be and how much it actually turned out to be.
>
> HTH

Thank you very much!! I am using Psql 7.4 on a stock install of Fedora
Core 2 on my IBM thinkpad a22m (P3 1GHz, 512MB RAM, not the fastest
HDD). The drive carrier I am using is connected via USB2 and uses a few
different hard drives with the fastest being a couple of Barracuda
7200.7 drives (2MB cache, 7,200rpm). I described the program in my reply
to Martijn so here is some of the code (code not related to psql
snipped, let me know if posting it would help - sorry for the wrapping...):

=-[ Calling the database ]-=
# Open the connection to the database
my $DB = DBI->connect("DBI:Pg:dbname=$db_name","$user")|| die("Connect
error (Is PostgresSQL running?): $DBI::errstr");

# Prepare the select statements before using them for speed:
$select_sth = $DB->prepare("SELECT null FROM file_dir WHERE
file_src_uuid=? AND file_parent_dir=? AND file_name=?") || die
"$DBI::errstr";
$select_up = $DB->prepare("UPDATE file_dir SET file_perm=?,
file_own_user=?, file_own_grp=?, file_size=?, file_mod_date=?,
file_mod_time=?, file_mod_time_zone=?, file_exist=? WHERE
file_src_uuid=? AND file_parent_dir=? AND file_name=?") || die
"$DBI::errstr";
$select_in = $DB->prepare("INSERT INTO file_dir ( file_src_uuid,
file_name, file_dir, file_parent_dir, file_perm, file_own_user,
file_own_grp, file_size, file_mod_date, file_mod_time,
file_mod_time_zone, file_backup, file_restore, file_display, file_exist
) VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )") || die
"$DBI::errstr";

# Set the 'file_exist' flag to 'false' and reset exiting files to 'true'.
$DB->do("UPDATE file_dir SET file_exist='f' WHERE
file_src_uuid='$file_src_uuid'") || die "$DBI::errstr";

# Start scanning the drive
$num = $select_sth->execute($file_src_uuid,$relative_dir,$file_name) ||
die "$DBI::errstr";
if ( $num > 0 )
{
$select_up->execute($file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_exist,$file_src_uuid,$file_parent_dir,$file_name)
|| die "$DBI::errstr";
}
else
{

$select_in->execute($file_src_uuid,$file_name,$file_dir,$file_parent_dir,$file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_backup,$file_restore,$file_display,$file_exist)
|| die "$DBI::errstr";
}

# We need to grab the existing file settings for the special file '/.'
$DBreq=$DB->prepare("SELECT file_backup, file_restore, file_display FROM
file_dir WHERE file_parent_dir='/' AND file_name='.' AND
file_src_uuid='$file_src_uuid'") || die $DBI::errstr;
$file_backup=$DBreq->execute();
@file_backup=$DBreq->fetchrow_array();
$file_backup=(at)file_backup[0];
$file_restore=(at)file_backup[1];
$file_display=(at)file_backup[2];

# Jump into the re-entrant subroutine to scan directories and sub-dirs
&list_files($real_dir, $exclude_list_num, $relative_dir, $file_backup,
$file_restore, $file_display);

# Inside the sub routine

# Does the directory/file/symlink already exist? (there are three of
these for each file type)
$num = $select_sth->execute($file_src_uuid,$relative_dir,$file_name) ||
die "$DBI::errstr";
if ( $num > 0 )
{

$select_up->execute($file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_exist,$file_src_uuid,$file_parent_dir,$file_name)
|| die "$DBI::errstr";
}
else
{
# The file did not exist so we will use the passed parent settings
for the 'file_backup' flag and leave the 'file_display' flag set to 'f'

$select_in->execute($file_src_uuid,$file_name,$file_dir,$file_parent_dir,$file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_backup,$file_restore,$file_display,$file_exist)
|| die "$DBI::errstr";
}

# If this was a file I would loop and process the next file in the
directory, if it was a directory itself I would now re-enter the
subroutine to process it's contents and when I fell back I would pick up
where I left off

# Returning from the final subroutine and finishing up

$DB->do("VACUUM ANALYZE");

=-[ finished DB related source code ]-=

Here is the schema for the 'file_dir' table which I hit repeatedly here:

=-[ file_dir table and index schemas ]-=

CREATE TABLE file_dir ( -- Used to store info
on every file on source partitions
file_id serial unique, -- make
this 'bigserial' if there may be more than 2 billion files in the database
file_src_uuid varchar(40) not null, -- the
UUID of the source partition hosting the original file
file_org_uuid varchar(40), -- the UUID
that the file came from (when the file was moved by TLE-BU)
file_name varchar(255) not null, -- Name of
the file or directory
file_dir bool not null, -- t = is
directory, f = file
file_parent_dir varchar(255) not null, -- if
directory '/foo/bar', parent is '/foo', if file '/foo/bar/file', parent
is '/foo/bar'. The mount directory is treated as '/' so any directories
below it will be ignored for this record.
file_perm varchar(10) not null, -- file or
directory permissions
file_own_user varchar(255) not null, -- The
file's owning user (by name, not UID!!)
file_own_grp varchar(255) not null, -- The
file's owning group (by name, not GID!!)
file_size bigint not null, -- File size in
bytes
file_mod_date varchar(12) not null, -- File's
last edited date
file_mod_time varchar(20) not null, -- File's
last edited time
file_mod_time_zone varchar(6) not null, -- File's
last edited time zone
file_backup boolean not null default 'f', --
't' = Include in backup jobs, 'f' = Do not include in backup jobs
file_restore boolean not null default 'f', --
't' = Include in restore jobs, 'f' = Do not include in restore jobs
file_display boolean not null default 'f', --
't' = display, 'f' = hide
file_exist boolean default 't' -- Used to
catch files that have been deleted since the last scan. Before rescan,
all files in a given src_uuid are set to 0 (deleted) and then as each
file is found or updated it is reset back to 1 (exists) and anything
left with a value of '0' at the end of the scan is deleted and we will
remove their record.
);

-- CREATE INDEX file_dir_idx ON file_dir
(file_src_uuid,file_name,file_parent_dir);

=-[ Finish file_dir table and index schemas ]-=

Thanks so much!!

Madison

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sailer, Denis (YBUSA-CDR) 2004-06-21 15:49:22 JDBC driver inserting into a table with Bytea type gets out of me mory error.
Previous Message Madison Kelly 2004-06-21 15:34:25 Re: New to the list; would this be an okay question?