performance problem

From: sanjay tumula <stumula(at)comcast(dot)net>
To: pgsql-admin(at)postgresql(dot)org
Subject: performance problem
Date: 2005-04-12 17:04:48
Message-ID: 425BFFB0.8060205@comcast.net
Lists: pgsql-admin

I am having poor performance with PostgreSQL 8.0.0.

The config of my machine is:

single Xeon 3.4 GHz, 1 GB of memory, two IDE drives mounted using LVM and the JFS file system, on SuSE 9.2 Professional.

What I am trying to do is simple... at least in Oracle or DB2, which I am trying to stay away from.

I have a bunch of records that I need to load: about 30 million every month, for the last 18 months. Each month is split into four parts of about 7 million records each.

So I have a stage table that I load, check for dups, and then load into the prod tables.

I tried several ways, all using PL/pgSQL; details are below.

What is it that I am doing wrong? With method 1, it finishes in around 8 hours for 7.5 million records.

With method 2, it processes 5000 records in 5 minutes, but the time per pass keeps growing: first pass < 1 min, second pass 2.5 min, and so on.

Any input is greatly appreciated, other than "you need to upgrade the machine", which I am already in the process of doing.

1. Tried to reduce the 7 million into 'manageable chunks' of about 500,000, 100,000, or even 5000 rows, committing in between to keep transactions small.

A sample SQL statement that I use:
/************************************************
copy existing records with updates to history
************************************************/
INSERT INTO sms.sms_src_history
(
smsshist_id_n
,smsshist_phone_s
,status_cd_s
,smsshist_effective_date_s
,smsshist_effective_time_s
,resporg_cd_s
,dataset_cd_s
,smsshist_phone_n
,smsshist_npa_n
,smsshist_nxx_n
,smsshist_line_n
,smsshist_create_dt
,smsshist_create_user_s
,smsshist_effective_dt
)
SELECT
nextval('sms.sms_src_history_seq')
,smsscurr_phone_s
,status_cd_s
,smsscurr_effective_date_s
,smsscurr_effective_time_s
,resporg_cd_s
,dataset_cd_s
,smsscurr_phone_n
,smsscurr_npa_n
,smsscurr_nxx_n
,smsscurr_line_n
,LOCALTIMESTAMP
,'PRCURR_LOAD'
,smsscurr_effective_dt
FROM
sms.sms_src_current
WHERE
smsscurr_phone_s
IN
(
SELECT
phone_s
FROM
(
SELECT
trim(phone_s) as phone_s
,trim(status_cd_s) as status_cd_s
,trim(effective_date_s) as effective_date_s
,trim(effective_time_s) as effective_time_s
,trim(resporg_cd_s) as resporg_cd_s
FROM
sms_stage.stg_sms_src_current
WHERE
sssc_id_n between begin_rec and end_rec
EXCEPT
SELECT
smsscurr_phone_s,status_cd_s,smsscurr_effective_date_s
,smsscurr_effective_time_s,resporg_cd_s
FROM
sms.sms_src_current
) AS in_values
EXCEPT
(
SELECT trim(phone_s)
FROM sms_stage.stg_sms_src_current
WHERE sssc_id_n between begin_rec and end_rec
EXCEPT
SELECT smsscurr_phone_s
FROM sms.sms_src_current
)
)
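
For reference, below is a sketch of the same "changed rows" test written with a correlated EXISTS instead of the nested EXCEPT/IN. This is only a sketch: it assumes the compared columns are NOT NULL and that indexes exist on sms_stage.stg_sms_src_current(sssc_id_n) and sms.sms_src_current(smsscurr_phone_s), and I have not verified it returns exactly the same rows, so it would need to be checked with EXPLAIN first.

/************************************************
sketch only: same idea as above, using EXISTS
instead of nested EXCEPT/IN; assumes the compared
columns are NOT NULL and that indexes exist on
stg_sms_src_current(sssc_id_n) and
sms_src_current(smsscurr_phone_s)
************************************************/
INSERT INTO sms.sms_src_history
(
smsshist_id_n
,smsshist_phone_s
,status_cd_s
,smsshist_effective_date_s
,smsshist_effective_time_s
,resporg_cd_s
,dataset_cd_s
,smsshist_phone_n
,smsshist_npa_n
,smsshist_nxx_n
,smsshist_line_n
,smsshist_create_dt
,smsshist_create_user_s
,smsshist_effective_dt
)
SELECT
nextval('sms.sms_src_history_seq')
,c.smsscurr_phone_s
,c.status_cd_s
,c.smsscurr_effective_date_s
,c.smsscurr_effective_time_s
,c.resporg_cd_s
,c.dataset_cd_s
,c.smsscurr_phone_n
,c.smsscurr_npa_n
,c.smsscurr_nxx_n
,c.smsscurr_line_n
,LOCALTIMESTAMP
,'PRCURR_LOAD'
,c.smsscurr_effective_dt
FROM sms.sms_src_current c
WHERE EXISTS
(
    SELECT 1
    FROM sms_stage.stg_sms_src_current s
    WHERE s.sssc_id_n BETWEEN begin_rec AND end_rec
      AND trim(s.phone_s) = c.smsscurr_phone_s
      AND (   trim(s.status_cd_s)         <> c.status_cd_s
           OR trim(s.effective_date_s)    <> c.smsscurr_effective_date_s
           OR trim(s.effective_time_s)    <> c.smsscurr_effective_time_s
           OR trim(s.resporg_cd_s)        <> c.resporg_cd_s )
)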

2. Declare a cursor and go through the records one by one. I tried this one with only 5000 records at a time; it is still VERY slow.

stg_curs CURSOR FOR
SELECT
sssc_id_n
,trim(phone_s) as phone_s
,trim(status_cd_s) as status_cd_s
,trim(effective_date_s) as effective_date_s
,trim(effective_time_s) as effective_time_s
,trim(resporg_cd_s) as resporg_cd_s
,trim(dataset_cd_s) as dataset_cd_s
FROM
sms_stage.stg_sms_src_current
WHERE
sssc_id_n between begin_rec and end_rec

prod_curs CURSOR FOR
SELECT *
FROM sms.sms_src_current
WHERE smsscurr_phone_n = to_number(substr(stg_rec.phone_s,1,3)
||substr(stg_rec.phone_s,5,3)
||substr(stg_rec.phone_s,9,4),'9999999999')
ORDER BY smsscurr_id_n DESC

etc etc...
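
One thing worth checking for method 2: prod_curs filters on smsscurr_phone_n and sorts by smsscurr_id_n DESC, so if there is no index on that phone column, every open of prod_curs has to scan all of sms.sms_src_current. A sketch, assuming no such index exists yet (the index name below is illustrative only), plus the ANALYZE calls that would refresh planner statistics after the stage table is bulk-loaded:

-- sketch: assumes no index yet on the numeric phone column used by prod_curs
-- (the index name is illustrative only)
CREATE INDEX sms_src_current_phone_n_idx
    ON sms.sms_src_current (smsscurr_phone_n);

-- refresh planner statistics after bulk-loading the stage table, so the
-- chunked queries on sssc_id_n get reasonable plans
ANALYZE sms_stage.stg_sms_src_current;
ANALYZE sms.sms_src_current;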

3. The portion of the shell script that calls the above function is:
echo "Getting Record Count At: "`date`
recnum=`psql -t -c "SELECT COALESCE(MAX(sssc_id_n),0) from
sms_stage.stg_sms_src_current " sms`
echo "Done Getting Record Count At: "`date`

if [ $recnum -eq 0 ]
then
echo "No Source Records To Process In Stage" else
echo "Processing Source Data Started At: "`date`
for (( begin_rec = 1,end_rec = increment
;end_rec < recnum
;begin_rec = end_rec + 1,end_rec = end_rec + increment
))
do
echo "----------------------------------------------------"
echo "Calling process_src_data At: "`date`
echo "With"
echo "begin_rec: "$begin_rec
echo "end_rec: "$end_rec
psql -e -v precs=$begin_rec,$end_rec -f pg_process_src_data.sh sms
echo "Completed process_src_data At: "`date`

etc....
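
(pg_process_src_data.sh is just a SQL script fed to psql with -f; for illustration only, a script like that would consume the :precs variable along these lines -- the function name here is made up, not necessarily what the file actually contains:)

-- sketch: :precs arrives as "begin_rec,end_rec", e.g. 1,500000, so after
-- substitution this becomes a two-argument call to the PL/pgSQL function
SELECT sms.process_src_data(:precs);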

4. The following is my conf file:

start of conf file:
#-------------------------------------------------------------------------------
external_pid_file = 'postmaster' # write an extra pid file

#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

#listen_addresses = 'localhost' # what IP interface(s) to listen on;
# defaults to localhost, '*' = any
listen_addresses = '*' # what IP interface(s) to listen on;
#port = 5432
max_connections = 200
# note: increasing max_connections costs about 500 bytes of shared
# memory per connection slot, in addition to costs from shared_buffers
# and max_locks_per_transaction.
#superuser_reserved_connections = 2
superuser_reserved_connections = 15
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
unix_socket_permissions = 0700 # octal
#rendezvous_name = '' # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60 # 1-600, in seconds
#ssl = false
ssl = true
#password_encryption = true
#krb_server_keyfile = ''
#db_user_namespace = false

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 38400 # min 16, at least max_connections*2, 8KB each
work_mem = 153600 # min 64, size in KB
maintenance_work_mem = 102400 # min 1024, size in KB
max_stack_depth = 10240 # min 100, size in KB

# - Free Space Map -

max_fsm_pages = 200000 # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 10000 # min 100, ~50 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000 # min 25
preload_libraries = '$libdir/plpgsql:plpgsql_init'

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0 # 0-1000 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#vacuum_cost_limit = 200 # 0-10000 credits

# - Background writer -

bgwriter_delay = 1000 # 10-10000 milliseconds between rounds
bgwriter_percent = 5 # 0-100% of dirty buffers in each round
#bgwriter_maxpages = 100 # 0-1000 buffers max per round

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

fsync = false # turns forced synchronization on or off
#wal_sync_method = fsync # the default varies across platforms:
# fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 40 # min 4, 8KB each
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000

# - Checkpoints -

checkpoint_segments = 10 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600 # range 30-3600, in seconds
#checkpoint_warning = 30 # 0 is off, in seconds

# - Archiving -

#archive_command = '' # command to use to archive a logfile segment
archive_command = 'test ! -f /data/archivelog/"%f" && cp "%p" /data/archivelog/"%f"'

#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Configuration -

#enable_hashagg = true
#enable_hashjoin = true
#enable_indexscan = true
#enable_mergejoin = true
#enable_nestloop = true
#enable_seqscan = true
#enable_sort = true
#enable_tidscan = true

# - Planner Cost Constants -

effective_cache_size = 3000 # typically 8KB each
#random_page_cost = 4 # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01 # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)

# - Genetic Query Optimizer -

#geqo = true
#geqo_threshold = 12
#geqo_effort = 5 # range 1-10
#geqo_pool_size = 0 # selects default based on effort
#geqo_generations = 0 # selects default based on effort
#geqo_selection_bias = 2.0 # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#from_collapse_limit = 8
#join_collapse_limit = 8 # 1 disables collapsing of explicit JOINs

#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------

# - Where to Log -

#log_destination = 'stderr' # Valid values are combinations of stderr,
# syslog and eventlog, depending on
# platform.

# This is relevant when logging to stderr:
redirect_stderr = true # Enable capturing of stderr into log files.
# These are only relevant if redirect_stderr is true:
log_directory = 'pg_log' # Directory where log files are written.
# May be specified absolute or relative to PGDATA
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
# May include strftime() escapes
log_truncate_on_rotation = false # If true, any existing log file of the
# same name as the new log file will be truncated
# rather than appended to. But such truncation
# only occurs on time-driven rotation,
# not on restarts or size-driven rotation.
# Default is false, meaning append to existing
# files in all cases.
log_rotation_age = 14400 # Automatic rotation of logfiles will happen after
# so many minutes. 0 to disable.
#log_rotation_size = 10240 # Automatic rotation of logfiles will happen after
# so many kilobytes of log output. 0 to disable.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

# - When to Log -

#client_min_messages = notice # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2, debug1,
# log, notice, warning, error

#log_min_messages = notice # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error, log, fatal,
# panic

#log_error_verbosity = default # terse, default, or verbose messages

#log_min_error_statement = panic # Values in order of increasing severity:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error, panic(off)

log_min_duration_statement = 2147483 # -1 is disabled, in milliseconds.

silent_mode = true # DO NOT USE without syslog or redirect_stderr

# - What to Log -

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false
#log_connections = false
#log_disconnections = false
#log_duration = false
#log_line_prefix = '%t' # e.g. '<%u%%%d> '
# %u=user name %d=database name
# %r=remote host and port
# %p=PID %t=timestamp %i=command tag
# %c=session id %l=session line number
# %s=session start timestamp %x=transaction id
# %q=stop here in non-session processes
# %%='%'
#log_statement = 'none' # none, mod, ddl, all
#log_hostname = false

#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------

# - Statistics Monitoring -

#log_parser_stats = false
#log_planner_stats = false
#log_executor_stats = false
log_statement_stats = true

# - Query/Index Statistics Collector -

#stats_start_collector = true
stats_command_string = true
#stats_block_level = false
#stats_row_level = false
#stats_reset_on_server_start = true

#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '$user,public' # schema names
#default_tablespace = '' # a tablespace name, or '' for default
#check_function_bodies = true
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = false
#statement_timeout = 0 # 0 is disabled, in milliseconds

# - Locale and Formatting -

#datestyle = 'iso, mdy'
#timezone = unknown # actually, defaults to TZ environment setting
#australian_timezones = false
#extra_float_digits = 0 # min -15, max 2
#client_encoding = sql_ascii # actually, defaults to database encoding

# These settings are initialized by initdb -- they might be changed
lc_messages = 'en_US.UTF-8' # locale for system error message strings
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8' # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting

# - Other Defaults -

#explain_pretty_print = true
#dynamic_library_path = '$libdir'

#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------

deadlock_timeout = 30000 # in milliseconds
#max_locks_per_transaction = 64 # min 10, ~200*max_connections bytes each

#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------

# - Previous Postgres Versions -

add_missing_from = false
#regex_flavor = advanced # advanced, extended, or basic
#sql_inheritance = true
default_with_oids = false

# - Other Platforms & Clients -

#transform_null_equals = false

# END OF CONF FILE
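
The values the running server is actually using can be read back with SHOW or from the pg_settings view, for example:

-- read back the values the running server is actually using
SHOW work_mem;
SHOW shared_buffers;
SELECT name, setting, context
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'fsync',
               'checkpoint_segments', 'effective_cache_size');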
