Re: Index creation takes for ever

From: ohp(at)pyrenet(dot)fr
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index creation takes for ever
Date: 2003-09-01 11:57:11
Message-ID: Pine.UW2.4.53.0309011347420.23865@server.pyrenet.fr
Lists: pgsql-hackers pgsql-patches

Tom,
Let me come back to this one:
I ran initdb --locale=C and reloaded all databases, including this one. Index
creation wasn't too bad, so I thought that was it.

Yesterday evening I decided to run a test, so I pg_dump'd that database,
created a test db and reloaded everything from the pg_dump.

It took 69 minutes to finish; 75% of that time was spent creating 2
indexes on varchar(2) columns whose values are 'O', 'N' or null.

I wonder if it's a configuration matter.
PGSQL is VERY heavily used here (tens of thousands of connections/day,
multiple queries/connection).

Here's my postgresql.conf, FWIW. Any advice?

#
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
# name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload".

#========================================================================

#
# Connection Parameters
#
tcpip_socket = true
#ssl = false

max_connections = 64
#superuser_reserved_connections = 2

port = 5432
hostname_lookup = true
#show_source_port = false

#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal

#virtual_host = ''

#krb_server_keyfile = ''

#
# Shared Memory Size
#
shared_buffers = 10000 # min max_connections*2 or 16, 8KB each
max_fsm_relations = 1000 # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 10000 # min 1000, fsm is free space map, ~6 bytes
#max_locks_per_transaction = 64 # min 10
#wal_buffers = 8 # min 4, typically 8KB each

#
# Non-shared Memory Sizes
#
sort_mem = 10240 # min 64, size in KB
#vacuum_mem = 8192 # min 1024, size in KB

#
# Write-ahead log (WAL)
#
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300 # range 30-3600, in seconds
#
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000
#
#fsync = true
#wal_sync_method = fsync # the default varies across platforms:
                         # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0 # range 0-16

#
# Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true

effective_cache_size = 10000 # typically 8KB each
#random_page_cost = 4 # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01 # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)

#default_statistics_target = 10 # range 1-1000

#
# GEQO Optimizer Parameters
#
#geqo = true
#geqo_selection_bias = 2.0 # range 1.5-2.0
#geqo_threshold = 11
#geqo_pool_size = 0 # default based on tables in statement,
                    # range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1 # auto-compute seed

#
# Message display
#
#server_min_messages = notice # Values, in order of decreasing detail:
                              # debug5, debug4, debug3, debug2, debug1,
                              # info, notice, warning, error, log, fatal,
                              # panic
#client_min_messages = notice # Values, in order of decreasing detail:
                              # debug5, debug4, debug3, debug2, debug1,
                              # log, info, notice, warning, error
#silent_mode = false

log_connections = true
log_pid = true
log_statement = true
log_duration = true
#log_timestamp = false

#log_min_error_statement = error # Values in order of increasing severity:
                                 # debug5, debug4, debug3, debug2, debug1,
                                 # info, notice, warning, error, panic(off)

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
debug_pretty_print = false

#explain_pretty_print = true

# requires USE_ASSERT_CHECKING
#debug_assertions = true

#
# Syslog
#
syslog = 2 # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

#
# Statistics
#
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_statement_stats = false

# requires BTREE_BUILD_STATS
#show_btree_build_stats = false

#
# Access statistics collection
#
#stats_start_collector = true
#stats_reset_on_server_start = true
stats_command_string = true
stats_row_level = true
#stats_block_level = false

#
# Lock Tracing
#
#trace_notify = false

# requires LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0

#
# Misc
#
#autocommit = true
#dynamic_library_path = '$libdir'
#search_path = '$user,public'
datestyle = 'postgres, european'
#timezone = unknown # actually, defaults to TZ environment setting
#australian_timezones = false
#client_encoding = sql_ascii # actually, defaults to database encoding
#authentication_timeout = 60 # 1-600, in seconds
#deadlock_timeout = 1000 # in milliseconds
#default_transaction_isolation = 'read committed'
#max_expr_depth = 10000 # min 10
#max_files_per_process = 1000 # min 25
#password_encryption = true
#sql_inheritance = true
#transform_null_equals = false
#statement_timeout = 0 # 0 is disabled, in milliseconds
#db_user_namespace = false

#
# Local Settings
LC_MESSAGES = 'fr_FR'
LC_MONETARY = 'fr_FR'

The machine has 1 GB of DDR ECC Registered RAM and 2 1.8 GHz Xeons, running
UnixWare 7.1.3 (uw713). PG version is 7.3.4; the machine is not swapping.

Those index creations took 100% of 1 CPU.
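
The 100% CPU figure would fit the qsort() theory in Tom's message quoted
below: a quicksort that handles equal keys poorly drifts towards quadratic
work when a column holds only two distinct values. What follows is a minimal
Python sketch of that effect, not the UnixWare or PostgreSQL sort code. The
two functions and the 2000-row sample are hypothetical, chosen only to mirror
roughly the N/O ratio in Tom's output, and NULLs are ignored.

```python
import random


def naive_quicksort_comparisons(keys):
    """Sort a copy of keys with a naive quicksort (Lomuto partition,
    last element as pivot) and return the number of key comparisons."""
    a = list(keys)
    comparisons = 0
    stack = [(0, len(a) - 1)]  # explicit stack instead of recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            comparisons += 1
            if a[j] <= pivot:  # keys equal to the pivot all go left
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    assert a == sorted(keys)
    return comparisons


def three_way_quicksort_comparisons(keys):
    """Same idea with a three-way partition: keys equal to the pivot are
    grouped in the middle and never partitioned again."""
    a = list(keys)
    comparisons = 0
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[lo]
        lt, i, gt = lo, lo, hi
        while i <= gt:
            comparisons += 1  # one strcmp-style comparison per step
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        stack.append((lo, lt - 1))   # keys smaller than the pivot
        stack.append((gt + 1, hi))   # keys larger than the pivot
    assert a == sorted(keys)
    return comparisons


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical sample: about 21% 'N' and 79% 'O'.
    keys = ["N"] * 430 + ["O"] * 1570
    rng.shuffle(keys)
    print("naive:    ", naive_quicksort_comparisons(keys))
    print("three-way:", three_way_quicksort_comparisons(keys))
```

On 1000 identical keys the naive version makes 499500 comparisons
(n(n-1)/2) and the three-way version makes 1000. Whether the local qsort()
really behaves like the naive one is exactly what would need checking on
this platform.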

Regards
On Thu, 28 Aug 2003, Tom Lane wrote:

> Date: Thu, 28 Aug 2003 10:13:21 -0400
> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> To: ohp(at)pyrenet(dot)fr
> Cc: pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
> Subject: Re: [HACKERS] Index creation takes for ever
>
> ohp(at)pyrenet(dot)fr writes:
> > I've then pg_dump'ed the database and recreate an other both on 7.3.4 and
> > 7.4b
>
> > Both are still running after more than 30 minutes of CPU (100% cpu taken)
> > creating the levt_lu_ligne_evt_key.
>
> That's hard to believe. I get
>
> regression=# SELECT levt_lu,count(*) from ligne_evt group by levt_lu;
> levt_lu | count
> ---------+--------
> N | 49435
> O | 181242
> (2 rows)
>
> Time: 6927.28 ms
> regression=# create index levt_lu_ligne_evt_key on ligne_evt (levt_lu);
> CREATE INDEX
> Time: 14946.74 ms
>
> on a not-very-fast machine ... and it seems to be mostly I/O bound.
>
> What platform are you on? I could believe that the local qsort() is
> incredibly inefficient with many equal keys. Another possibility is
> that you're using a non-C locale and strcoll() is horribly slow.
>
> regards, tom lane
>

--
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou +33-5-61-50-97-01 (Fax)
31190 AUTERIVE +33-6-07-63-80-64 (GSM)
FRANCE Email: ohp(at)pyrenet(dot)fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)
