Getting an out of memory failure.... (long email)

From: Sean Shanny <shannyconsulting(at)earthlink(dot)net>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Getting an out of memory failure.... (long email)
Date: 2004-09-28 03:21:58
Message-ID: BD7E5116.18FFB%shannyconsulting@earthlink.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

To all,

Running into an out of memory error on our data warehouse server. This
occurs only with our data from the 'September' section of a large fact
table. The exact same query running over data from August or any prior
month for that matter works fine which is why this is so weird. Note that
June 2004 through today is stored in the same f_pageviews table.

Nothing has changed on the server in the last couple of months. I upgraded
to 7.4.5 from 7.4.3 today thinking maybe that would solve it by no joy.

Configuration and details: (Our postgresql.conf file is at the end of this
email)

Dell 2650 with 2 CPU's, 4GB ram running Linux 2.6.5-1.358smp hyper-threading
turned off. We are attached to a fully configured Apple X-Raid system
running in a RAID50 configuration (2.8TB formatted). The file system is
ext2. The warehouse is using about 800GB with indexes.

Thanks.

--sean

We are looking at these two tables:

Table "public.f_pageviews"
Column | Type | Modifiers
------------------------+---------+----------------------------------------
id | integer | not null
date_key | integer | not null
time_key | integer | not null
content_key | integer | not null
location_key | integer | not null
session_key | integer | not null
persistent_cookie_key | integer | not null
ip_key | integer | not null
referral_key | integer | not null
servlet_key | integer | not null
tracking_key | integer | not null
provider_key | text | not null
marketing_campaign_key | integer | not null
orig_airport | text | not null
dest_airport | text | not null
commerce_page | boolean | not null default false
job_control_number | integer | not null
sequenceid | integer | not null default 0
url_key | integer | not null
useragent_key | integer | not null
web_server_name | text | not null default 'Not Available'::text
cpc | integer | not null default 0
referring_servlet_key | integer | not null default 1
first_page_key | integer | not null default 1
newsletterid_key | text | not null default 'Not Available'::text
userid_key | integer | not null
pool | text | default 'Not Available'::text
cpm | integer | default 0
vendor_key | integer | default 1
teaser_key | integer | default 1
query_key | integer | default 1
guid | text | default 'NA'::text
Indexes:
"idx_pageviews_primary" unique, btree (id, date_key)
"idx_pageviews_date" btree (date_key)
"idx_pageviews_session" btree (session_key)

82+ million records in the September section we are looking at, about
425,677,597 records total in the table.

This table is created each time we run the report.

Table "public.addloc_segmented_sub"
Column | Type | Modifiers
--------+---------+-----------
userid | integer |
subage | integer |
Indexes:
"idx_addloc_segmented_sub" btree (userid)

60605 records

We attempt to issue this query: (we force a table scan as it much faster
then using the date_key index due to the size of the table)

Set enable_indexscan = false;

SELECT subage, COUNT( DISTINCT (t1.session_key)), COUNT(
DISTINCT(t1.userid_key)) FROM f_pageviews t1
JOIN addloc_segmented_sub t0 ON (t1.userid_key = t0.userid) WHERE
t1.date_key BETWEEN 610 AND 631
GROUP BY 1

Explain:

GroupAggregate (cost=11764067.79..11764067.82 rows=2 width=12)
-> Sort (cost=11764067.79..11764067.80 rows=2 width=12)
Sort Key: t0.subage
-> Hash Join (cost=11762857.88..11764067.78 rows=2 width=12)
Hash Cond: ("outer".userid = "inner".userid_key)
-> Seq Scan on addloc_segmented_sub t0 (cost=0.00..905.92
rows=60792 width=8)
-> Hash (cost=11762857.88..11762857.88 rows=1 width=8)
-> Seq Scan on f_pageviews t1 (cost=0.00..11762857.88
rows=1 width=8)
Filter: ((date_key >= 610) AND (date_key <= 631))

I cannot run an explain analyze, it dies.

Whether I run it from the java based report program or from psql I get the
same out of memory error. This is the java based version:

> ERROR [WAREHOUSE] <20:36:09> org.postgresql.util.PSQLException: ERROR: out of
> memory
>
> at
> org.postgresql.util.PSQLException.parseServerError(PSQLException.java:139)
> at org.postgresql.core.QueryExecutor.executeV3(QueryExecutor.java:154)
> at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:101)
> at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:43)
> at
> org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.jav
> a:515)
> at
> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.jav
> a:50)
> at
> org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statemen
> t.java:231)
> at
> com.TripResearch.warehouse.reports.AddLocUserCreationAgeActivityReport._proces
> s(AddLocUserCreationAgeActivityReport.java:523)
> at
> com.TripResearch.warehouse.reports.AddLocUserCreationAgeActivityReport.initiat
> eDataGathering(AddLocUserCreationAgeActivityReport.java:444)
> at
> com.TripResearch.warehouse.reports.AddLocUserCreationAgeActivityReport.main(Ad
> dLocUserCreationAgeActivityReport.java:140)

Here is the psql error message:

> tripmaster=# set enable_indexscan = false;
> SET
> tripmaster=# SELECT subage, COUNT( DISTINCT (t1.session_key)), COUNT(
> DISTINCT(t1.userid_key)) FROM f_pageviews t1
> tripmaster-# JOIN addloc_segmented_sub t0 ON (t1.userid_key = t0.userid) WHERE
> t1.date_key BETWEEN 610 AND 631
> tripmaster-# GROUP BY 1;
> ERROR: out of memory
> DETAIL: Failed on request of size 60.

The server side error message is below:

TopMemoryContext: 32768 total in 3 blocks; 4472 free (5 chunks); 28296 used
TopTransactionContext: 8192 total in 1 blocks; 8136 free (0 chunks); 56 used
DeferredTriggerXact: 0 total in 0 blocks; 0 free (0 chunks); 0 used
MessageContext: 57344 total in 3 blocks; 22376 free (0 chunks); 34968 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
ExecutorState: 24576 total in 2 blocks; 12480 free (2 chunks); 12096 used
HashTableContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
HashBatchContext: -119689104 total in 512 blocks; 8416 free (18 chunks);
-119697520 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
AggContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
CacheMemoryContext: 516096 total in 6 blocks; 33600 free (3 chunks); 482496
used
idx_addloc_segmented_sub: 1024 total in 1 blocks; 640 free (0 chunks); 384
used
idx_addloc_users_date: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_temp_2930752333: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_description_o_c_o_index: 2048 total in 1 blocks; 768 free (0 chunks);
1280 used
pg_depend_depender_index: 2048 total in 1 blocks; 768 free (0 chunks); 1280
used
pg_depend_reference_index: 2048 total in 1 blocks; 768 free (0 chunks); 1280
used
d_date_pkey: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
idx_date_5: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
idx_date_4: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
idx_date_3: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
idx_date_2: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
idx_date_1: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_index_indrelid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384
used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 320 free (0 chunks);
704 used
pg_amop_opc_strategy_index: 1024 total in 1 blocks; 320 free (0 chunks); 704
used
pg_shadow_usename_index: 1024 total in 1 blocks; 640 free (0 chunks); 384
used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 320 free (0 chunks); 704 used
pg_conversion_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384
used
pg_language_name_index: 1024 total in 1 blocks; 640 free (0 chunks); 384
used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 320 free (0 chunks);
704 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
pg_shadow_usesysid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384
used
pg_cast_source_target_index: 1024 total in 1 blocks; 320 free (0 chunks);
704 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 320 free (0 chunks);
704 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 640 free (0 chunks); 384
used
pg_conversion_default_index: 2048 total in 1 blocks; 704 free (0 chunks);
1344 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 320 free (0 chunks); 704
used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384
used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 320 free (0 chunks);
704 used
pg_language_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 320 free (0 chunks); 704
used
pg_group_sysid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_namespace_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384
used
pg_proc_proname_args_nsp_index: 2048 total in 1 blocks; 704 free (0 chunks);
1344 used
pg_opclass_am_name_nsp_index: 2048 total in 1 blocks; 768 free (0 chunks);
1280 used
pg_group_name_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_proc_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_operator_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_amproc_opc_procnum_index: 1024 total in 1 blocks; 320 free (0 chunks);
704 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384
used
pg_operator_oprname_l_r_n_index: 2048 total in 1 blocks; 704 free (0
chunks); 1344 used
pg_opclass_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 320 free (0 chunks);
704 used
pg_type_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 320 free (0
chunks); 704 used
pg_class_oid_index: 1024 total in 1 blocks; 640 free (0 chunks); 384 used
MdSmgr: 8192 total in 1 blocks; 5760 free (0 chunks); 2432 used
DynaHash: 8192 total in 1 blocks; 6912 free (0 chunks); 1280 used
DynaHashTable: 8192 total in 1 blocks; 2008 free (0 chunks); 6184 used
DynaHashTable: 8192 total in 1 blocks; 5080 free (0 chunks); 3112 used
DynaHashTable: 8192 total in 1 blocks; 2008 free (0 chunks); 6184 used
DynaHashTable: 8192 total in 1 blocks; 1984 free (0 chunks); 6208 used
DynaHashTable: 8192 total in 1 blocks; 3520 free (0 chunks); 4672 used
DynaHashTable: 24576 total in 2 blocks; 13240 free (4 chunks); 11336 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
DynaHashTable: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
2004-09-27 20:46:56 ERROR: out of memory
DETAIL: Failed on request of size 60.

Our configuration file:

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
# name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload".

#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

tcpip_socket = true
max_connections = 100
# note: increasing max_connections costs about 500 bytes of shared
# memory per connection slot, in addition to costs from
shared_buffers
# and max_locks_per_transaction.
#superuser_reserved_connections = 2
#port = 5432
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#virtual_host = '' # what interface to listen on; defaults to
any
#rendezvous_name = '' # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60 # 1-600, in seconds
#ssl = false
#password_encryption = true
#krb_server_keyfile = ''
#db_user_namespace = false

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 10000 # min 16, at least max_connections*2, 8KB
each
sort_mem = 65536 # min 64, size in KB
#vacuum_mem = 8192 # min 1024, size in KB

# - Free Space Map -

#max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000 # min 100, ~50 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000 # min 25
#preload_libraries = ''

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

fsync = true # turns forced synchronization on or off
wal_sync_method = fsync # the default varies across platforms:
# fsync, fdatasync, open_sync, or
open_datasync
wal_buffers = 64 # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 1800 # range 30-3600, in seconds
checkpoint_warning = 180 # 0 is off, in seconds
commit_delay = 50000 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000

#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Enabling -

#enable_hashagg = true
#enable_hashjoin = true
#enable_indexscan = true
#enable_mergejoin = true
#enable_nestloop = true
#enable_seqscan = true
#enable_sort = true
#enable_tidscan = true

# - Planner Cost Constants -

effective_cache_size = 300000 # typically 8KB each
#random_page_cost = 4 # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01 # (same)
#cpu_index_tuple_cost = 0.001 # (same)
#cpu_operator_cost = 0.0025 # (same)

# - Genetic Query Optimizer -

#geqo = true
#geqo_threshold = 11
#geqo_effort = 1
#geqo_generations = 0
#geqo_pool_size = 0 # default based on tables in statement,
# range 128-1024
#geqo_selection_bias = 2.0 # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#from_collapse_limit = 8
#join_collapse_limit = 8 # 1 disables collapsing of explicit JOINs

#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------

# - Syslog -

#syslog = 0 # range 0-2; 0=stdout; 1=both; 2=syslog
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

# - When to Log -

#client_min_messages = notice # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2, debug1,
# log, info, notice, warning, error

#log_min_messages = notice # Values, in order of decreasing detail:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error, log,
fatal,
# panic

#log_error_verbosity = default # terse, default, or verbose messages

#log_min_error_statement = panic # Values in order of increasing severity:
# debug5, debug4, debug3, debug2, debug1,
# info, notice, warning, error,
panic(off)

#log_min_duration_statement = -1 # Log all statements whose
# execution time exceeds the value, in
# milliseconds. Zero prints all queries.
# Minus-one disables.

#silent_mode = false # DO NOT USE without Syslog!

# - What to Log -

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false
#log_connections = false
#log_duration = false
#log_pid = false
#log_statement = false
log_timestamp = true
#log_hostname = false
#log_source_port = false

#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------

# - Statistics Monitoring -

#log_parser_stats = false
#log_planner_stats = false
#log_executor_stats = false
#log_statement_stats = false

# - Query/Index Statistics Collector -

#stats_start_collector = true
stats_command_string = true
#stats_block_level = false
stats_row_level = true
#stats_reset_on_server_start = true

#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '$user,public' # schema names
#check_function_bodies = true
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = false
#statement_timeout = 0 # 0 is disabled, in milliseconds

# - Locale and Formatting -

#datestyle = 'iso, mdy'
#timezone = unknown # actually, defaults to TZ environment
setting
#australian_timezones = false
#extra_float_digits = 0 # min -15, max 2
#client_encoding = sql_ascii # actually, defaults to database encoding

# These settings are initialized by initdb -- they may be changed
lc_messages = 'en_US.UTF-8' # locale for system error message
strings
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8' # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting

# - Other Defaults -

#explain_pretty_print = true
#dynamic_library_path = '$libdir'
#max_expr_depth = 10000 # min 10

#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------

#deadlock_timeout = 1000 # in milliseconds
#max_locks_per_transaction = 64 # min 10, ~260*max_connections bytes each

#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------

# - Previous Postgres Versions -

#add_missing_from = true
#regex_flavor = advanced # advanced, extended, or basic
#sql_inheritance = true

# - Other Platforms & Clients -

#transform_null_equals = false

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Fuhr 2004-09-28 03:35:12 Re: Foreign key order evaluation
Previous Message Matthew Hixson 2004-09-28 01:30:37 renaming sequences