Problems backing up

From: Sebastien Boisvert <sebastienboisvert(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Problems backing up
Date: 2010-01-31 17:44:49
Message-ID: 188119.46868.qm@web34301.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi all,

We have an OS X app which integrates postgres as its database backend, and recently we've have a couple of cases where users haven't been able to perform a backup of their database. The failure gets reported as a problem in a table ("largedata") where we store large binary objects, with a bytea column. Normally when this happens, it's due to database corruption, and we delete/fix the affected data to repair the database. However, in these recent cases, it appears that database corruption is not the issue.

When we run into these backup problems we use a tool we've created that fetches all data from each row for that table to verify them; in the cases of corruption there's always 1 or more rows that can't be fetched, but in these cases all data can be fetch, so it doesn't appear to be a corruption issue.

The backup is done by usin pg_dump, and the logs shows the failure details below; prior to the failure I've seen real/virtual mem balloon past 1GB

While one of these problem database itself is very large (about 30gigs), the data in each row isn't too large (biggest object was 138871354 bytes, and our application can read it from the database just fine).

I've attached the details of the error(s) and pg/table configurations below. I'm hoping someone can point us in the right direction as to what to look for, as I'm not familiar enough with the intricacies and low-level implementation details of postgres to know what to look for or what additional information to look at to determine the cause.

Error during backup:
(31971) malloc: *** mmap(size=1073745920) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
TopMemoryContext: 129928 total in 15 blocks; 9600 free (19 chunks); 120328 used
Type information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
TopTransactionContext: 8192 total in 1 blocks; 7776 free (0 chunks); 416 used
MessageContext: 8192 total in 1 blocks; 5008 free (1 chunks); 3184 used
smgr relation table: 24576 total in 2 blocks; 11952 free (4 chunks); 12624 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 536951860 total in 6 blocks; 11016 free (12 chunks); 536940844 used
COPY TO: 564491844 total in 3 blocks; 8120 free (5 chunks); 564483724 used
Relcache by OID: 24576 total in 2 blocks; 15584 free (3 chunks); 8992 used
CacheMemoryContext: 4337488 total in 23 blocks; 1934512 free (3 chunks); 2402976 used
pg_toast_30058_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_toast_29648_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_description_o_c_o_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_shdescription_o_c_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_database_datname_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_largeobject_loid_pn_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_trigger_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_trigger_tgconstrname_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_constraint_contypid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_constraint_conrelid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_constraint_conname_nsp_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_attrdef_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_cast_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_rewrite_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_depend_reference_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_tablespace_spcname_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_tablespace_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_roles: 7168 total in 3 blocks; 2824 free (0 chunks); 4344 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_language_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_conversion_default_index: 1024 total in 1 blocks; 128 free (0 chunks); 896 used
pg_language_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_database_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_class_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_amproc_fam_proc_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_authid_rolname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_constraint_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_conversion_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
pg_ts_config_map_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_namespace_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_operator_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_amop_opr_fam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
pg_proc_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
MdSmgr: 8192 total in 1 blocks; 1096 free (0 chunks); 7096 used
LOCALLOCK hash: 57344 total in 3 blocks; 33496 free (8 chunks); 23848 used
Timezones: 48616 total in 2 blocks; 5968 free (0 chunks); 42648 used
ErrorContext: 8192 total in 1 blocks; 8176 free (4 chunks); 16 used
2010-01-26 22:26:23.775 EST [PoetProductionsMain] - ERROR: out of memory
2010-01-26 22:26:23.781 EST [PoetProductionsMain] - DETAIL: Failed on request of size 1073741823.
2010-01-26 22:26:23.781 EST [PoetProductionsMain] - STATEMENT: COPY public.largedata (_timestamp, createdate, _rowid, rawdata, plaintext, isplaintext, storesplaintextrepresentation, deletionuserid, deletiondate, deletionidentifier) TO stdout;

PG Version: 8.3.5 (with a custom patch to deal with OS X's default shared mem value)

Platform: Mac OS X (10.5 & 10.6)

ulimit reports 'unlimited'

Config file has these values - tried different (bigger) values to no avail:
max_connections = 20# (change requires restart)
unix_socket_directory = '/tmp/DL-501'# (change requires restart)
unix_socket_permissions = 0700# begin with 0 to use octal notation
shared_buffers = 2400kB# min 128kB or max_connections*16kB
max_fsm_pages = 20000# min max_fsm_relations*16, 6 bytes each
max_files_per_process = 100# min 25
checkpoint_segments = 10# in logfile segments, min 1, 16MB each
checkpoint_timeout = 10min# range 30s-1h

__________________________________________________________________
Yahoo! Canada Toolbar: Search from anywhere on the web, and bookmark your favourite sites. Download it now
http://ca.toolbar.yahoo.com.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Steve Atkins 2010-01-31 17:56:13 Re: Best practice for file storage?
Previous Message Andy Colson 2010-01-31 15:02:06 Re: Best practice for file storage?