BUG #6200: standby bad memory allocations on SELECT

From: "Daniel Farina" <daniel(at)heroku(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6200: standby bad memory allocations on SELECT
Date: 2011-09-08 22:33:37
Message-ID: 201109082233.p88MXbGE026996@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 6200
Logged by: Daniel Farina
Email address: daniel(at)heroku(dot)com
PostgreSQL version: 9.0.4
Operating system: Ubuntu 10.04
Description: standby bad memory allocations on SELECT
Details:

A huge thanks to Conrad Irwin of Rapportive for furnishing virtually all the
details of this bug report.

The following error occurs up to a couple of times a day on a busy
PostgreSQL database:

ERROR: invalid memory alloc request size 18446744073709551613

The occurrence rate is somewhere in the one per tens-of-millions of
queries.

The number is always the same (2**64 - 3), and there's no obvious
pattern in the distribution of errors (they don't even appear to be
correlated with system load). The error has not been recorded on the
primary database, even though the same workload is submitted.

These errors do not reproduce, seeming to evaporate almost immediately on
the standby, so durable/long lasting index corruption is not likely. This
problem has persisted among multiple generations of hot standbys on
different hardware and sourced from different base backups.

At least once, a hot standby was promoted to a primary and the errors seem
to discontinue, but then reappear on a newly-provisioned standby.

The VERSION() string is:
PostgreSQL 9.0.4 on x86_64-pc-linux-gnu, compiled by GCC
gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit

The problem is confined to a particular access patterns and schema objects,
enumerated below:

The points2 table looks like:

Table "public.points2"
Column | Type |
Modifiers
------------------+-----------------------------+---------------------------
---------------------------
id | integer | not null default nextval('points2_id_seq'::regclass)
identifier | text | not null
scope_id | integer | not null
class_number | smallint | not null
authorization_id | integer | not null
sum_json | text | not null
amended_at | timestamp without time zone | not null
Indexes:
"points2_pkey" PRIMARY KEY, btree (id)
"points2_special" btree (identifier_hash(identifier), scope_id,
class_number, authorization_id)

CREATE FUNCTION identifier_hash(text) RETURNS bigint IMMUTABLE
LANGUAGE SQL AS $$
SELECT ('x' || md5($1))::bit(64)::bigint;
$$;

This has only been seen on queries of the form:

SELECT * FROM "points2" WHERE
(identifier_hash(identifier) = identifier_hash('1104131405')
AND identifier = '1104131405'
AND scope_id = 0
AND authorization_id = 0
AND class_number = 25)

Though this table is accessed similarly frequently by queries of the form:

SELECT points2.* FROM points2
JOIN (VALUES (8585261297509044776, 0, 47,
'ae9064e6f24127c6a1f483cd71e14e64'))
AS query(hashed_identifier, scope_id, class_number, identifier)
ON identifier_hash(points2.identifier) = query.hashed_identifier
AND points2.scope_id = query.scope_id
AND points2.class_number = query.class_number
AND points2.identifier = query.identifier;

these do not trigger the problem.

The table is always updated to or inserted into one row at a time
(using the "id" primary key for updates), though we sometimes update
multiple rows in a single transaction, synchronous_commit is turned off for
connections that touch the points2 table on the primary.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2011-09-09 15:02:44 Re: BUG #6200: standby bad memory allocations on SELECT
Previous Message Peter Eisentraut 2011-09-08 10:09:22 Re: psql doesn't reuse -p after backend fail