Re: Postgresql 8.4.1 segfault, backtrace

From: Richard Neill <rn214(at)hermes(dot)cam(dot)ac(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Brown <mbrown(at)fensystems(dot)co(dot)uk>, pgsql-bugs(at)postgreSQL(dot)org
Subject: Re: Postgresql 8.4.1 segfault, backtrace
Date: 2009-10-15 18:35:31
Message-ID: 4AD76B73.7020007@hermes.cam.ac.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Dear Tom,

Thanks for this, and sorry for not replying earlier. We finally obtained
a window to deploy this patch on the real (rather busy!) production
system as of last Saturday evening.

The good news is that the patch has now been in place for 5 days, and,
despite some very high loading, it has survived without a single crash.

I'd venture to say that this issue is now fixed.

Best wishes,

Richard

Tom Lane wrote:
> I wrote:
>> I'll get you a real fix as soon as I can, but might not be till
>> tomorrow.
>
> The attached patch (against 8.4.x) fixes the problem as far as I can
> tell. Please test.
>
> regards, tom lane
>
>
>
> ------------------------------------------------------------------------
>
> Index: src/backend/utils/cache/relcache.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v
> retrieving revision 1.287
> diff -c -r1.287 relcache.c
> *** src/backend/utils/cache/relcache.c 11 Jun 2009 14:49:05 -0000 1.287
> --- src/backend/utils/cache/relcache.c 25 Sep 2009 17:32:02 -0000
> ***************
> *** 1386,1392 ****
> *
> * The data we insert here is pretty incomplete/bogus, but it'll serve to
> * get us launched. RelationCacheInitializePhase2() will read the real
> ! * data from pg_class and replace what we've done here.
> */
> relation->rd_rel = (Form_pg_class) palloc0(CLASS_TUPLE_SIZE);
>
> --- 1386,1394 ----
> *
> * The data we insert here is pretty incomplete/bogus, but it'll serve to
> * get us launched. RelationCacheInitializePhase2() will read the real
> ! * data from pg_class and replace what we've done here. Note in particular
> ! * that relowner is left as zero; this cues RelationCacheInitializePhase2
> ! * that the real data isn't there yet.
> */
> relation->rd_rel = (Form_pg_class) palloc0(CLASS_TUPLE_SIZE);
>
> ***************
> *** 2603,2619 ****
> * rows and replace the fake entries with them. Also, if any of the
> * relcache entries have rules or triggers, load that info the hard way
> * since it isn't recorded in the cache file.
> */
> hash_seq_init(&status, RelationIdCache);
>
> while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
> {
> Relation relation = idhentry->reldesc;
>
> /*
> * If it's a faked-up entry, read the real pg_class tuple.
> */
> ! if (needNewCacheFile && relation->rd_isnailed)
> {
> HeapTuple htup;
> Form_pg_class relp;
> --- 2605,2635 ----
> * rows and replace the fake entries with them. Also, if any of the
> * relcache entries have rules or triggers, load that info the hard way
> * since it isn't recorded in the cache file.
> + *
> + * Whenever we access the catalogs to read data, there is a possibility
> + * of a shared-inval cache flush causing relcache entries to be removed.
> + * Since hash_seq_search only guarantees to still work after the *current*
> + * entry is removed, it's unsafe to continue the hashtable scan afterward.
> + * We handle this by restarting the scan from scratch after each access.
> + * This is theoretically O(N^2), but the number of entries that actually
> + * need to be fixed is small enough that it doesn't matter.
> */
> hash_seq_init(&status, RelationIdCache);
>
> while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL)
> {
> Relation relation = idhentry->reldesc;
> + bool restart = false;
> +
> + /*
> + * Make sure *this* entry doesn't get flushed while we work with it.
> + */
> + RelationIncrementReferenceCount(relation);
>
> /*
> * If it's a faked-up entry, read the real pg_class tuple.
> */
> ! if (relation->rd_rel->relowner == InvalidOid)
> {
> HeapTuple htup;
> Form_pg_class relp;
> ***************
> *** 2630,2636 ****
> * Copy tuple to relation->rd_rel. (See notes in
> * AllocateRelationDesc())
> */
> - Assert(relation->rd_rel != NULL);
> memcpy((char *) relation->rd_rel, (char *) relp, CLASS_TUPLE_SIZE);
>
> /* Update rd_options while we have the tuple */
> --- 2646,2651 ----
> ***************
> *** 2639,2660 ****
> RelationParseRelOptions(relation, htup);
>
> /*
> ! * Also update the derived fields in rd_att.
> */
> ! relation->rd_att->tdtypeid = relp->reltype;
> ! relation->rd_att->tdtypmod = -1; /* unnecessary, but... */
> ! relation->rd_att->tdhasoid = relp->relhasoids;
>
> ReleaseSysCache(htup);
> }
>
> /*
> * Fix data that isn't saved in relcache cache file.
> */
> if (relation->rd_rel->relhasrules && relation->rd_rules == NULL)
> RelationBuildRuleLock(relation);
> if (relation->rd_rel->relhastriggers && relation->trigdesc == NULL)
> RelationBuildTriggers(relation);
> }
>
> /*
> --- 2654,2710 ----
> RelationParseRelOptions(relation, htup);
>
> /*
> ! * Check the values in rd_att were set up correctly. (We cannot
> ! * just copy them over now: formrdesc must have set up the
> ! * rd_att data correctly to start with, because it may already
> ! * have been copied into one or more catcache entries.)
> */
> ! Assert(relation->rd_att->tdtypeid == relp->reltype);
> ! Assert(relation->rd_att->tdtypmod == -1);
> ! Assert(relation->rd_att->tdhasoid == relp->relhasoids);
>
> ReleaseSysCache(htup);
> +
> + /* relowner had better be OK now, else we'll loop forever */
> + if (relation->rd_rel->relowner == InvalidOid)
> + elog(ERROR, "invalid relowner in pg_class entry for \"%s\"",
> + RelationGetRelationName(relation));
> +
> + restart = true;
> }
>
> /*
> * Fix data that isn't saved in relcache cache file.
> + *
> + * relhasrules or relhastriggers could possibly be wrong or out of
> + * date. If we don't actually find any rules or triggers, clear the
> + * local copy of the flag so that we don't get into an infinite loop
> + * here. We don't make any attempt to fix the pg_class entry, though.
> */
> if (relation->rd_rel->relhasrules && relation->rd_rules == NULL)
> + {
> RelationBuildRuleLock(relation);
> + if (relation->rd_rules == NULL)
> + relation->rd_rel->relhasrules = false;
> + restart = true;
> + }
> if (relation->rd_rel->relhastriggers && relation->trigdesc == NULL)
> + {
> RelationBuildTriggers(relation);
> + if (relation->trigdesc == NULL)
> + relation->rd_rel->relhastriggers = false;
> + restart = true;
> + }
> +
> + /* Release hold on the relation */
> + RelationDecrementReferenceCount(relation);
> +
> + /* Now, restart the hashtable scan if needed */
> + if (restart)
> + {
> + hash_seq_term(&status);
> + hash_seq_init(&status, RelationIdCache);
> + }
> }
>
> /*

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2009-10-15 18:37:50 Re: BUG #5118: start-status-insert-fatal
Previous Message Tom Lane 2009-10-15 18:25:18 Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT