RE: [HACKERS] Re: Concurrent VACUUM: first results

From: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: RE: [HACKERS] Re: Concurrent VACUUM: first results
Date: 1999-11-29 00:32:56
Message-ID: 001601bf3a01$47d0ae60$2801007e@cadzone.tpf.co.jp
Lists: pgsql-hackers

>
> I have committed the code change to remove pg_vlock locking from VACUUM.
> It turns out the problems I was seeing initially were all due to minor
> bugs in the lock manager and vacuum itself.
>
> > 1. You can run concurrent "VACUUM" this way, but concurrent "VACUUM
> > ANALYZE" blows up. The problem seems to be that "VACUUM ANALYZE"'s
> > first move is to delete all available rows in pg_statistic.
>
> The real problem was that VACUUM ANALYZE tried to delete those rows
> *while it was outside of any transaction*. If there was a concurrent
> VACUUM inserting tuples into pg_statistic, the new VACUUM would end up
> calling XactLockTableWait() with an invalid XID, which caused a failure

Hmm, what I could see here was always LockRelation(.., RowExclusiveLock),
but the cause may be the same.
We can't get the xid of a backend that is not running a *transaction*,
because its proc->xid is set to 0 (InvalidTransactionId). So the blocking
transaction can't find an xidLookupEnt in xidTable corresponding to the
not-running *transaction* when it calls LockResolveConflicts() in
LockReleaseAll(), and it can't GrantLock() to the XidLookupEnt
corresponding to the not-running *transaction*. As a result, LockAcquire()
from a not-running *transaction* always fails once it is blocked.
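
To make the lookup failure described above concrete, here is a minimal toy model in C. It is only a sketch under stated assumptions: HolderTable, holder_enter(), holder_find() and wake_waiter() are hypothetical names, not the structures or functions in lock.c, and the idea that the waiter's entry is filed under one xid while its proc slot advertises 0 is an assumed reading of the paragraph above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical and are
 * not the ones used in PostgreSQL's lock.c. */
typedef unsigned int Xid;
#define INVALID_XID 0u      /* stands in for InvalidTransactionId */
#define MAX_HOLDERS 8

typedef struct {
    int  pid;       /* backend that owns this entry */
    Xid  xid;       /* xid the entry was filed under */
    bool granted;   /* has the lock been granted to this entry? */
} HolderEntry;

typedef struct {
    HolderEntry entries[MAX_HOLDERS];
    size_t      count;
} HolderTable;

/* File a waiting request under the key (pid, xid). */
static HolderEntry *holder_enter(HolderTable *t, int pid, Xid xid)
{
    assert(t->count < MAX_HOLDERS);
    HolderEntry *e = &t->entries[t->count++];
    e->pid = pid;
    e->xid = xid;
    e->granted = false;
    return e;
}

/* Look an entry up by exactly the same (pid, xid) key. */
static HolderEntry *holder_find(HolderTable *t, int pid, Xid xid)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].pid == pid && t->entries[i].xid == xid)
            return &t->entries[i];
    return NULL;
}

/* Releaser side: try to grant to a waiter, using the xid advertised in
 * the waiter's proc slot. Returns true only if the waiter's entry was
 * found and marked granted. */
static bool wake_waiter(HolderTable *t, int waiter_pid, Xid advertised_xid)
{
    HolderEntry *e = holder_find(t, waiter_pid, advertised_xid);
    if (e == NULL)
        return false;   /* no entry under that key: nothing to grant */
    e->granted = true;
    return true;
}
```

In this model, a waiter whose advertised xid matches the key of its entry is found and granted, while a waiter that advertises INVALID_XID is never found, so its request stays ungranted however often the releaser tries.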

> I have fixed the simpler aspects of the problem by adding missing
> SpinRelease() calls to lock.c, making lmgr.c test for failure, and
> altering VACUUM to not do the bogus row deletion. But I suspect that
> there is more to this that I don't understand. Why does calling
> XactLockTableWait() with an already-committed XID cause the following

It seems strange. Isn't it waiting for a tuple being deleted by
vc_updstats() in vc_vacone()?

> code in lock.c to trigger? Is this evidence of a logic bug in lock.c,
> or at least of inadequate checks for bogus input?
>
>     /*
>      * Check the xid entry status, in case something in the ipc
>      * communication doesn't work correctly.
>      */
>     if (!((result->nHolding > 0) && (result->holders[lockmode] > 0)))
>     {
>         XID_PRINT_AUX("LockAcquire: INCONSISTENT ", result);
>         LOCK_PRINT_AUX("LockAcquire: INCONSISTENT ", lock, lockmode);
>         /* Should we retry ? */
>         SpinRelease(masterLock);    <<<<<<<<<<<< just added by me
>         return FALSE;
>     }
>

This is the third time I have ended up at this check, and each time it
was caused by other bugs.
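
The SpinRelease() added in the quoted code follows a general rule: every exit path taken while the master lock is held must release it first. A minimal sketch of that rule, assuming a toy flag-based lock rather than the real spinlock code (ToySpinLock, ToyHolder and toy_lock_acquire() are hypothetical names used only for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; this is not PostgreSQL's spinlock API. */
typedef struct { bool held; } ToySpinLock;

static void toy_spin_acquire(ToySpinLock *l)
{
    assert(!l->held);   /* acquiring a leaked lock would hang for real */
    l->held = true;
}

static void toy_spin_release(ToySpinLock *l)
{
    assert(l->held);
    l->held = false;
}

/* Loosely mirrors the two counters tested in the quoted check. */
typedef struct {
    int n_holding;          /* total locks held through this entry */
    int holders_in_mode;    /* locks held in the requested mode */
} ToyHolder;

/* Returns true if the entry is consistent. The master lock is released
 * on every exit path, including the early error return. */
static bool toy_lock_acquire(ToySpinLock *master, const ToyHolder *result)
{
    toy_spin_acquire(master);

    if (!(result->n_holding > 0 && result->holders_in_mode > 0)) {
        /* Inconsistent entry: report failure, but release first. */
        toy_spin_release(master);
        return false;
    }

    toy_spin_release(master);
    return true;
}
```

Without the release on the error path, the second call in this model would trip the assertion in toy_spin_acquire(); with a real spinlock the next caller would simply spin.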

Regards,

Hiroshi Inoue
Inoue(at)tpf(dot)co(dot)jp
