Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

From: dg(at)illustra(dot)com (David Gould)
To: dz(at)cs(dot)unitn(dot)it (Massimo Dal Zotto)
Cc: maillist(at)candle(dot)pha(dot)pa(dot)us, hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6
Date: 1998-03-20 19:15:47
Message-ID: 9803201915.AA09552@hawk.illustra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Massimo Dal Zotto writes:
> > David, go for it. The code is all local in two files, and I think you
> > can basically change all the do{test-and-set} while(lock-is-false)
> > loops to:
> >
> > do{test-and-set} while(lock-is-false && select ())
> >
> > Pretty easy. No need to test multiple platforms. The ones where the
> > loop is integrated into the asm(), leave them for later.
> >
> > --
> > Bruce Momjian | 830 Blythe Avenue
>
> I'am against a generic patch using select(). If we have sched_yield() on an
> architecture I don't see why dont't use it. Here is the patch for Linux.
> It has been tested for two months by 100 users without any problem.
> The only thing I would add is a more general configuration test in configure
> to include the proper include files.
>
> *** src/include/storage/s_lock.h.orig Sat Oct 18 22:39:21 1997
> --- src/include/storage/s_lock.h Wed Nov 19 23:11:14 1997
> ***************
> *** 294,300 ****
> --- 294,314 ----
> */
>
> #if defined(NEED_I386_TAS_ASM)
> + #include <unistd.h>
> + #include <sched.h>
>
> + #ifdef _POSIX_PRIORITY_SCHEDULING
> + #define S_LOCK(lock) do \
> + { \
> + slock_t _res; \
> + do \
> + { \
> + __asm__("xchgb %0,%1": "=q"(_res), \
> + "=m"(*lock):"0"(0x1)); \
> + if (_res) sched_yield(); \
> + } while (_res != 0); \
> + } while (0)
> + #else
> #define S_LOCK(lock) do \
> { \
> slock_t _res; \
> ***************
> *** 303,308 ****
> --- 317,323 ----
> __asm__("xchgb %0,%1": "=q"(_res), "=m"(*lock):"0"(0x1)); \
> } while (_res != 0); \
> } while (0)
> + #endif
>
> #define S_UNLOCK(lock) (*(lock) = 0)
>
> +----------------------------------------------------------------------+
> | Massimo Dal Zotto e-mail: dz(at)cs(dot)unitn(dot)it |
> | Via Marconi, 141 phone: ++39-461-534251 |
> | 38057 Pergine Valsugana (TN) www: http://www.cs.unitn.it/~dz/ |
> | Italy pgp: finger dz(at)tango(dot)cs(dot)unitn(dot)it |
> +----------------------------------------------------------------------+

I am perfectly happy to use sched_yield() on Linux. My goal however is to
make the concept work on all platforms.

There was a recent thread on comp.os.linux.system on context switch times
of Linux vs NTthat revealed that sched_yield() is fairly costly as it causes
the scheduler to do a full scan of the process table and recalculate the
the priorities of all processes. Probably not a problem, but it should
be it should probably be benchmarked both ways.

Finally, even though this appears to work, there is a possible stability
problem with both approaches. Here is some of the discussion I had about
that with Bruce:

-----------------------------------------------------------------------
David:
> Right now, there is not much chance of catching a signal while waiting for
> a spinlock. This is good, cause the code that waits for spinlocks tends to
> be doing funny things with buffers and locks and shared stuff like that.
> We don't catch signals because we don't make syscalls. But, once this goes in,
> we will be calling select() a _lot_ and so it kinda becomes the likely place
> for a signal to get delivered. Without thinking about it and looking at the
> code a bit longer, I am not sure it is prudent to rush this in. I still want
> it in as soon a possible, but possible includes free from harmful side effects.
Bruce:
> Well, signals are handled in the backend by tcop/postgres.c. In
> most/all? cases, a signal causes a longjump() out of the code and
> releases locks and aborts the transaction.

David:
> I was afraid that would be the answer. Basically, this never worked. The
> problem is that in an indeterminate number of places the code manipulates
> multiple distinct but related structures in a sequence. To leave the server
> in a consistant state all these updates need to be done in the right order
> so that if the sequence in interrupted by a signal the partially updated
> state is consistant. Unhappily, the original coders did not always have this
> in mind. An example:
>
> cleaning up after a scan
> 1 - release buffer pointed by scan descriptor
> 2 - release scan descriptor
>
> If a signal is taken just after one, the abort code will see the scan
> descriptor and call the cleanup for it resulting in:
>
> cleaning up after a scan (take 2)
> 1 - release buffer pointed by scan descriptor
> - Whoopsie, buffer already released!
> 2 - release scan descriptor
>
> These sequences either _all_ have to identified and fixed, or made atomic
> somehow, which is a biggish job.
>
> Or the system has to acknowledge signals at only clearly defined points.
> My preference is to have signal handlers only set a global flag to indicate
> that a signal was seen. Then one just sprinkles tests of the flag in
> all the likely places: step to next node, fetch next page, call function, etc.
>
> The way this shows up in real life is strange unreproduceable errors on busy
> systems, especially when backends are killed or transactions are forced to
> abort.
>
> Fixing this is a bit of a project, but the result is that a lot of mystery
> bugs go away.

--------------------------------------------------------------------------

-dg

David Gould dg(at)illustra(dot)com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- I realize now that irony has no place in business communications.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1998-03-20 22:14:29 Re: Added Having Clause
Previous Message Maurice Gittens 1998-03-20 18:30:55 Re: [HACKERS] Re: Buffer overruns with the Electric Fence debugging library (Solved?)