Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6

From: dg(at)illustra(dot)com (David Gould)
To: maillist(at)candle(dot)pha(dot)pa(dot)us (Bruce Momjian)
Cc: dg(at)illustra(dot)com (David Gould)
Subject: Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6
Date: 1998-03-19 03:36:05
Message-ID: 9803190336.AA05972@hawk.illustra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > Right now, there is not much chance of catching a signal while waiting for
> > a spinlock. This is good, cause the code that waits for spinlocks tends to
> > be doing funny things with buffers and locks and shared stuff like that.
> > We don't catch signals because we don't make syscalls. But, once this goes in,
> > we will be calling select() a _lot_ and so it kinda becomes the likely place
> > for a signal to get delivered. Without thinking about it and looking at the
> > code a bit longer, I am not sure it is prudent to rush this in. I still want
> > it in as soon a possible, but possible includes free from harmful side effects.

> Well, signals are handled in the backend by tcop/postgres.c. In
> most/all? cases, a signal causes a longjump() out of the code and
> releases locks and aborts the transaction.

I was afraid that would be the answer. Basically, this never worked. The
problem is that in an indeterminate number of places the code manipulates
multiple distinct but related structures in a sequence. To leave the server
in a consistant state all these updates need to be done in the right order
so that if the sequence in interrupted by a signal the partially updated
state is consistant. Unhappily, the original coders did not always have this
in mind. An example:

cleaning up after a scan
1 - release buffer pointed by scan descriptor
2 - release scan descriptor

If a signal is taken just after one, the abort code will see the scan
descriptor and call the cleanup for it resulting in:

cleaning up after a scan (take 2)
1 - release buffer pointed by scan descriptor
- Whoopsie, buffer already released!
2 - release scan descriptor

These sequences either _all_ have to identified and fixed, or made atomic
somehow, which is a biggish job.

Or the system has to acknowledge signals at only clearly defined points.

My preference is to have signal handlers only set a global flag to indicate
that a signal was seen. Then one just sprinkles check_for_interrupts() calls
in all the likely places: step to next node, fetch next page, call function,
etc.

The way this shows up in real life is strange unreproduceable errors on busy
systems, especially when backends are killed or transactions are forced to
abort.

Fixing this is a bit of a project, but the result is that a lot of mystery
bugs go away.

> I considered the possibility that your select() could return EINTR, so I
> was thinking of suggesting something like
>
> do { } while (lock-no-set && (select(),true) )
>
> or something like that so the return value of select is always true. I
> also recommend making a S_LOCK macro, and inside the while loop, call
> the OS-specific lock stuff, so you don't have to code the select() for
> each platform. So you have two macros, the S_LOCK macro with the while,
> and inside the while, you call S_LOCK_SPIN() which is defined for each
> platform. More centralized and less error-prone.

You may like what I have done then. S_LOCK() becomes a function:

S_LOCK(lock)
{
do {
while (!S_LOCK_FREE(lock)) /* non interlocked test to avoid */
{ /* hammering the bus and caches */
select( a_semi_random_time_delay_with_backoff );
}
} while (TAS(lock)); /* TAS: test and set
}

S_LOCK_FREE(), and TAS() are platform specific macros that invoke the
platform specific implementation. There are default implementations for
all the platform interface macros, so most of the time all that is needed
is to create a tas() function in asm and the rest falls into place.

Now I have to go home and test it.

Btw, feel free to forward this to the list if you feel it is of interest. I
didn't because I didn't want to quote you from private mail.

-dg

David Gould dg(at)illustra(dot)com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
- I realize now that irony has no place in business communications.

Browse pgsql-hackers by date

  From Date Subject
Next Message t-ishii 1998-03-19 04:09:58 multi-byte aware char_length() etc.
Previous Message Thomas G. Lockhart 1998-03-19 03:24:25 Re: [HACKERS] Re: [PATCHES] patches for 6.2.1p6