Re: Warning: Don't delete those /tmp/.PGSQL.* files

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joel Burton" <jburton(at)scw(dot)org>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Warning: Don't delete those /tmp/.PGSQL.* files
Date: 2000-11-25 22:35:13
Message-ID: 26974.975191713@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

"Joel Burton" <jburton(at)scw(dot)org> writes:
> Working on my database, I had a view that would lock up the
> machine (eats all available memory, soon goes belly-up.) Turned out
> to be a recursive view: view A asked a question of view B that
> asked view A. [is it possible for pgsql to detect this?]

It should have been detected --- there is a check in the rewriter that's
supposed to error out after ten recursive rewrite calls. Maybe that
logic is broken, or misses certain cases. Could you exhibit the views
that caused this behavior for you?
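
For illustration only, a depth-limited expansion of the kind described
above can be sketched as follows. This is not the rewriter's actual code:
the names expand, views, and RewriteRecursionError are hypothetical, and
the only detail taken from the message is the limit of ten nested calls.

```python
# Hypothetical sketch: expand view references with a recursion limit.
MAX_REWRITE_DEPTH = 10  # the "ten recursive rewrite calls" mentioned above


class RewriteRecursionError(Exception):
    """Raised when view expansion nests deeper than the limit."""


def expand(relation, views, depth=0):
    """Return the base tables that `relation` ultimately reads from.

    `views` maps a view name to the relations its definition references;
    any name not in `views` is treated as a base table.
    """
    if depth > MAX_REWRITE_DEPTH:
        raise RewriteRecursionError(
            f"view expansion exceeded {MAX_REWRITE_DEPTH} levels at {relation!r}"
        )
    if relation not in views:
        return [relation]
    tables = []
    for ref in views[relation]:
        tables.extend(expand(ref, views, depth + 1))
    return tables


# Non-recursive case: view_c reads two base tables.
print(expand("view_c", {"view_c": ["t1", "t2"]}))

# Recursive case: view_a and view_b reference each other.
try:
    expand("view_a", {"view_a": ["view_b"], "view_b": ["view_a"]})
except RewriteRecursionError as exc:
    print("error:", exc)
```

With a check like this, two views that reference each other produce an
error instead of consuming memory until the machine falls over.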

> So, I began restarting pgsql w/a line like

> rm -f /tmp/.PGSQL.* && postmaster -i >log 2>log &

> Which works great. Except that I *kept* using this for two weeks
> after the view problem (damn that bash up-arrow laziness!), and
> yesterday, used it to restart PostgreSQL except (oops!) it was
> already running.

> Results: no database at all. All classes (tables/views/etc) returned
> 0 records (meaning that no tables showed up in psql's \d, since
> pg_class returned nothing.)

Ugh. The reason that removing the socket file allowed a second
postmaster to start up is that we use an advisory lock on the socket
file as the interlock that prevents two postmasters on the same port number.
Remove the socket file, poof no interlock.
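
The failure mode can be reproduced in miniature. The sketch below makes
some assumptions: it uses flock() on an ordinary temporary file as a
stand-in for the socket file, which may differ from the locking call the
server really makes, and try_lock and the file name are hypothetical. It
shows only the general point that an advisory lock belongs to the file,
not to its name, so a recreated file carries no lock.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open (creating if needed) `path` and take a non-blocking exclusive
    advisory lock. Return the open file on success, None if already locked."""
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


workdir = tempfile.mkdtemp()
# Hypothetical stand-in for the socket file under /tmp.
path = os.path.join(workdir, "interlock")

first = try_lock(path)    # the "first postmaster" takes the lock
second = try_lock(path)   # a second attempt on the same file is refused
print("second start while file exists:", "allowed" if second else "refused")

os.remove(path)           # the equivalent of rm -f /tmp/.PGSQL.*
third = try_lock(path)    # a new file is created; nothing holds a lock on it
print("second start after removal:", "allowed" if third else "refused")
```

The first holder still has its lock, but on a file that no longer has a
name, so nobody else can ever collide with it.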

*However*, there is a second line of defense to prevent two postmasters
in the same directory, and I don't understand why that didn't trigger.
Unless you are running a version old enough to not have it. What PG
version is this, anyway?
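
The message does not say how the per-directory check works. One common
way to build such an interlock, shown here purely as an assumption, is to
create a lock file inside the data directory with exclusive-create
semantics. The function claim_directory and the file name server.pid are
hypothetical, and handling of stale lock files left by a crashed server
is deliberately left out.

```python
import os
import tempfile


def claim_directory(data_dir, lock_name="server.pid"):
    """Try to claim `data_dir` by creating a lock file exclusively.

    Returns True if this process now owns the directory, False if a lock
    file already exists. `server.pid` is a hypothetical name.
    """
    lock_path = os.path.join(data_dir, lock_name)
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")
    return True


data_dir = tempfile.mkdtemp()
print("first start:", claim_directory(data_dir))
print("second start:", claim_directory(data_dir))
```

Because the lock file lives in the data directory rather than in /tmp, a
cleanup command aimed at /tmp would not remove it.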

Assuming you got past both interlocks, the second postmaster would have
reinitialized Postgres' shared memory block for that database, which
would have been a Bad Thing(tm) ... but it would not have led to any
immediate damage to your on-disk files, AFAICS. Was the database still
hosed after you stopped both postmasters and started a fresh one? (Did
you even try that?)

This story does indicate that we need a less fragile interlock against
starting two postmasters on one database. I have to admit that it
hadn't occurred to me that you could break the port-number interlock
so easily as that :-(. But obviously you can, so we need a different
way of representing the interlock. Hackers, any thoughts?

Note: I've narrowed followups to just pgsql-hackers, since that seems
like the right forum for discussing a better interlock mechanism.

regards, tom lane
