Re: [GENERAL] Release LRU file

From: Mike Mascari <mascarm(at)mascari(dot)com>
To: kimi(at)intercept(dot)co(dot)in
Cc: pgsql-general(at)postgreSQL(dot)org, scrappy(at)hub(dot)org
Subject: Re: [GENERAL] Release LRU file
Date: 1999-12-21 16:16:32
Message-ID: 385FA7E0.C9C26701@mascari.com
Lists: pgsql-general

Kimi wrote:
>
> Hi,
>
> This is in continuation of the mails I sent last week about Postgres
> crashing. We are running pg 6.5.1 on Red Hat 5.1 with DBI 0.92 and
> DBD 1.13, on a 512 MB RAM, SCSI machine.
>
> Our application consists of requests going up to 150 per second on
> this database, with an expected uptime of 24 by 7. Earlier we were
> getting spinlock messages, which we hoped to sort out by raising the
> number of open files per process to 1024 from the earlier 256.
>
> Postgres crashes giving an error message: FATAL 1: Release LRU file :
> No opened files / no one can be closed.
>
> Can anybody help on how to solve this?
>
> Please help
>
> Bye,
>
> Murali
> Differentiated Software Solutions

We have been running a production server under a somewhat
lighter load, and encountered this once. The following
conversation took place on the mailing list about a month
ago:

http://www.PostgreSQL.ORG/mhonarc/pgsql-hackers/1999-11/msg00454.html
------------------------------------------------------------
Mike Mascari <mascarim(at)yahoo(dot)com> writes:
> FATAL 1: ReleaseLruFile: No opened files - no one can be closed

> This is the first time this has ever happened.

I've never seen that either. Offhand I do not recall any post-6.5
changes that would affect it, so the problem (whatever it is) is
probably still there.

After eyeballing the code, it seems there are only two ways this
could happen:

1. the number of "allocated" (non-virtual) file descriptors grew to
exceed the number of files Postgres thinks it can have open;

2. something else was temporarily exhausting your kernel's file table
space, so that ENFILE was returned for many successive attempts to
open a file. (After each one, fd.c will close another file and try
again.)

#2 seems improbable on an unloaded system, and isn't real probable even
on a loaded one, since you'd have to assume that some other process
managed to suck up each filetable slot that fd.c released before fd.c
could re-acquire it. Once, yes, but several dozen times in a row?
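
[Editor's illustration, not part of the original message.] The retry behaviour described for case 2 can be sketched as a small simulation. Every name here (sim_open, release_lru_file, open_with_retry, and the three globals) is hypothetical and is not taken from fd.c; the sketch only assumes that an open failing with ENFILE leads to closing one least-recently-used file and trying again, and that the fatal error appears once nothing is left to close.

```c
#include <errno.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not PostgreSQL's fd.c. */
int  open_vfds = 0;                 /* files currently holding a kernel descriptor */
int  kernel_free_slots = 0;         /* simulated free slots in the kernel file table */
bool other_process_steals = false;  /* does a competitor grab every slot we free? */

/* Simulated open(): fails with ENFILE when the kernel table is full. */
int sim_open(void)
{
    if (kernel_free_slots <= 0) {
        errno = ENFILE;
        return -1;
    }
    kernel_free_slots--;
    return 0;                       /* dummy descriptor */
}

/* Close the least-recently-used file; false means nothing is left to close. */
bool release_lru_file(void)
{
    if (open_vfds == 0)
        return false;
    open_vfds--;
    if (!other_process_steals)
        kernel_free_slots++;        /* the slot we freed is still ours to reuse */
    return true;
}

/*
 * Returns 0 on success, -1 on an unexpected error, and -2 when no file
 * is left to close (the "no one can be closed" case).
 */
int open_with_retry(void)
{
    for (;;) {
        if (sim_open() >= 0)
            return 0;
        if (errno != ENFILE && errno != EMFILE)
            return -1;
        if (!release_lru_file())
            return -2;
    }
}
```

With three files open and other_process_steals false, a single release is enough for the retry to succeed. With it set to true, every release is wasted and the loop ends in the -2 case, which is the scenario argued above to be improbable.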

So I'm guessing a leak of allocated file descriptors.

After grovelling through the calls to AllocateFile, I only see one
prospect for a leak: it looks to me like verify_password() neglects
to close the password file if an invalid user name is given. Do you
use a plain (non-encrypted) password file? If so, I'll bet you can
reproduce the crash by trying repeatedly to connect with a username
that's not in the password file. If that pans out, it's a simple fix:
add "FreeFile(pw_file);" near the bottom of verify_password() in
src/backend/libpq/password.c. Let me know if this guess is right...

regards, tom lane
------------------------------------------------------------
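
For anyone who wants to see the shape of the suspected leak, here is a minimal sketch in C. It is not the code from password.c: the user table, the allocated_files counter, and both lookup functions are hypothetical, and the counter merely stands in for whatever bookkeeping AllocateFile() and FreeFile() do. It only assumes what the quoted message suggests, namely that the "user not found" path returns without releasing the file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical password-file contents; not real data. */
static const char *pw_entries[] = { "alice", "bob" };
#define PW_COUNT (sizeof(pw_entries) / sizeof(pw_entries[0]))

/* Hypothetical counter standing in for the allocated-file bookkeeping. */
int allocated_files = 0;

/* Leaky variant: the file is released only when the user is found. */
int lookup_leaky(const char *user)
{
    allocated_files++;                  /* stands in for AllocateFile() */
    for (size_t i = 0; i < PW_COUNT; i++) {
        if (strcmp(pw_entries[i], user) == 0) {
            allocated_files--;          /* stands in for FreeFile() */
            return 1;
        }
    }
    return 0;                           /* unknown user: never released */
}

/* Fixed variant: the file is released on every path before returning. */
int lookup_fixed(const char *user)
{
    int found = 0;

    allocated_files++;                  /* stands in for AllocateFile() */
    for (size_t i = 0; i < PW_COUNT; i++) {
        if (strcmp(pw_entries[i], user) == 0) {
            found = 1;
            break;
        }
    }
    allocated_files--;                  /* stands in for FreeFile() */
    return found;
}
```

Calling lookup_leaky() repeatedly with a name that is not in the table raises allocated_files by one each time, while lookup_fixed() always leaves it where it started.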

Hope that helps,

Mike Mascari
