7.0RC1: possible query and backend problem

From: Michael Blakeley <mike(at)blakeley(dot)com>
To: pgsql-general(at)postgreSQL(dot)org
Subject: 7.0RC1: possible query and backend problem
Date: 2000-04-20 00:16:13
Message-ID: p04310103b523f49d6641@blakeley.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I've just upgraded to 7.0RC1 on a Solaris 2.6+patches system (U5 with
256MB). I'm having some trouble with a brand-new query, so I don't
know if this is a new bug or an old one. It has two parts: query
execution and backend robustness.

First, I'm getting a crash with this query (via Apache mod_perl and
DBI, or via psql):

Table U: id varchar(128) primary key, stop datetime, start datetime
Table S1: id varchar(128) primary key, url varchar(512);
Table S2: id varchar(128) primary key, url varchar(512);

SELECT id,date_part('epoch',sum(stop-start)),count(*),S1.url,S2.url
FROM U WHERE 'now'::datetime-start<'1 month'::interval
AND (id=S1.id OR id=S2.id) GROUP BY id,S1.url,S2.url;

The idea is to select whichever of S1.url and S2.url is non-null.

From psql, I get:

pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
connection to server was lost

And now there are no postgres processes running.

I suspect that the problem is that, in my data set, either S1.url or
S2.url will be null for any given row, and this is causing problems
for GROUP BY (I can also believe that my SQL is bad - but that
shouldn't crash postgres, should it?). If I stick to just S1, the
query runs ok.

But... now there are no postgres processes running at all.

# ps -fu postgres
UID PID PPID C STIME TTY TIME CMD

In the postgres log:

>Server process (pid 11189) exited with status 139 at Wed Apr 19 16:37:25 2000
>Terminating any active server processes...
>NOTICE: Message from PostgreSQL backend:
> The Postmaster has informed me that some other backend died
>abnormally and possibly corrupted shared memory.
> I have rolled back the current transaction and am going to
>terminate your database system connection and exit.
> Please reconnect to the database system and repeat your query.
>Server processes were terminated at Wed Apr 19 16:37:25 2000
>Reinitializing shared memory and semaphores
>IpcMemoryCreate: shmget failed (Invalid argument) key=5432110,
>size=144, permission=700
>This type of error is usually caused by an improper
>shared memory or System V IPC semaphore configuration.
>For more information, see the FAQ and platform-specific
>FAQ's in the source directory pgsql/doc or on our
>web site at http://www.postgresql.org.
>IpcMemoryIdGet: shmget failed (Invalid argument) key=5432110,
>size=144, permission=0
>IpcMemoryAttach: shmat failed (Invalid argument) id=-2
>FATAL 1: AttachSLockMemory: could not attach segment

Now, the real problem here is that postgres seems to be trying to
restart, and fails. It complains about shmget - my /etc/system says:
set shmsys:shminfo_shmmax = 0x40000000
set shmsys:shminfo_shmmni = 100
set shmsys:shminfo_shmseg = 32
set semsys:seminfo_semmns = 1024
set semsys:seminfo_semmni = 1024
set semsys:seminfo_semmnu = 1024
set semsys:seminfo_semmsl = 1024
set semsys:seminfo_semmap = 1024

So I don't think this is really a config problem (if it were, the db
wouldn't start up to begin with). Rather, I'd guess that postgres
isn't freeing a memory block during the "Terminating any active
server processes..." phase.

Running ipcs -a immediately after the crash shows several segments
(although oddly, I have no trouble starting the db again as root -
maybe there's a permissions issue at work here?):

# ipcs -a
IPC status from <running system> as of Wed Apr 19 16:45:42 2000
T ID KEY MODE OWNER GROUP CREATOR
CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
Shared Memory:
m 800 0x0052e32e --rw------- postgres postgres postgres
postgres 0 120 12737 12737 13:01:36 13:01:36 13:01:36
m 801 0x0052e325 --rw------- postgres postgres postgres
postgres 0 15373312 12737 12737 13:01:36 13:01:36 13:01:36
m 802 0x0052e32b --rw------- postgres postgres postgres
postgres 0 102184 12737 12737 13:01:36 13:01:36 13:01:36
m 3 0x500005a3 --rw-r--r-- root root root
root 1 68 232 232 15:29:35 15:29:35 15:29:35

Anyway, I hope this report is helpful. Thanks to the developers for
their hard work on PG7.

-- Mike

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Lincoln Yeoh 2000-04-20 00:31:46 Re: Connecting website with SQL-database.....
Previous Message Tom Cook 2000-04-19 23:25:29 Re: Sorry, I forgot put the exception in my message...