Re: [HACKERS] Another nasty cache problem

From: Patrick Welche <prlw1(at)newn(dot)cam(dot)ac(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Another nasty cache problem
Date: 2000-02-03 11:24:34
Message-ID: 20000203112434.B1509@quartz.newn.cam.ac.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 31, 2000 at 09:02:30PM -0500, Tom Lane wrote:
> Patrick Welche <prlw1(at)newn(dot)cam(dot)ac(dot)uk> writes:
> > Tom Lane wrote:
>
> There are cache-flush-related bugs still left to deal with, but they
> seem to be far lower in probability than the ones squashed so far.
> I'm finding that even with MAXNUMMESSAGES set to 8, the parallel tests
> usually pass; so it seems we need some other way of testing to nail down
> the remaining problems.
>
> > I also tried that nonsensical join from the other day, and it failed in
> > the same way again:
> > newnham=# select * from crsids,"tblPerson" where
> > newnham-# crsids.crsid != "tblPerson"."CRSID";
> > Backend sent B message without prior T
>
> Hmm. Can you provide a self-contained test case (a script to build the
> failing tables, preferably)?

It seems this is a memory exhaustion thing: I have 128Mb real memory.
Attached below is the C program used to create some random data in
tables test and test2 of database test (which needs to exist). Executing
the non-sensical query

select * from test,test2 where test.i!=test2.i;

should result in 2600*599=1557400 (ie lots of) rows to be returned.
The process's memory consumption during this select grows to 128Mb, and after
a moment or two:

Backend sent D message without prior T
Backend sent D message without prior T
...

Which isn't quite the same message as before, but is of the same type.

59 processes: 2 running, 57 sleeping
CPU states: 2.3% user, 86.4% nice, 9.3% system, 0.0% interrupt, 1.9% idle
Memory: 74M Act, 37M Inact, 184K Wired, 364K Free, 95M Swap, 262M Swap free

PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
1547 prlw1 50 0 128M 516K run 1:28 59.28% 59.28% psql
1552 postgres 50 0 1920K 632K run 1:37 24.32% 24.32% postgres

later, while the "Backend sent..." messages appear

1547 prlw1 -5 0 128M 68M sleep 1:41 23.00% 23.00% psql
1552 postgres 2 0 1920K 4K sleep 1:41 141.00% 6.88% <postgres>

Note that there is still plenty of swap space. The 128Mb number seems to be
more than a coincidence (how to prove?)

So, is this only happening to me? How can lack of real memory affect timing
of interprocess communication?

Cheers,

Patrick

==========================================================================

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

#include "libpq-fe.h"

const char *progname;

PGresult *send_query(PGconn *db, const char *query)
{
PGresult *res;

res=PQexec(db,query);
switch(PQresultStatus(res))
{
case PGRES_EMPTY_QUERY:
printf("PGRES_EMPTY_QUERY: %s\n",query);
break;
case PGRES_COMMAND_OK:
printf("PGRES_COMMAND_OK: %s\n",query);
break;
case PGRES_TUPLES_OK:
printf("PGRES_TUPLES_OK: %s\n",query);
break;
case PGRES_COPY_OUT:
printf("PGRES_COPY_OUT: %s\n",query);
break;
case PGRES_COPY_IN:
printf("PGRES_COPY_IN: %s\n",query);
break;
case PGRES_BAD_RESPONSE:
printf("PGRES_BAD_RESPONSE: %s\n",query);
exit(1);
break;
case PGRES_NONFATAL_ERROR:
printf("PGRES_NONFATAL_ERROR: %s\n",query);
break;
case PGRES_FATAL_ERROR:
printf("PGRES_FATAL_ERROR: %s\n",query);
exit(1);
break;
default:
fprintf(stderr,"Error from %s: Unknown response from "\
"PQresultStatus()\n",progname);
exit(1);
break;
}

return res;
}

char get_letter(void)
{
int c;

do c=(int)random()%128;
while(!(isascii(c)&&isalpha(c)));

return (char)tolower(c);
}

unsigned int get_num(void)
{
return random()%100;
}

int main(int argc, char* argv[])
{
char id[7],query[2048];
int i;
PGconn *db;
PGresult *res;

progname=argv[0];

srandom(42); /* same data each time hopefully */

db=PQconnectdb("dbname=test");
if(PQstatus(db)==CONNECTION_BAD)
{
fprintf(stderr,"Error from %s: Unable to connect to database \"test\".\n",
progname);
exit(1);
}

res=send_query(db,"create table test (txt text,var varchar(7),i integer)");
PQclear(res);
res=send_query(db,"create table test2(txt text,var varchar(7),i integer)");
PQclear(res);

for(i=1;i<=2600;++i)
{
sprintf(id,"%c%c%c%c%03u",get_letter(),get_letter(),get_letter(),
get_letter(),get_num());

sprintf(query,"insert into test values ('%s','%s','%i')",id,id,i);
res=send_query(db,query);
PQclear(res);
}

for(i=1;i<=600;++i)
{
sprintf(id,"%c%c%c%c%03u",get_letter(),get_letter(),get_letter(),
get_letter(),get_num());

sprintf(query,"insert into test2 values ('%s','%s','%i')",id,id,i);
res=send_query(db,query);
PQclear(res);
}

PQfinish(db);

return 0;
}

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2000-02-03 12:09:18 Re: [HACKERS] Re: [SQL] Proposed Changes to PostgreSQL
Previous Message Jan Wieck 2000-02-03 10:53:18 Re: [HACKERS] SELECT FOR UPDATE leaks relation refcounts