Re: [HACKERS] strange behaviour on pooled alloc

From: jwieck(at)debis(dot)com (Jan Wieck)
To: maillist(at)candle(dot)pha(dot)pa(dot)us (Bruce Momjian)
Cc: jwieck(at)debis(dot)com, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] strange behaviour on pooled alloc
Date: 1999-02-06 17:28:16
Message-ID: m109BWi-000EBPC@orion.SAPserv.Hamburg.dsh.de
Lists: pgsql-hackers

Bruce Momjian wrote:

> > The strange behaviour now is that depending on the blocksize
> > and the limit for block/single allocation I use for the pools,
> > the portals_p2 regression test fails or not.
> > [...]
> > I have absolutely no clue what's going on here. Anyone an
> > idea how to track this down?
>
> My recommendation is to apply the fix and let others debug it. Someone
> will find the cause. Just give them a reproducible test case. In many
> cases, more eyes or another OS shows the error much clearer.

A new version of the AllocSet...() functions is committed.
palloc() is a macro now. The memory eating problem of COPY FROM,
INSERT ... SELECT and UPDATEs on a table that has constraints
is fixed (new file nodes/freefuncs.c).

The settings in aset.c aren't optimal for now, because the
settings in place force the portals_p2 test to fail (at least
here). Some information for those who want to take a look at
it follows.

Reproducing the bug:

The bug can be reproduced after the regression test has
been run by running only portals_p2.sql.

To cause the error, the postmaster must be started with
-B64 (default), and at least one environment variable
(e.g. PGDATESTYLE) that causes psql to send a SET on
connection must be set.

If -B is greater than 64, AllocSetAlloc() puts the
allocation for the buffer reference counts in the
execution state EState into its own malloc() area, not
into a smallchunk block. The problem disappears.
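
The size dependence described above can be sketched roughly as
follows. This is only an illustration, not the code in aset.c:
the names SKETCH_CHUNK_LIMIT, sketch_placement() and
sketch_refcount_bytes() are hypothetical, the counter type (long)
is an assumption, and the limit is deliberately chosen so that 64
counters just fit into a shared block while 65 do not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit, picked only so that the sketch mirrors the
 * -B64 observation; the real constant in aset.c may differ. */
#define SKETCH_CHUNK_LIMIT (64 * sizeof(long))

typedef enum { FROM_SHARED_BLOCK, FROM_OWN_MALLOC } SketchPlacement;

/* Decide where a request of `size` bytes would be placed:
 * small requests share a block, large ones get their own area. */
SketchPlacement
sketch_placement(size_t size)
{
    return (size <= SKETCH_CHUNK_LIMIT) ? FROM_SHARED_BLOCK
                                        : FROM_OWN_MALLOC;
}

/* Bytes needed for the saved reference counts: one counter per
 * shared buffer (counter type is an assumption). */
size_t
sketch_refcount_bytes(int nbuffers)
{
    assert(nbuffers > 0);
    return (size_t) nbuffers * sizeof(long);
}
```

With these assumed values, a request for 64 counters shares a
block with other allocations, while a request for 65 or more is
isolated in its own area, where a neighbour cannot overwrite it.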

If the ALLOC_BLOCK_SIZE (in aset.c) is changed to 8192,
the problem also disappears.

If none of the mentioned environment variables is set,
the BEGIN from the regression test is the first command
sent to the backend and the problem disappears too. But
adding a simple BEGIN; END; to the top of the test forces
it to appear again, so it isn't in the variable setting
code.

Guesses:

The symptom is that, with many portals open on a big
table, rows that are there don't show up. Each cursor
declaration results in its own ExecutorStart(), where
the buffer reference count is saved into the newly
created execution state and reset to zero. Later, in
ExecutorEnd(), these states are restored.
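
A minimal sketch of that save/reset/restore cycle, to make clear
why a damaged saved copy would show up as wrong pin counts. All
names here (sketch_refcount, SketchEState, sketch_executor_start,
sketch_executor_end) are hypothetical stand-ins, plain malloc()
replaces palloc(), and the restore simply overwrites the live
counts, which is an assumption about the real code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NBUFFERS 64      /* corresponds to -B64 */

/* Live per-backend reference counts (hypothetical stand-in). */
long sketch_refcount[SKETCH_NBUFFERS];

typedef struct
{
    long *saved;                /* copy taken at executor start */
} SketchEState;

/* Save the live counts into the execution state, then zero them.
 * Returns 0 on success, -1 if the allocation fails. */
int
sketch_executor_start(SketchEState *es)
{
    es->saved = malloc(sizeof sketch_refcount);
    if (es->saved == NULL)
        return -1;
    memcpy(es->saved, sketch_refcount, sizeof sketch_refcount);
    memset(sketch_refcount, 0, sizeof sketch_refcount);
    return 0;
}

/* Copy the saved counts back and release the saved copy. */
void
sketch_executor_end(SketchEState *es)
{
    assert(es->saved != NULL);
    memcpy(sketch_refcount, es->saved, sizeof sketch_refcount);
    free(es->saved);
    es->saved = NULL;
}
```

If anything overwrites es->saved between the two calls, the
restore step installs the damaged values, so a buffer that was
pinned before the cursor was declared can come back with a count
of zero.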

These disappearing rows might have to do with unpinned
buffers that are expected to be pinned.

Since it depends on whether the allocation for the saved
reference counts is taken from a block or allocated
separately, I think some counts get corrupted from
somewhere else.

It also depends on the blocksize, one more hint that the
corruption might come from somewhere else, because the
refcount areas must live in the same block together with
some other allocation.
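
One possible way to hunt for such a neighbour is to put guard
bytes behind every chunk carved from a block and check them
later. The following is only a sketch of that idea and not part
of the committed aset.c; SketchBlock, sketch_carve() and
sketch_guard_ok() are hypothetical, and alignment is ignored for
brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 256
#define SKETCH_GUARD_LEN  4
#define SKETCH_GUARD_BYTE 0x7E

typedef struct
{
    unsigned char data[SKETCH_BLOCK_SIZE];
    size_t        used;         /* bytes handed out so far */
} SketchBlock;

/* Carve `size` bytes out of the block and place guard bytes
 * right behind them. Returns the chunk's offset in the block,
 * or (size_t) -1 if the block has no room left. */
size_t
sketch_carve(SketchBlock *b, size_t size)
{
    size_t off = b->used;
    size_t room = SKETCH_BLOCK_SIZE - off;

    if (size > room || SKETCH_GUARD_LEN > room - size)
        return (size_t) -1;
    memset(b->data + off + size, SKETCH_GUARD_BYTE, SKETCH_GUARD_LEN);
    b->used = off + size + SKETCH_GUARD_LEN;
    return off;
}

/* Return 1 if the guard behind the chunk is intact, 0 if some
 * write ran past the end of the chunk. */
int
sketch_guard_ok(const SketchBlock *b, size_t off, size_t size)
{
    size_t i;

    assert(off + size + SKETCH_GUARD_LEN <= SKETCH_BLOCK_SIZE);
    for (i = 0; i < SKETCH_GUARD_LEN; i++)
        if (b->data[off + size + i] != SKETCH_GUARD_BYTE)
            return 0;
    return 1;
}
```

Checking the guards at ExecutorEnd() time, before the saved
counts are restored, would at least show whether the chunk in
front of the refcount area has been overrun.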

I'll keep on debugging, but would very much appreciate it
if someone could help.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck(at)debis(dot)com (Jan Wieck) #
