Strong feeling of something ugly lurking deeply within 7.0 ;-)

From: Christof Petig <christof(dot)petig(at)wtal(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Michael Meskes <meskes(at)postgresql(dot)org>
Subject: Strong feeling of something ugly lurking deeply within 7.0 ;-)
Date: 2000-10-02 21:27:41
Message-ID: 39D8FDCC.D43E7F77@wtal.de
Lists: pgsql-bugs

How severe this bug is depends heavily on how bug-free your own programs are.

Short description:
Long-standing open transactions, combined with high-traffic updates and
regular vacuums, eventually corrupt memory.

Long description:
Due to a design flaw in our ecpg programs (I don't recommend
designing for autocommit off!), some transactions stayed open for several
days. A process data collection system generates a lot of status-change
updates (3MB a day) to about 110 rows in a table at the same time.
After every 1024 updates I vacuum the high-traffic table, which should
shrink it to 16kB. The first thing I noticed was that vacuum did not free
old tuples. This put me on the track of the real cause.

For the past three weeks (with more buggy long-standing transactions) I
have seen one major crash of the program system per week. For months I have
seen some strange NOTICEs which went away after another vacuum. And this
morning I found a 'possible memory corruption, killing other backends'
message.

The situation got better and better during the 7.0 development cycle (I
started with a pre-beta version this January and reported some
concurrent vacuum oddities at that time). And it got worse the more
interactive programs we added.
But until now I had not spotted the particular ingredient that causes the
pain: long-standing transactions.

It's not very bad. This seems to happen only under rare conditions. Until
this week I thought of it as a minor oddity - a temporary nuisance.

And: this is the current stable CVS tree, running on a 233MHz Pentium2,
Linux 2.2.14(?)

Sample Code:
update bn_actual set meter=meter+1 where machine= ?; // repeat every second
combined with
begin transaction; // hold
select something;
and
vacuum analyze; // once a day
and
vacuum bn_actual; // every 1024 updates

and some others.
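
To make the timing of the statements above easier to see, here is a minimal sketch in Python that only generates the statement sequence (it does not talk to a database, and it is not the original ecpg code). The function names `workload` and `holder_session` are hypothetical, and the one-update-per-second, every-1024-updates and once-a-day intervals are taken from the description above.

```python
from typing import Iterator, List

VACUUM_EVERY_N_UPDATES = 1024    # "every 1024 updates" in the report
SECONDS_PER_DAY = 24 * 60 * 60   # "once a day" in the report


def workload(seconds: int) -> Iterator[str]:
    """Yield the updating session's statements, assuming one update per second."""
    for tick in range(1, seconds + 1):
        # The '?' placeholder is kept as in the report; real code would bind it.
        yield "update bn_actual set meter=meter+1 where machine=?;"
        if tick % VACUUM_EVERY_N_UPDATES == 0:
            yield "vacuum bn_actual;"
        if tick % SECONDS_PER_DAY == 0:
            yield "vacuum analyze;"


def holder_session() -> List[str]:
    """Statements of the second connection, which never commits (hypothetical)."""
    return ["begin transaction;", "select something;"]


if __name__ == "__main__":
    for stmt in holder_session():
        print("session B:", stmt)
    for stmt in workload(1024):
        if stmt.startswith("vacuum"):
            print("session A:", stmt)
```

Replaying such a sequence against a test server, with the holder session left open, would be one way to try to reproduce the report; whether it actually triggers the corruption is not something this sketch can show.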

PS: Of course I'm currently fixing the long transactions problem. I'll
tell you once the system has run for 4 weeks again without any strange
occurrence.
PPS: Yes, I'm following the hackers list.
P3S: No, I don't believe in a hardware bug.
