Fix up for BTP_CHAIN problems

From: Wayne Piekarski <wayne(at)senet(dot)com(dot)au>
To: pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Fix up for BTP_CHAIN problems
Date: 1999-07-17 03:40:04
Message-ID: 199907170340.NAA10098@helpdesk.senet.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

Hi,

A few weeks ago I sent an email out about getting BTP_CHAIN faults when
trying to perform operations with tables. My colleague Matt Altus was
trawling the mailing lists looking for information about this, and he
found some articles previously discussing problems with Btree indices and
how they sometimes can have problems handling tables with massive
duplicate entries in them, as the tree becomes unbalanced, and mentioned
other things like leaf nodes and so on. The postings talked about how
fixing up the problem was tricky and was still there, and Oracle solved it
by including the tid in with the index to make it more unique.

Well, we thought about this, and had a look at every table and index we'd
ever had BTP_CHAIN problems with, and all had massive duplication of
values in the particular columns. Ie, one table has 1.5 million rows, and
one of the columns with an index on it (snum) has only 20000 unique values
- this particular table was very troublesome, whereas others weren't so
bad because they were a lot smaller. Each table we looked at were all the
same problem, and we thought wow, this is really neat because all our
problem tables were explained by these postings. None of our other indexes
caused problems, because they were more unique.

Each one of our tables has a column called id which is very similar to an
oid except we generate it ourselves, and so we put in a reference to the
id column after all the other columns in our indexes. ie,

create index sessions_snum_index on sessions using btree (snum);

became:

create index sessions_snum_index on sessions using btree (snum, id);

The indexes grew a little bit, but now we have not had *ANY* BTP_CHAIN
faults at all, and to test it we really thrashed the machine to see if we
could cause it to die. It worked perfectly and we were all really happy
because BTP_CHAIN was very annoying to fix up. It was occuring a lot when
the machine was under high load.

So I can definitely recommend this to anyone who has problems like this,
or tables with lots of rows but not many unique values. The problem does
not occur under simple circumstances, only under cases where many backends
are all running and the system is under a high load.

Would a solution to the problem be to automatically include the row OID
when creating an index? This would fix the problem for everyone
automatically without having to do the hack manually. Is it ok to include
the OID in an index? I wasn't sure about this which is why I included my
own ID value instead so someone might want to comment on this.

Just thought I'd share this with everyone so we can all benefit from it.
This is a problem which really caused us to doubt the ability of Postgres
to be used in a high load environment and so I think it should be
mentioned somewhere. Maybe in the docs?

BTW, since getting around BTP_CHAIN our only remaining problem is the
backends waiting thing, and we are upgrading to 6.5 tomorrow which we hope
will fix this up forever. We did some testing of 6.5 and it runs a *lot*
faster, is more reliable, and the load of the machine is very much lower
than it normally is with 6.4.2 with our thrash testing program. I assume
that 6.4 style code will work unchanged in 6.5? Ie, we've used a lot of
LOCK TABLE xxx; code everywhere, which we hope will work untouched in 6.5.

We'll report back after our upgrade once we know that everything works
really well.


Regards,
Wayne

------------------------------------------------------------------------------
Wayne Piekarski Tel: (08) 8221 5221
Research & Development Manager Fax: (08) 8221 5220
SE Network Access Pty Ltd Mob: 0407 395 889
222 Grote Street Email: wayne(at)senet(dot)com(dot)au
Adelaide SA 5000 WWW: http://www.senet.com.au

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1999-07-17 04:14:15 Re: include-file cleanup
Previous Message Wayne Piekarski 1999-07-17 03:23:22 Re: Oversize proc sources (was Re: [BUGS] Backend dies creating plpgsql procedures (with reproducible example!))

Browse pgsql-sql by date

  From Date Subject
Next Message Stephen Boyle 1999-07-18 07:37:56 Re: [SQL] Stored Procedures and other stuffs
Previous Message tjk@tksoft.com 1999-07-16 22:24:05 user defined function speeds