Re: [HACKERS] New Driver and Unique Indexes

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: byronn(at)insightdist(dot)com (Byron Nikolaidis)
Cc: pgsql-interfaces(at)postgreSQL(dot)org, infotecn(at)tin(dot)it, pgsql-hackers(at)postgreSQL(dot)org, mstaedt(at)va-sigi(dot)va(dot)fh-ulm(dot)de, hannu(at)trust(dot)ee
Subject: Re: [HACKERS] New Driver and Unique Indexes
Date: 1998-04-28 02:42:36
Message-ID: 199804280242.WAA02534@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-interfaces

> One downside about UNIQUE INDEXES however, is how Microsoft Access
> handles them when you open the table in datasheet view. Whether you
> specify the unique index at link time, or the driver provides the info,
> Access will try to use queries which show up a problem with the backend:
>
> Here is an example of an Access query with a unique index on a single
> field:
>
> SELECT balance_id,company_id, balance_date, is_audited,comment,
> balance_type, balance_filename FROM balance WHERE balance_id = 1 OR
> balance_id = 2 OR balance_id = 3 OR balance_id = 4 OR balance_id = 5 OR
> balance_id = 6 OR balance_id = 7 OR balance_id = 8 OR balance_id = 9 OR
> balance_id = 10
>
> The more keyparts you have, the worse the problem is (2 keyparts):
>
> SELECT balance_id,company_id, balance_date, is_audited,comment,
> balance_type, balance_filename FROM balance WHERE balance_id = 1 AND
> company_id=1 OR balance_id = 1 AND company_id=2 OR balance_id = 1 AND
> company_id=3 OR balance_id = 2 AND company_id=1 OR balance_id = 2 AND
> company_id=2 OR balance_id = 2 AND company_id=3 OR balance_id = 3 AND
> company_id=1 OR balance_id = 3 AND company_id=2 OR balance_id = 3 AND
> company_id=3 OR balance_id = 4 AND company_id=1

OK, I have the dope on this one. The palloc failure is not the OR
indexing, but rather the item:

* Fix memory exhaustion when using many OR's

The bug report that prompted this is attached. As you can see, it was
also prompted by MS-Access. The problem is that the backend uses the
text-book method of processing OR's by converting the WHERE clause to
Conjunctive-Normal-Form(CNF), and this exponentially explodes the number
of tests where there are many OR clauses.

We are not sure how to fix it yet. Vadim has improved the handling of
this in 6.3.*, but it still is not perfect and needs a solution.
Obviously other databases are not CNF'ifing the queries so there must be
a solution. David?

---------------------------------------------------------------------------

Date: Mon, 12 Jan 1998 15:53:18 -0500
From: David Hartwig <daveh(at)insightdist(dot)com>
To: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Subject: Re: [BUGS] General Bug Report: palloc fails with lots of ANDs and ORs

This is a multi-part message in MIME format.
--------------20C7AC27E8BCA117B23354BE
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Bruce,

I did some homework. Here is what I have. The default max data segment size on our (AIX 4.1.4) box is around 130000 kbytes.

I put together a query which put me just past the threshold of the palloc "out of memory error". It is as follows:

create table outlet (
number int,
name varchar(30),
...
}

create unique index outlet_key on outlet using btree (number);

select count(*) from outlet
where
(number = 1 and number = 1 and number = 1) or
(number = 1 and number = 1 and number = 1) or
(number = 1 and number = 1 and number = 1) or
(number = 1 and number = 1 and number = 1) or
(number = 1 and number = 1 and number = 1) or
(number = 1 and number = 1 and number = 1) or
(number = 1 and number = 1 and number = 1) or
(number = 1 and number = 1 and number = 1) or
(number = 1 and number = 1 and number = 1);

Not pretty but it makes the point. Take out two OR clauses and the query works fine (but a bit slow).

The above query is all it takes to use up all 130000 Kbytes of memory. And, since the query takes a long time to finally fail, I was able to
observe the memory consumption.

I extended the max data segment to 300000. And tried again. I could observer the memory consumption up to about 280000 when the system
suddenly got sick. I was getting all kinds of messages like "cant fork"; bad stuff. The system did finally recover on its own. I am not
sure happened there. I know that ulimit puts us right around the physical memory limits of out system.

Using 300 meg for the above query seems like a bit of a problem. It is difficult to imagine where all that memory is being used. I will
research the problem further if you need more information.

Bruce Momjian wrote:

> Try changing your OS default memory size. Unsure how to do this under
> AIX.
>
> >
> >
> > ============================================================================
> > POSTGRESQL BUG REPORT TEMPLATE
> > ============================================================================
> >
> >
> > Your name : David Hartwig
> > Your email address : daveh(at)insightdist(dot)com
> >
> > Category : runtime: back-end: SQL
> > Severity : serious
> >
> > Summary: palloc fails with lots of ANDs and ORs
> >
> > System Configuration
> > --------------------
> > Operating System : AIX 4.1
> >
> > PostgreSQL version : 6.2
> >
> > Compiler used : native CC
> >
> > Hardware:
> > ---------
> > RS 6000
> >
> > Versions of other tools:
> > ------------------------
> > NA
> >
> > --------------------------------------------------------------------------
> >
> > Problem Description:
> > --------------------
> > The follow is a mail message describing the problem on the PostODBC mailing list:
> >
> >
> > I have run across this also. We traced it down to a failure in the PostgreSQL server. This occurs under the following conditions.
> >
> > 1. MS Access
> > 2. Specify a multi-part key in the link time setup with postgresql
> > 3. Click on table view.
> >
> > What happens is MS Access takes the following steps. First it selects all possible key values for the table being viewed. I
> > suspect it maps the key values to the relative row position in the display. Then it uses the mapping to generate future queries based
> > on the mapping and the rows showing on the screen. The queries take the following form:
> >
> > SELECT keypart1, keypart2, keypart3, col4, col5, col6 ... FROM example_table
> > WHERE
> > (keypart1 = row1keypartval1 AND keypart2 = row1keypartval2 AND keypart3 = row1keypartval3) OR
> > (keypart1 = row2keypartval1 AND keypart2 = row2keypartval2 AND keypart3 = row2keypartval3) OR
> > .
> > . -- 28 lines of this stuff. Why 28... Why not 28
> > .
> > (keypart1 = row27keypartval1 AND keypart2 = row27keypartval2 AND keypart3 = row27keypartval3) OR
> > (keypart1 = row28keypartval1 AND keypart2 = row28keypartval2 AND keypart3 = row28keypartval3);
> >
> >
> > The PostgreSQL sever chokes on this statement claiming it is out of memory. (palloc) In this example I used a three part key. I
> > do not recall if a three part key is enough to trash the backend. It has been a while. I have tried sending these kinds of statements
> > directly through the psql monitor and get the same result.
> >
> >
> > --------------------------------------------------------------------------
> >
> > Test Case:
> > ----------
> > select c1, c1 c3, c4, c5 ... from example_table
> > where
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something) or
> > (c1 = something and c2 = something and c3 = something and c4 = something);
> >
> >
> > --------------------------------------------------------------------------
> >
> > Solution:
> > ---------
> >
> >
> > --------------------------------------------------------------------------
> >
> >
> >
>
> --
> Bruce Momjian
> maillist(at)candle(dot)pha(dot)pa(dot)us

--------------20C7AC27E8BCA117B23354BE
Content-Type: text/x-vcard; charset=us-ascii; name="vcard.vcf"
Content-Transfer-Encoding: 7bit
Content-Description: Card for David Hartwig
Content-Disposition: attachment; filename="vcard.vcf"

begin: vcard
fn: David Hartwig
n: Hartwig;David
org: Insight Distribution Systems
adr: 222 Shilling Circle;;;Hunt Valley ;MD;21030;USA
email;internet: daveh(at)insightdist(dot)com
title: Manager Research & Development
tel;work: (410)403-2308
x-mozilla-cpt: ;0
x-mozilla-html: TRUE
version: 2.1
end: vcard

--------------20C7AC27E8BCA117B23354BE--

--
Bruce Momjian | 830 Blythe Avenue
maillist(at)candle(dot)pha(dot)pa(dot)us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas G. Lockhart 1998-04-28 02:58:20 Re: [HACKERS] initdb problem and operator question
Previous Message Thomas G. Lockhart 1998-04-28 02:22:42 Re: [HACKERS] Re: [QUESTIONS] Practical SQL Handbook - demo script for postgreSQL

Browse pgsql-interfaces by date

  From Date Subject
Next Message The Hermit Hacker 1998-04-28 04:44:18 [ssl-users] ANNOUNCE: PostgreSQL-SSL patch v.01a (fwd)
Previous Message Bruce Momjian 1998-04-28 02:17:55 Re: [HACKERS] Re: [INTERFACES] retrieving varchar size