statement_cost_limit for regression testing.

From: "Ryan Bradetich" <rbradetich(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: statement_cost_limit for regression testing.
Date: 2008-08-29 02:57:10
Message-ID: e739902b0808281957idf882a5n8ee268a198f97cf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

Sorry for the new thread on this topic, I did not have a copy in my inbox I
could replay to :(

I am not sure of the status of the patch, but I did read through the thread
at:
http://archives.postgresql.org/pgsql-hackers/2008-08/msg00054.php

I just wanted to throw out another possible use for this GUC. There maybe
a better way to
solve this problem, but I believe this patch would be useful for regression
testing.

Here is the problem I ran into when regression testing the hash index on the
unsigned integer type
and how I could like to use the statement_cost_limit parameter:

Regression testing steps:

1. Create the table:
CREATE TABLE hash_i4_heap (seqno uint4, random uint4);

2. Create the hash index:
CREATE INDEX hash_i4_index ON hash_i4_heap USING hash (random
uint4_ops);

3. Load data into the hash_i4_heap table (data is from PostgreSQL regression
suit).
COPY hash_i4_heap FROM
'/home/rbrad/src/PostgreSQL/src/test/regress/data/hash.data';

4. Perform query:
SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
seqno | random
-------+-----------
15 | 843938989
(1 row)

This would pass the regression testing, but we did not actually test the
hash index here:

EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random =
843938989;
QUERY PLAN
--------------------------------------------------------------
Seq Scan on hash_i4_heap (cost=0.00..137.00 rows=1 width=8)
Filter: (random = 843938989)
(2 rows)

5. Attempt to force an index scan:
SET enable_seqscan = off;

EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random =
843938989;
QUERY PLAN

----------------------------------------------------------------------------
Seq Scan on hash_i4_heap (cost=100000000.00..100000137.00 rows=1
width=8)
Filter: (random = 843938989)
(2 rows)

6. Still uses an sequential scan. But this query would have still passed
my regression tests. Try the statement_cost_limit:
SET statement_cost_limit = 99999999;

EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random =
843938989;
ERROR: execution plan is too expensive: 100000137.000000

SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
ERROR: execution plan is too expensive: 100000137.000000

7. This is good because finally, my regression test failed since the
sequential scan cost bonus is larger then the statement_cost_limit.
For those interested, the reason it failed to use the hash index is
because I did not cast the hash_i4_heap.random value to an uint4 type.

EXPLAIN SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random =
843938989::uint4;
QUERY PLAN

----------------------------------------------------------------------------------
Index Scan using hash_i4_index on hash_i4_heap (cost=0.00..8.27
rows=1 width=8)
Index Cond: (random = '843938989'::uint4)
(2 rows)

The issue is fixed in my regression tests for the unsigned integer
types, but it would be nice for the regression tests to fail
in the future when the index is not used.

I looked at how the main PostgreSQL regression tests handle this problem and
as far as I can tell they do not.
Maybe this is not a likely problem, but it seems we do not have a good way
to ensure the indexes are actually being used during regression testing.

Thanks,

- Ryan

P.S. There appears to be a bug in the statement_cost_limit1.patch:

SET statement_cost_limit = 0;
ERROR: 0 is outside the valid range for parameter "statement_cost_limit"
(100 .. 2147483647)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2008-08-29 03:18:06 Re: Proposal: new border setting in psql
Previous Message ITAGAKI Takahiro 2008-08-29 02:49:23 Re: Auto-explain patch