Re: Increasing pattern index query speed

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Andrus <kobruleht2(at)hot(dot)ee>, Richard Huxton <dev(at)archonet(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Increasing pattern index query speed
Date: 2008-11-24 22:50:56
Message-ID: BDFBB77C9E07BE4A984DAAE981D19F961ACA1F175A@EXVMBX018-1.exch018.msoutlookonline.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Andrus,

My first thought on the query where a pattern being faster than the query with an exact value is that the planner does not have good enough statistics on that column. Without looking at the explain plans further, I would suggest trying something simple. The fact that it is fasster on 8.3 but slower on 8.1 may have to do with changes between versions, or may simply be due to luck in the statistics sampling.

See if increasing the statistics target on that column significantly does anything:

EXPLAIN (your query);
ALTER TABLE orders_products ALTER COLUMN product_id SET STATISTICS 2000;
ANALYZE orders_products;
EXPLAIN (your query);

2000 is simply a guess of mine for a value much larger than the default. This will generally make query planning slower but the system will have a lot more data about that column and the distribution of data in it. This should help stabilize the query performance.

If this has an effect, the query plans will change.

Your question below really boils down to something more simple:
--Why is the most optimal query plan not chosen? This is usually due to either insufficient statistics or quirks in how the query planner works on a specific data set or with certain configuration options.

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Andrus
Sent: Monday, November 24, 2008 12:34 PM
To: Richard Huxton
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Increasing pattern index query speed

Richard,

> These are the same but the times are different. I'd be very surprised if
> you can reproduce these times reliably.

I re-tried today again and got same results: in production database pattern
query is many times slower that equality query.
toode and rid base contain only single product starting with 99000010
So both queries should scan exactly same numbers of rows.

> Can I give you some wider-ranging suggestions Andrus?
> 1. Fix the vacuuming issue in your hash-join question.

I have ran VACUUM FULL VERBOSE ANALYSE and set max_fsm_pages=150000
So issue is fixed before those tests.

> 2. Monitor the system to make sure you know if/when disk activity is high.

I optimized this system. Now there are short (some seconds) sales queries
about after every 5 - 300 seconds which cause few disk activity and add few
new rows to some tables.
I havent seen that this activity affects to this test result.

> 3. *Then* start to profile individual queries and look into their plans.
> Change the queries one at a time and monitor again.

How to change pattern matching query to faster ?

Andrus.

Btw.

I tried to reproduce this big difference in test server in 8.3 using sample
data script below and got big difference but in opposite direction.

explain analyze SELECT sum(1)
FROM orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>'2006-01-01' and ...

different where clauses produce different results:

AND orders_products.product_id = '3370000000000000' -- 880 .. 926 ms
AND orders_products.product_id like '3370000000000000%' -- 41 ..98 ms

So patter index is 10 .. 20 times (!) faster always.
No idea why.

Test data creation script:

begin;
CREATE OR REPLACE FUNCTION Counter() RETURNS int IMMUTABLE AS
$_$
SELECT 3500000;
$_$ LANGUAGE SQL;

CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT
NULL);
CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name
char(70) NOT NULL, quantity numeric(12,2) default 1);
CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id
CHAR(20),
id serial, price numeric(12,2) default 1 );

INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20),
'product number ' || n::TEXT FROM generate_series(0,13410) AS n;

INSERT INTO orders
SELECT n,'2005-01-01'::date + (4000.0 * n/Counter() * '1 DAY'::interval)
FROM generate_series(0, Counter()/3 ) AS n;

SET work_mem TO 2097151;

INSERT INTO orders_products SELECT
generate_series/3 as order_id,
( (1+ (generate_series % 13410))*power( 10,13))::INT8::CHAR(20) AS
product_id
FROM generate_series(1, Counter());

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (id);

ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES
products(product_id);
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES
orders(order_id) ON DELETE CASCADE;

CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX order_product_pattern_idx ON orders_products( product_id
bpchar_pattern_ops );

COMMIT;
SET work_mem TO DEFAULT;
ANALYZE;

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message PFC 2008-11-24 22:58:50 Re: limit clause produces wrong query plan
Previous Message Chris 2008-11-24 22:19:55 Re: limit clause produces wrong query plan