Re: Increasing pattern index query speed

From: Mario Weilguni <mweilguni(at)sime(dot)com>
To: Andrus <kobruleht2(at)hot(dot)ee>
Cc: Richard Huxton <dev(at)archonet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Increasing pattern index query speed
Date: 2008-11-26 14:15:29
Message-ID: 492D5A01.3060109@sime.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Andrus schrieb:
> Richard,
>
>> These are the same but the times are different. I'd be very surprised if
>> you can reproduce these times reliably.
>
> I re-tried today again and got same results: in production database
> pattern query is many times slower that equality query.
> toode and rid base contain only single product starting with 99000010
> So both queries should scan exactly same numbers of rows.
>
>> Can I give you some wider-ranging suggestions Andrus?
>> 1. Fix the vacuuming issue in your hash-join question.
>
> I have ran VACUUM FULL VERBOSE ANALYSE and set max_fsm_pages=150000
> So issue is fixed before those tests.
>
>> 2. Monitor the system to make sure you know if/when disk activity is
>> high.
>
> I optimized this system. Now there are short (some seconds) sales
> queries about after every 5 - 300 seconds which cause few disk
> activity and add few new rows to some tables.
> I havent seen that this activity affects to this test result.
>
>> 3. *Then* start to profile individual queries and look into their plans.
>> Change the queries one at a time and monitor again.
>
> How to change pattern matching query to faster ?
>
> Andrus.
>
> Btw.
>
> I tried to reproduce this big difference in test server in 8.3 using
> sample data script below and got big difference but in opposite
> direction.
>
> explain analyze SELECT sum(1)
> FROM orders
> JOIN orders_products USING (order_id)
> JOIN products USING (product_id)
> WHERE orders.order_date>'2006-01-01' and ...
>
> different where clauses produce different results:
>
> AND orders_products.product_id = '3370000000000000' -- 880 .. 926 ms
> AND orders_products.product_id like '3370000000000000%' -- 41 ..98 ms
>
> So patter index is 10 .. 20 times (!) faster always.
> No idea why.
>
> Test data creation script:
>
> begin;
> CREATE OR REPLACE FUNCTION Counter() RETURNS int IMMUTABLE AS
> $_$
> SELECT 3500000;
> $_$ LANGUAGE SQL;
>
> CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE
> NOT NULL);
> CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name
> char(70) NOT NULL, quantity numeric(12,2) default 1);
> CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL,
> product_id CHAR(20),
> id serial, price numeric(12,2) default 1 );
>
> INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20),
> 'product number ' || n::TEXT FROM generate_series(0,13410) AS n;
>
> INSERT INTO orders
> SELECT n,'2005-01-01'::date + (4000.0 * n/Counter() * '1 DAY'::interval)
> FROM generate_series(0, Counter()/3 ) AS n;
>
> SET work_mem TO 2097151;
>
> INSERT INTO orders_products SELECT
> generate_series/3 as order_id,
> ( (1+ (generate_series % 13410))*power( 10,13))::INT8::CHAR(20) AS
> product_id
> FROM generate_series(1, Counter());
>
> ALTER TABLE orders ADD PRIMARY KEY (order_id);
> ALTER TABLE products ADD PRIMARY KEY (product_id);
> ALTER TABLE orders_products ADD PRIMARY KEY (id);
>
> ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES
> products(product_id);
> ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES
> orders(order_id) ON DELETE CASCADE;
>
> CREATE INDEX orders_date ON orders( order_date );
> CREATE INDEX order_product_pattern_idx ON orders_products( product_id
> bpchar_pattern_ops );
>
> COMMIT;
> SET work_mem TO DEFAULT;
> ANALYZE;
>
No wonder that = compares bad, you created the index this way:
CREATE INDEX order_product_pattern_idx ON orders_products( product_id
bpchar_pattern_ops );
why not:
CREATE INDEX order_product_pattern_idx ON orders_products( product_id);

explain analyze SELECT sum(1)
FROM orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>'2006-01-01'
AND orders_products.product_id = '3370000000000000';

QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=3013.68..3013.69 rows=1 width=0) (actual
time=8.206..8.207 rows=1 loops=1)
-> Nested Loop (cost=10.83..3013.21 rows=185 width=0) (actual
time=2.095..7.962 rows=189 loops=1)
-> Index Scan using products_pkey on products
(cost=0.00..8.27 rows=1 width=18) (actual time=0.036..0.038 rows=1 loops=1)
Index Cond: ((product_id)::text = '3370000000000000'::text)
-> Nested Loop (cost=10.83..3003.09 rows=185 width=18)
(actual time=2.052..7.474 rows=189 loops=1)
-> Bitmap Heap Scan on orders_products
(cost=10.83..949.68 rows=253 width=22) (actual time=0.161..0.817
rows=261 loops=1)
Recheck Cond: ((product_id)::text =
'3370000000000000'::text)
-> Bitmap Index Scan on foo (cost=0.00..10.76
rows=253 width=0) (actual time=0.116..0.116 rows=261 loops=1)
Index Cond: ((product_id)::text =
'3370000000000000'::text)
-> Index Scan using orders_pkey on orders
(cost=0.00..8.10 rows=1 width=4) (actual time=0.020..0.021 rows=1 loops=261)
Index Cond: (orders.order_id =
orders_products.order_id)
Filter: (orders.order_date > '2006-01-01'::date)
Total runtime: 8.268 ms

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Chavdar Kopoev 2008-11-26 15:01:40 many to many performance
Previous Message Richard Huxton 2008-11-26 10:41:12 Re: Increasing pattern index query speed