Skip site navigation (1) Skip section navigation (2)

Re: Hash join on int takes 8..114 seconds

From: "Andrus" <kobruleht2(at)hot(dot)ee>
To: "Richard Huxton" <dev(at)archonet(dot)com>,"PFC" <lists(at)peufeu(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hash join on int takes 8..114 seconds
Date: 2008-11-22 17:58:04
Message-ID: E3AFB54F617D47F4BE652E85ECC67153@andrusnotebook (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
> You could perhaps run a little check on the performance of the RAID, is
> it better than linux software RAID ?
> Does it leverage NCQ appropriately when running queries in parallel ?

I was told that this RAID is software RAID.
I have no experience what to check.
This HP server was installed 3 years ago and in this time it was not
high perfomance server.

>>> explain analyze
>>> SELECT sum(1)
>>>   FROM dok JOIN rid USING (dokumnr)
>>>   JOIN toode USING (toode)
>>>   LEFT JOIN artliik using(grupp,liik)
>>>   WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'
> By the way, note that the presence of the toode table in the query above
> is not required at all, unless you use columns of toode in your
> aggregates.

In real query, SELECT column list contains data form sales table dok (sale
date and time)
and sales detail table rid (quantity, price)
WHERE clause may contain additional filters from product table (product
category, supplier).

> Let's play with that, after all, it's friday night.

Thank you very much for great sample.
I tried to create testcase from this to match production db:

1.2 million orders
3.5 million order details
13400 products with char(20) as primary keys containing ean-13 codes mostly
3 last year data
every order has usually 1..3 detail lines
same product can appear multiple times in order
products are queried by start of code

This sample does not distribute products randomly between orders.
How to change this so that every order contains 3 (or 1..6 ) random 
I tried to use random row sample from

but in this case constant product is returned always. It seems than query 
containing randon() is executed only once.


CREATE TEMP TABLE orders (order_id INTEGER NOT NULL, order_date DATE NOT 
CREATE TEMP TABLE products (product_id CHAR(20) NOT NULL, product_name 
char(70) NOT NULL, quantity numeric(12,2) default 1);
CREATE TEMP TABLE orders_products (order_id INTEGER NOT NULL, product_id 
CHAR(20),padding1 char(70),
  id serial, price numeric(12,2) default 1 );

INSERT INTO products SELECT (n*power( 10,13))::INT8::CHAR(20),
       'product number ' || n::TEXT FROM generate_series(0,13410) AS n;

SELECT n,'2005-01-01'::date + (4000.0 * n/3500000.0 * '1 DAY'::interval) 
FROM generate_series(0,3500000/3) AS n;

SET work_mem TO 2097151; -- 1048576;

INSERT INTO orders_products SELECT
   generate_series/3 as  order_id,
   ( ((generate_series/3500000.0)*13410.0)::int*power( 
FROM generate_series(1,3500000)
where generate_series/3>0;

ALTER TABLE orders ADD PRIMARY KEY (order_id);
ALTER TABLE products ADD PRIMARY KEY (product_id);
ALTER TABLE orders_products ADD PRIMARY KEY (id);

ALTER TABLE orders_products ADD FOREIGN KEY (product_id) REFERENCES 
ALTER TABLE orders_products ADD FOREIGN KEY (order_id) REFERENCES 
orders(order_id) ON DELETE CASCADE;

CREATE INDEX orders_date ON orders( order_date );
CREATE INDEX order_product_pattern_idx ON orders_products( product_id 
bpchar_pattern_ops );

SET work_mem TO DEFAULT;

  SELECT sum(quantity*price)
 FROM   orders
JOIN orders_products USING (order_id)
JOIN products USING (product_id)
WHERE orders.order_date>='2008-01-17'
and orders_products.product_id like '130%' 

In response to


pgsql-performance by date

Next:From: AndrusDate: 2008-11-22 18:04:30
Subject: Increasing pattern index query speed
Previous:From: Glyn AstillDate: 2008-11-22 16:59:02
Subject: Re: Perc 3 DC

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group