Re: Hash join on int takes 8..114 seconds

From: tv(at)fuzzy(dot)cz
To: "Andrus" <kobruleht2(at)hot(dot)ee>
Cc: "Richard Huxton" <dev(at)archonet(dot)com>, "PFC" <lists(at)peufeu(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hash join on int takes 8..114 seconds
Date: 2008-11-21 16:15:10
Message-ID: 63239.89.102.139.23.1227284110.squirrel@sq.gransy.com
Lists: pgsql-performance

Just the most important points:

1) "dok" table contains 1235086 row versions in 171641 pages (with 8kB
pages this means 1.4GB MB of data), but there are 1834279 unused item
pointers (i.e. about 60% of the space is wasted)

2) "rid" table contains 3275189 roiws in 165282 (with 8kB pages this means
about 1.3GB of data), but there are 1878923 unused item pointers (i.e.
about 30% of the space is wasted)

3) don't forget to run ANALYZE after vacuuming (or use VACUUM ANALYZE)

4) I'm not sure why the sizes you report don't match this (for example 2.3GB
vs. the ~1.4GB computed above for the "dok" table) - the difference seems too
large to me.

Anyway, the amount of wasted space seems significant to me - I'd try to fix
that first, either with VACUUM FULL or with CLUSTER. CLUSTER locks the table
exclusively, but the result may be better (when the table is sorted by a well
chosen index). Don't forget to run ANALYZE afterwards.
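
For example, just a sketch - which index to cluster on depends on the queries
you run most often; dok_kuupaev_idx and rid_toode_idx are taken from your
vacuum output, so adjust as needed:

  -- rewrites each table in index order and drops the dead space
  -- (takes an exclusive lock for the whole run)
  CLUSTER dok_kuupaev_idx ON dok;
  CLUSTER rid_toode_idx ON rid;
  -- refresh planner statistics afterwards
  ANALYZE dok;
  ANALYZE rid;

If you go the VACUUM FULL route instead, consider a REINDEX of both tables
afterwards - VACUUM FULL tends to leave the indexes bloated.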

Several other things to consider:

1) Regarding the toode column - why are you using CHAR(20) when the values
are actually shorter? CHAR(20) is blank-padded to the full width, so this may
significantly increase the amount of space required.
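
A quick way to check how much of that width is actually used (just a sketch):

  SELECT max(length(trim(toode))) AS max_len,
         avg(length(trim(toode))) AS avg_len
  FROM toode;

If the codes are much shorter than 20 characters, a plain varchar(20) - or the
integer key from point 2 below - would shrink both the tables and the indexes
on toode.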

2) I've noticed the CPU is a Celeron, which may negatively affect the speed
of the hash computation on those CHAR(20) keys. I'd try to replace the key
with something that hashes faster - say an INTEGER used as an artificial
primary key of the "toode" table and as a FK in the other tables. This might
improve the "Bitmap Heap Scan on rid" part, but yes - it's just a minor
improvement compared to the "Hash Join" part of the query.
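
Roughly something like the following - the column, constraint and index names
are made up, and on tables this size the UPDATE on "rid" rewrites millions of
rows, so treat it as a sketch to test on a copy first:

  -- add an integer surrogate key to toode
  CREATE SEQUENCE toode_id_seq;
  ALTER TABLE toode ADD COLUMN toode_id integer;
  UPDATE toode SET toode_id = nextval('toode_id_seq');
  ALTER TABLE toode ALTER COLUMN toode_id SET NOT NULL;
  ALTER TABLE toode ADD CONSTRAINT toode_toode_id_key UNIQUE (toode_id);

  -- add the matching column to rid and fill it from the existing codes
  ALTER TABLE rid ADD COLUMN toode_id integer;
  UPDATE rid SET toode_id = t.toode_id FROM toode t WHERE rid.toode = t.toode;
  CREATE INDEX rid_toode_id_idx ON rid (toode_id);
  ALTER TABLE rid ADD CONSTRAINT rid_toode_id_fkey
    FOREIGN KEY (toode_id) REFERENCES toode (toode_id);

The query would then join rid to toode on toode_id and filter on the integer
(looked up once from the 'X05' code) instead of comparing CHAR(20) values.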

Materialized views seem like a good idea to me, but maybe I'm not seeing
something. What do you mean by "reports are different"? If there are a lot
of rows for a given product / day, then creating an aggregated table with
(product code, day) as its primary key is quite simple. It may require a
lot of disk space, but it'll remove the hash join overhead. But if the
queries are very different, then it may be difficult to build such
materialized view(s).
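
For instance, a sketch only - the table and column names are invented, and on
this Postgres version the summary table has to be refreshed by the application
or a cron job, there is no built-in materialized view support:

  CREATE TABLE sales_summary (
      toode    CHAR(20) NOT NULL,
      kuupaev  DATE     NOT NULL,
      row_cnt  bigint   NOT NULL,
      PRIMARY KEY (toode, kuupaev)
  );

  INSERT INTO sales_summary (toode, kuupaev, row_cnt)
  SELECT r.toode, d.kuupaev, count(*)
  FROM dok d JOIN rid r USING (dokumnr)
  GROUP BY r.toode, d.kuupaev;

Assuming the artliik left join does not multiply rows, the query from this
thread then collapses to

  SELECT sum(row_cnt)
  FROM sales_summary
  WHERE toode = 'X05' AND kuupaev >= '2008-09-01';

which touches at most one summary row per product and day.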

regards
Tomas

> PFC,
>
> thank you.
>
>> OK so vmstat says you are IO-bound, this seems logical if the same plan
>> has widely varying timings...
>>
>> Let's look at the usual suspects :
>>
>> - how many dead rows in your tables ? are your tables data, or bloat ?
>> (check vacuum verbose, etc)
>
> set search_path to firma2,public;
> vacuum verbose dok; vacuum verbose rid
>
> INFO: vacuuming "firma2.dok"
> INFO: index "dok_pkey" now contains 1235086 row versions in 9454 pages
> DETAIL: 100 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.16s/0.38u sec elapsed 0.77 sec.
> INFO: index "dok_dokumnr_idx" now contains 1235086 row versions in 9454
> pages
> DETAIL: 100 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.14s/0.40u sec elapsed 0.78 sec.
> INFO: index "dok_klient_idx" now contains 1235086 row versions in 18147
> pages
> DETAIL: 887 index row versions were removed.
> 3265 index pages have been deleted, 3033 are currently reusable.
> CPU 0.36s/0.46u sec elapsed 31.87 sec.
> INFO: index "dok_krdokumnr_idx" now contains 1235086 row versions in
> 11387
> pages
> DETAIL: 119436 index row versions were removed.
> 1716 index pages have been deleted, 1582 are currently reusable.
> CPU 0.47s/0.55u sec elapsed 63.38 sec.
> INFO: index "dok_kuupaev_idx" now contains 1235101 row versions in 10766
> pages
> DETAIL: 119436 index row versions were removed.
> 659 index pages have been deleted, 625 are currently reusable.
> CPU 0.62s/0.53u sec elapsed 40.20 sec.
> INFO: index "dok_tasudok_idx" now contains 1235104 row versions in 31348
> pages
> DETAIL: 119436 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 1.18s/1.08u sec elapsed 118.97 sec.
> INFO: index "dok_tasudok_unique_idx" now contains 99 row versions in 97
> pages
> DETAIL: 98 index row versions were removed.
> 80 index pages have been deleted, 80 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.48 sec.
> INFO: index "dok_tasumata_idx" now contains 1235116 row versions in 11663
> pages
> DETAIL: 119436 index row versions were removed.
> 5340 index pages have been deleted, 5131 are currently reusable.
> CPU 0.43s/0.56u sec elapsed 53.96 sec.
> INFO: index "dok_tellimus_idx" now contains 1235122 row versions in 11442
> pages
> DETAIL: 119436 index row versions were removed.
> 1704 index pages have been deleted, 1569 are currently reusable.
> CPU 0.45s/0.59u sec elapsed 76.91 sec.
> INFO: index "dok_yksus_pattern_idx" now contains 1235143 row versions in
> 5549 pages
> DETAIL: 119436 index row versions were removed.
> 529 index pages have been deleted, 129 are currently reusable.
> CPU 0.19s/0.46u sec elapsed 2.72 sec.
> INFO: index "dok_doktyyp" now contains 1235143 row versions in 3899 pages
> DETAIL: 119436 index row versions were removed.
> 188 index pages have been deleted, 13 are currently reusable.
> CPU 0.14s/0.44u sec elapsed 1.40 sec.
> INFO: index "dok_sihtyksus_pattern_idx" now contains 1235143 row versions
> in 5353 pages
> DETAIL: 119436 index row versions were removed.
> 286 index pages have been deleted, 5 are currently reusable.
> CPU 0.13s/0.45u sec elapsed 3.01 sec.
> INFO: "dok": removed 119436 row versions in 13707 pages
> DETAIL: CPU 0.80s/0.37u sec elapsed 14.15 sec.
> INFO: "dok": found 119436 removable, 1235085 nonremovable row versions in
> 171641 pages
> DETAIL: 2 dead row versions cannot be removed yet.
> There were 1834279 unused item pointers.
> 0 pages are entirely empty.
> CPU 6.56s/6.88u sec elapsed 450.54 sec.
> INFO: vacuuming "pg_toast.pg_toast_40595"
> INFO: index "pg_toast_40595_index" now contains 0 row versions in 1 pages
> DETAIL: 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: "pg_toast_40595": found 0 removable, 0 nonremovable row versions in
> 0
> pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: vacuuming "firma2.rid"
> INFO: index "rid_pkey" now contains 3275197 row versions in 13959 pages
> DETAIL: 38331 index row versions were removed.
> 262 index pages have been deleted, 262 are currently reusable.
> CPU 0.42s/1.05u sec elapsed 58.56 sec.
> INFO: index "rid_dokumnr_idx" now contains 3275200 row versions in 14125
> pages
> DETAIL: 38331 index row versions were removed.
> 572 index pages have been deleted, 571 are currently reusable.
> CPU 0.49s/1.14u sec elapsed 71.57 sec.
> INFO: index "rid_inpdokumnr_idx" now contains 3275200 row versions in
> 15103
> pages
> DETAIL: 38331 index row versions were removed.
> 579 index pages have been deleted, 579 are currently reusable.
> CPU 0.66s/1.03u sec elapsed 68.38 sec.
> INFO: index "rid_toode_idx" now contains 3275224 row versions in 31094
> pages
> DETAIL: 38331 index row versions were removed.
> 2290 index pages have been deleted, 2290 are currently reusable.
> CPU 1.39s/1.58u sec elapsed 333.82 sec.
> INFO: index "rid_rtellimus_idx" now contains 3275230 row versions in 7390
> pages
> DETAIL: 18591 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.18s/0.66u sec elapsed 1.78 sec.
> INFO: index "rid_toode_pattern_idx" now contains 3275230 row versions in
> 16310 pages
> DETAIL: 17800 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.44s/1.04u sec elapsed 6.55 sec.
> INFO: "rid": removed 38331 row versions in 3090 pages
> DETAIL: CPU 0.20s/0.10u sec elapsed 5.49 sec.
> INFO: "rid": found 38331 removable, 3275189 nonremovable row versions in
> 165282 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 1878923 unused item pointers.
> 0 pages are entirely empty.
> CPU 5.06s/7.27u sec elapsed 607.59 sec.
>
> Query returned successfully with no result in 1058319 ms.
>
>> - what's the size of the dataset relative to the RAM ?
>
> Db size is 7417 MB
> relevant table sizes, in descending order by size:
>
> 1 40595 dok 2345 MB
> 2 1214 pg_shdepend 2259 MB
> 3 40926 rid 2057 MB
> 6 1232 pg_shdepend_depender_index 795 MB
> 7 1233 pg_shdepend_reference_index 438 MB
> 8 44286 dok_tasudok_idx 245 MB
> 9 44299 rid_toode_idx 243 MB
> 10 44283 dok_klient_idx 142 MB
> 11 19103791 rid_toode_pattern_idx 127 MB
> 14 44298 rid_inpdokumnr_idx 118 MB
> 15 44297 rid_dokumnr_idx 110 MB
> 16 43573 rid_pkey 109 MB
> 18 44288 dok_tasumata_idx 91 MB
> 19 44289 dok_tellimus_idx 89 MB
> 20 44284 dok_krdokumnr_idx 89 MB
> 21 44285 dok_kuupaev_idx 84 MB
> 23 43479 dok_pkey 74 MB
> 24 44282 dok_dokumnr_idx 74 MB
> 25 19076304 rid_rtellimus_idx 58 MB
> 26 18663923 dok_yksus_pattern_idx 43 MB
> 27 18801591 dok_sihtyksus_pattern_idx 42 MB
> 29 18774881 dok_doktyyp 30 MB
> 46 40967 toode 13 MB
>
> server is HP Proliant DL320 G3
> http://h18000.www1.hp.com/products/quickspecs/12169_ca/12169_ca.HTML
> CPU is a 2.93 GHz Celeron with 256 KB cache.
>
> Server has 2 GB RAM.
> It has an integrated SATA RAID 0/1 controller (1.5 Gbps) and mirrored
> SAMSUNG HD160JJ disks.
>
>> Now let's look more closely at the query :
>>
>> explain analyze
>> SELECT sum(1)
>> FROM dok JOIN rid USING (dokumnr)
>> JOIN toode USING (toode)
>> LEFT JOIN artliik using(grupp,liik)
>> WHERE rid.toode='X05' AND dok.kuupaev>='2008-09-01'
>>
>>
>> I presume doing the query without artliik changes nothing to the
>> runtime,
>> yes ?
>
> Yes. After removing artliik from the join I got response times of 65 and 50
> seconds, so this does not make a difference.
>
>> Your problem here is that, no matter what, postgres will have to examine
>> - all rows where dok.kuupaev>='2008-09-01',
>> - and all rows where rid.toode = 'X05'.
>> If you use dok.kuupaev>='2007-09-01' (note : 2007) it will probably have
>> to scan many, many more rows.
>
> Probably yes, since it then reads one more year of sales data.
>
>> If you perform this query often you could CLUSTER rid on (toode) and dok
>> on (kuupaev), but this can screw other queries.
>
> Some reports are by sales date (dok.kuupaev) and customers.
> CLUSTER rid on (toode) slows them down. Also autovacuum cannot do
> clustering.
>
>> What is the meaning of the columns ?
>
> This is typical sales data:
>
> -- Receipt headers:
> DOK ( dokumnr SERIAL PRIMARY KEY,
> kuupaev DATE --- sales date
> )
>
> -- Receipt details
> RID ( dokumnr INT,
> toode CHAR(20), -- item code
> CONSTRAINT rid_dokumnr_fkey FOREIGN KEY (dokumnr) REFERENCES dok
> (dokumnr),
> CONSTRAINT rid_toode_fkey FOREIGN KEY (toode)
> REFERENCES firma2.toode (toode)
> )
>
> -- Products
> TOODE (
> toode CHAR(20) PRIMARY KEY
> )
>
>> To make this type of query faster I would tend to think about :
>
>> - denormalization (ie adding a column in one of your tables and a
>> multicolumn index)
>
> For this query it is possible to duplicate the kuupaev column into the rid
> table. However, most of the time seems to go to scanning the rid table, so
> I suspect that this will not help much.
>
>> - materialized views
>> - materialized summary tables (ie. summary of sales for last month, for
>> instance)
>
> There are about 1000 items and reports are different.
>
> Andrus.
>
>
