Re: Performance With Joins on Large Tables

From: "Joshua Marsh" <icub3d(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Terje Elde" <terje(at)elde(dot)net>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, pgsql-performance(at)postgresql(dot)org, jim(at)nasby(dot)net
Subject: Re: Performance With Joins on Large Tables
Date: 2006-09-14 14:18:30
Message-ID: 38242de90609140718w2d10cfdfueb36bb53285c6cfa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>
> Wow, that correlation value is *way* away from order. If they were
> really in exact order by dsiacctno then I'd expect to see 1.0 in
> that column. Can you take another look at the tables and confirm
> the ordering? Does the correlation change if you do an ANALYZE on the
> tables? (Some small change is to be expected due to random sampling,
> but this is way off.)
>
> regards, tom lane

Thanks for pointing that out. Generally we load the tables via COPY and
then never touch the data. Because of the slowdown, I have been updating
tuples. I reloaded it from scratch, set enable_seqscan=off and
random_access_age=4 and I got the results I was looking for:

data=# analyze view_505;
ANALYZE
data=# analyze r3s169;
ANALYZE
data=# select tablename, attname, n_distinct, avg_width, correlation from
pg_stats where tablename in ('view_505', 'r3s169') and attname =
'dsiacctno';
tablename | attname | n_distinct | avg_width | correlation
-----------+-----------+------------+-----------+-------------
view_505 | dsiacctno | -1 | 13 | 1
r3s169 | dsiacctno | 42140 | 13 | 1
(2 rows)

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER JOIN
r3s169 s on v.dsiacctno = s.dsiacctno;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Merge Join (cost=0.00..20099712.79 rows=285153952 width=11)
Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
-> Index Scan using view_505_dsiacctno on view_505 v (cost=
0.00..5147252.74 rows=112282976 width=20)
-> Index Scan using r3s169_dsiacctno on r3s169 s
(cost=0.00..8256331.47rows=285153952 width=17)
(4 rows)

Thanks for you help everyone.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2006-09-14 14:21:50 Re: High CPU Load
Previous Message Scott Marlowe 2006-09-14 14:17:24 Re: High CPU Load