Re: Force a merge join?

From: Doug Fields <dfields-pg-general(at)pexicom(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Ian Harding" <ianh(at)tpchd(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Force a merge join?
Date: 2002-05-18 22:19:12
Message-ID: 5.1.0.14.2.20020518174933.01f46e08@mail.pexicom.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

At 04:19 PM 5/18/2002, Tom Lane wrote:
>Doug Fields <dfields-pg-general(at)pexicom(dot)com> writes:
> > In fact, yes it does. How do I know? Very simple: I did a SELECT * INTO
> ...
> > to copy my real table to a testing table so I could refactor it. Then I
> did
> > the usual EXPLAIN ANALYZE queries, and it was using merge joins. Then, I
> > did an "ANALYZE" (which is like VACUUM ANALYZE without the slow VACUUM)
> and
> > voila - nested loops and half second queries turning into five minute
> > nightmares. Then enable_nestloop would fix the problem again after that.
>
>Could we see the usual details here? Before and after EXPLAIN ANALYZE,
>and the schemas and pg_stats rows for the tables involved.

Hi Tom,

I gave some of this information in my first post, but I will repeat it here
with the pg_stats and the exact before and after information. Thanks.

delete from pg_statistic where
starelid = (select oid from pg_class where relname =
'test_list_entries');

EXPLAIN ANALYZE SELECT 745, a.list_entry_id
FROM test_list_entries a, test_list_entries b
WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR
a.list_id=147 OR a.list_id=144)
AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR
b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR
b.list_id=418)
AND a.lower_email = b.lower_email;

NOTICE: QUERY PLAN:

Merge Join (cost=1926.12..1940.21 rows=247 width=140) (actual
time=448.32..521.99 rows=3674 loops=1)
-> Sort (cost=739.66..739.66 rows=176 width=72) (actual
time=437.18..447.71 rows=15859 loops=1)
-> Index Scan using test_list_id_idx, test_list_id_idx,
test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
a (cost=0.00..733.09 rows=176 width=72) (actual time=0.04..139.32
rows=15859 loops=1)
-> Sort (cost=1186.45..1186.45 rows=280 width=68) (actual
time=11.12..13.67 rows=3783 loops=1)
-> Index Scan using test_list_id_idx, test_list_id_idx,
test_list_id_idx, test_list_id_idx, test_list_id_idx, test_list_id_idx,
test_list_id_idx, test_list_id_idx on test_list_entries
b (cost=0.00..1175.08 rows=280 width=68) (actual time=0.06..4.75 rows=573
loops=1)
Total runtime: 528.83 msec

ANALYZE;

EXPLAIN SELECT 745, a.list_entry_id
FROM test_list_entries a, test_list_entries b
WHERE (a.list_id=148 OR a.list_id=146 OR a.list_id=145 OR
a.list_id=147 OR a.list_id=144)
AND (b.list_id=247 OR b.list_id=433 OR b.list_id=249 OR
b.list_id=434 OR b.list_id=238 OR b.list_id=340 OR b.list_id=339 OR
b.list_id=418)
AND a.lower_email = b.lower_email;

NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..1176.19 rows=4 width=140)
-> Index Scan using test_list_id_idx, test_list_id_idx,
test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
a (cost=0.00..454.33 rows=84 width=72)
-> Index Scan using test_lower_email_idx on test_list_entries
b (cost=0.00..8.52 rows=1 width=68)

EXPLAIN ANALYZE same thing...

NOTICE: QUERY PLAN:

Nested Loop (cost=0.00..1176.19 rows=4 width=140) (actual
time=11.52..388320.86 rows=3674 loops=1)
-> Index Scan using test_list_id_idx, test_list_id_idx,
test_list_id_idx, test_list_id_idx, test_list_id_idx on test_list_entries
a (cost=0.00..454.33 rows=84 width=72) (actual time=0.06..190.30
rows=15859 loops=1)
-> Index Scan using test_lower_email_idx on test_list_entries
b (cost=0.00..8.52 rows=1 width=68) (actual time=14.86..24.47 rows=0
loops=15859)
Total runtime: 388330.10 msec

select attname, null_frac, avg_width, n_distinct, most_common_freqs,
correlation from pg_stats where tablename = 'test_list_entries';
-- Note, I removed some irrelevant rows

attname | null_frac | avg_width | n_distinct
| most_common_freqs
| correlation
---------------+-----------+-----------+------------+-----------------------------------------------------------------------------------------------------------------+-------------
list_id | 0 | 4 | 184 |
{0.383667,0.115667,0.0163333,0.0126667,0.0113333,0.01,0.00966667,0.00933333,0.009,0.009}
| 0.842899
list_entry_id | 0 | 4 | -1
|
| 0.719433
lower_email | 0.387667 | 68 | -0.38239 |
{0.0156667,0.000666667,0.000666667,0.000666667,0.000666667}
| 0.00150877
(27 rows)

\d test_list_entries
-- Note I removed some irrelevant columns
Table "test_list_entries"
Column | Type | Modifiers
---------------+--------------------------+-----------
list_id | integer |
list_entry_id | integer |
lower_email | character(64) |
Indexes: test_list_entries_pkey,
test_list_id_idx,
test_lower_email_idx

Cheers,

Doug

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tim Hart 2002-05-18 22:31:10 Re: Ordering of data on calls to user defined aggregate.
Previous Message Tom Lane 2002-05-18 22:17:26 Re: another psql question