plan variations: join vs. exists vs. row comparison

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: plan variations: join vs. exists vs. row comparison
Date: 2011-03-07 19:07:35
Message-ID: AANLkTikAH2aWpo78cW13AJDvxwgu=NsEokiTfjWQdjp3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Originally, I posted to -general but I found some time to write some
samples, and realized it's probably more of a performance question.

The original post is here:
http://archives.postgresql.org/pgsql-general/2011-03/msg00198.php

I was hoping that somebody could help me understand the differences
between three plans.
All of the plans are updating a table using a second table, and should
be logically equivalent.
Two of the plans use joins, and one uses an exists subquery.
One of the plans uses row constructors and IS NOT DISTINCT FROM. It is
this plan which has really awful performance.
Clearly it is due to the nested loop, but why would the planner choose
that approach?

I also don't understand why in the 'exists' plan the planner thinks
the index scan will provide 1019978 rows, when there are only 1000000,
but that is a lesser issue.

Here is a sample SQL file which demonstrates the issues and includes
all three variations.

begin;
create temporary table t7 (
i BIGINT NOT NULL,
k BIGINT
);

create temporary table t8 (
i BIGINT NOT NULL,
j INT
);

CREATE FUNCTION populate_t8()
RETURNS VOID
LANGUAGE SQL
AS
$$
truncate t8;
insert into t8
SELECT i, 1 from t7
ORDER BY i LIMIT 10000;

insert into t8
SELECT i, 2 from t7
WHERE i > 10000
ORDER BY i LIMIT 10000;

SELECT i, 3 from t7
WHERE i > 20000
ORDER BY i LIMIT 20000;

analyze t8;
$$;

INSERT INTO t7
select x, x + 10 from generate_series(1,1000000) as x ;
analyze t7;

select populate_t8();

explain analyze verbose
update
t7
SET
k = 1
FROM
t8
WHERE
t7.i = t8.i
AND
(
t8.j = 2
OR
t8.j = 1
);

select populate_t8();

explain analyze verbose
update
t7
SET
k = 1
WHERE
EXISTS (
SELECT 1 FROM t8
WHERE t8.i = t7.i
AND
(
t8.j = 2
OR
t8.j = 1
)
);

select populate_t8();

explain
update
t7
SET
k = 1
FROM
t8
WHERE
ROW(t7.i) IS NOT DISTINCT FROM ROW(t8.i)
AND
(
t8.j = 2
OR
t8.j = 1
);

explain analyze verbose
update
t7
SET
k = 1
FROM
t8
WHERE
ROW(t7.i) IS NOT DISTINCT FROM ROW(t8.i)
AND
(
t8.j = 2
OR
t8.j = 1
);

rollback;

--
Jon

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2011-03-07 19:25:50 Re: Performance trouble finding records through related records
Previous Message Cédric Villemain 2011-03-07 15:48:07 Re: Anyone tried Flashcache with PostgreSQL?