Re: [HACKERS] MERGE SQL Statement for PG11

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-03-29 06:37:20
Message-ID: CABOikdMvbDNFphPufsQmX5yBTFE4AC8Bf1-Ytkj_aM9N5qTqbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 27, 2018 at 5:00 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>
> In terms of further performance optimization, if there is just one
> WHEN AND condition and no unconditional WHEN clauses then we can add
> the WHEN AND easily to the join query.
>
> That seems like an easy thing to do for PG11
>
>
I think we need to be careful in terms of what can be pushed down to the
join, in presence of WHEN NOT MATCHED actions. If we push the WHEN AND qual
to the join then I am worried that some rows which should have been
reported "matched" and later filtered out as part of the WHEN quals, will
get reported as "not-matched", thus triggering WHEN NOT MATCHED action.

For example,

postgres=# select * from target ;
a | b
---+----
1 | 10
2 | 20
(2 rows)

postgres=# select * from source ;
a | b
---+-----
2 | 200
3 | 300
(2 rows)

postgres=# BEGIN;
BEGIN
postgres=# EXPLAIN ANALYZE MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b WHEN NOT MATCHED THEN
INSERT VALUES (s.a, -1);
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Merge on target t (cost=317.01..711.38 rows=25538 width=46) (actual
time=0.104..0.104 rows=0 loops=1)
* Tuples Inserted: 1*
Tuples Updated: 0
Tuples Deleted: 0
* Tuples Skipped: 1*
-> Merge Left Join (cost=317.01..711.38 rows=25538 width=46) (actual
time=0.071..0.074 rows=2 loops=1)
Merge Cond: (s.a = t_1.a)
-> Sort (cost=158.51..164.16 rows=2260 width=40) (actual
time=0.042..0.043 rows=2 loops=1)
Sort Key: s.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on source s (cost=0.00..32.60 rows=2260
width=40) (actual time=0.027..0.031 rows=2 loops=1)
-> Sort (cost=158.51..164.16 rows=2260 width=10) (actual
time=0.019..0.020 rows=2 loops=1)
Sort Key: t_1.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on target t_1 (cost=0.00..32.60 rows=2260
width=10) (actual time=0.012..0.014 rows=2 loops=1)
Planning Time: 0.207 ms
Execution Time: 0.199 ms
(17 rows)

postgres=# abort;
ROLLBACK

postgres=# BEGIN;
BEGIN
postgres=# EXPLAIN ANALYZE MERGE INTO target t USING source s ON t.a = s.a
AND t.a < 2 WHEN MATCHED THEN UPDATE SET b = s.b WHEN NOT MATCHED THEN
INSERT VALUES (s.a, -1);
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
Merge on target t (cost=232.74..364.14 rows=8509 width=46) (actual
time=0.128..0.128 rows=0 loops=1)
* Tuples Inserted: 2*
Tuples Updated: 0
Tuples Deleted: 0
Tuples Skipped: 0
-> Merge Right Join (cost=232.74..364.14 rows=8509 width=46) (actual
time=0.070..0.072 rows=2 loops=1)
Merge Cond: (t_1.a = s.a)
-> Sort (cost=74.23..76.11 rows=753 width=10) (actual
time=0.038..0.039 rows=1 loops=1)
Sort Key: t_1.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on target t_1 (cost=0.00..38.25 rows=753
width=10) (actual time=0.026..0.028 rows=1 loops=1)
Filter: (a < 2)
Rows Removed by Filter: 1
-> Sort (cost=158.51..164.16 rows=2260 width=40) (actual
time=0.024..0.025 rows=2 loops=1)
Sort Key: s.a
Sort Method: quicksort Memory: 25kB
-> Seq Scan on source s (cost=0.00..32.60 rows=2260
width=40) (actual time=0.014..0.017 rows=2 loops=1)
Planning Time: 0.218 ms
Execution Time: 0.234 ms
(19 rows)

postgres=# abort;
ROLLBACK

If you look at the first MERGE statement, we filter one matched source row
(2,200) using (t.a < 2) and do not take any action for that row. This
filtering happens after the RIGHT JOIN has reported it as "matched". But if
we push down the qual to the join, then the join will see that the source
row has no match and hence send that row for NOT MATCHED processing, thus
inserting it into the table again.

I am not saying there is no scope for improvement. But we need to be
careful about what can be pushed down to the join and what must be applied
after the join.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2018-03-29 06:56:38 Re: [HACKERS] MERGE SQL Statement for PG11
Previous Message Craig Ringer 2018-03-29 05:58:45 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS