Re: support for MERGE

From: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Daniel Westermann <dwe(at)dbi-services(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Subject: Re: support for MERGE
Date: 2021-11-15 21:01:25
Message-ID: 202111152101.3bg56wj5j4a2@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-Nov-14, Amit Langote wrote:

> The only problem caused by the code block that follows the buggy if
> statement unconditionally executing is wasted cycles. Fortunately,
> there's no correctness issue, because rootRelInfo is the same as the
> input result relation in the cases where the latter is not partitioned
> and there'd be no map to convert the tuple, so the block is basically
> a no-op. I was afraid about the case where the input relation is a
> regular inheritance parent, though apparently we don't support MERGE
> in that case to begin with.

Well, I noticed that if we simply remove the ERROR that prevents that
case, it works fine as far as I can tell. Example (partly cribbed from
the documentation):

CREATE TABLE measurement (
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int
);
CREATE TABLE measurement_y2006m02 (
CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )
) INHERITS (measurement);
CREATE TABLE measurement_y2006m03 (
CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )
) INHERITS (measurement);
CREATE TABLE measurement_y2007m01 (
filler text,
peaktemp int,
logdate date not null,
city_id int not null,
unitsales int
CHECK ( logdate >= DATE '2007-01-01' AND logdate < DATE '2007-02-01')
);
ALTER TABLE measurement_y2007m01 DROP COLUMN filler;
ALTER TABLE measurement_y2007m01 INHERIT measurement;

CREATE OR REPLACE FUNCTION measurement_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
IF ( NEW.logdate >= DATE '2006-02-01' AND
NEW.logdate < DATE '2006-03-01' ) THEN
INSERT INTO measurement_y2006m02 VALUES (NEW.*);
ELSIF ( NEW.logdate >= DATE '2006-03-01' AND
NEW.logdate < DATE '2006-04-01' ) THEN
INSERT INTO measurement_y2006m03 VALUES (NEW.*);
ELSIF ( NEW.logdate >= DATE '2007-01-01' AND
NEW.logdate < DATE '2007-02-01' ) THEN
INSERT INTO measurement_y2007m01 (city_id, logdate, peaktemp, unitsales)
VALUES (NEW.*);
ELSE
RAISE EXCEPTION 'Date out of range. Fix the measurement_insert_trigger() function!';
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql ;
CREATE TRIGGER insert_measurement_trigger
BEFORE INSERT ON measurement
FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger();
INSERT INTO measurement VALUES (1, '2006-02-10', 35, 10);
INSERT INTO measurement VALUES (1, '2006-02-16', 45, 20);
INSERT INTO measurement VALUES (1, '2006-03-17', 25, 10);
INSERT INTO measurement VALUES (1, '2006-03-27', 15, 40);
INSERT INTO measurement VALUES (1, '2007-01-15', 10, 10);
INSERT INTO measurement VALUES (1, '2007-01-17', 10, 10);

alvherre=# select tableoid::regclass, * from measurement order by city_id, logdate;
tableoid | city_id | logdate | peaktemp | unitsales
----------------------+---------+------------+----------+-----------
measurement_y2006m02 | 1 | 2006-02-10 | 35 | 10
measurement_y2006m02 | 1 | 2006-02-16 | 45 | 20
measurement_y2006m03 | 1 | 2006-03-17 | 25 | 10
measurement_y2006m03 | 1 | 2006-03-27 | 15 | 40
measurement_y2007m01 | 1 | 2007-01-15 | 10 | 10
measurement_y2007m01 | 1 | 2007-01-17 | 10 | 10

CREATE TABLE new_measurement (LIKE measurement);
INSERT INTO new_measurement VALUES (1, '2006-03-01', 20, 10);
INSERT INTO new_measurement VALUES (1, '2006-02-16', 50, 10);
INSERT INTO new_measurement VALUES (2, '2006-02-10', 20, 20);
INSERT INTO new_measurement VALUES (1, '2006-03-27', NULL, NULL);
INSERT INTO new_measurement VALUES (1, '2007-01-17', NULL, NULL);
INSERT INTO new_measurement VALUES (1, '2007-01-15', 5, NULL);
INSERT INTO new_measurement VALUES (1, '2007-01-16', 10, 10);

MERGE into measurement m
USING new_measurement nm ON
(m.city_id = nm.city_id and m.logdate=nm.logdate)
WHEN MATCHED AND nm.peaktemp IS NULL THEN DELETE
WHEN MATCHED THEN UPDATE
SET peaktemp = greatest(m.peaktemp, nm.peaktemp),
unitsales = m.unitsales + coalesce(nm.unitsales, 0)
WHEN NOT MATCHED THEN INSERT
(city_id, logdate, peaktemp, unitsales)
VALUES (city_id, logdate, peaktemp, unitsales);

alvherre=# select tableoid::regclass, * from measurement order by city_id, logdate;
tableoid | city_id | logdate | peaktemp | unitsales
----------------------+---------+------------+----------+-----------
measurement_y2006m02 | 1 | 2006-02-10 | 35 | 10
measurement_y2006m02 | 1 | 2006-02-16 | 50 | 30
measurement_y2006m03 | 1 | 2006-03-01 | 20 | 10
measurement_y2006m03 | 1 | 2006-03-17 | 25 | 10
measurement_y2007m01 | 1 | 2007-01-15 | 10 | 10
measurement_y2007m01 | 1 | 2007-01-16 | 10 | 10
measurement_y2006m02 | 2 | 2006-02-10 | 20 | 20

Even the DELETE case worked correctly (I mean, it deletes from the right
partition).

So I'm considering adding this case to the regression tests and remove
the limitation. If anybody wants to try some more sophisticated
examples, I'll welcome proposed additions to the regression tests.

(Don't get me wrong -- I don't think this is terribly useful
functionality. I just think that if the code is already prepared to
handle it, we may as well let it.)

One thing I didn't quite like is that the count of affected tuples seems
off, but IIRC that's already the case for trigger-redirected tuples in
legacy-style partitioning, so I'm not too worried about that.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-11-15 21:03:53 Re: Improving psql's \password command
Previous Message Stephen Frost 2021-11-15 20:58:57 Re: Commitfest 2021-11 Patch Triage - Part 2