Postgress 13.x: wrong result for delete with subquery

From: s(dot)p(dot)e(at)gmx-topmail(dot)de
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Postgress 13.x: wrong result for delete with subquery
Date: 2021-01-28 12:08:21
Message-ID: trinity-1c565d44-159f-488b-a518-caf13883134f-1611835701633@3c-app-gmx-bap78
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs



 


Hallo PSQL-Team,


 


I found a dataset (attachment bug13.csv)  which produces a wrong result on postgreSQL 13 for a special delete command:


 


Table t1 and t2 contain the same dataset, every row of the data (id1, id2) is unique


So the command


  DELETE FROM t1 WHERE (id1,id2) IN (SELECT t2.id1, t2.id2 FROM t2);


should remove all lines. So it does in postgres 10 and 12.5. In Version 13.0 and 13.1 there is on line left after the command.


 


 


In detail:


 


I have tested on PostgreSQL 10 (local instalaltion on SuSE-Linux) and three docker-Images (12.5, 13.0, 13.1).


The docker container are installed from scratch


 


docker run --name psqlXX    \

           --shm-size=880MB \

           -v $PWD/dXX:/var/lib/postgresql/data \

           -v $PWD/extra:/var/extra       \

           -e POSTGRES_PASSWORD=comirnaty \

           -d postgres:XX.X


 


docker exec -it psqlXX bash

psql -U postgres < /var/extra/bug13.sql > /var/extra/XXX.out


 


bug13.sql does the following:


 



-- wrong result on postgres 13.1 (docker)

-- also tested postgres 13.0 (docker): same bug

-- also tested postgres 12.5 (docker): no bug


 


\set VERBOSITY verbose

SELECT version();


-- prepare the test-data-set, all rows are unique

DROP TABLE IF EXISTS  tstdata;


CREATE TABLE tstdata (

 id1           VARCHAR(  3),

 id2           VARCHAR( 25)

);

CREATE UNIQUE INDEX tst_idx ON tstdata (id1, id2);


COPY tstdata (id1, id2) FROM '/var/extra/bug13.csv';


-- tables for the test

DROP TABLE IF EXISTS  t1, t2;


CREATE TEMPORARY TABLE t1 (

   id1 VARCHAR( 32),

   id2 VARCHAR( 32) 

);

CREATE UNIQUE INDEX t1_idx ON t1 (id1,id2);


CREATE TEMPORARY TABLE t2 (

   id1 VARCHAR( 32),

   id2 VARCHAR( 32) 

);


-- fill the tst tables with data

INSERT INTO t1 SELECT id1,id2 from tstdata;

INSERT INTO t2 SELECT id1,id2 from tstdata;


 


 


-- here ist the test for the bug with wrong result

-- t1 and t2 contain the identical set

-- so this command should remove all entries in t1


EXPLAIN DELETE FROM t1 WHERE (id1,id2) IN (SELECT id1,id2 FROM t2);


--                                           QUERY PLAN                                          

-- ----------------------------------------------------------------------------------------------

--  Delete on t1  (cost=595.12..1201.93 rows=3602 width=12)

--    ->  Hash Join  (cost=595.12..1201.93 rows=3602 width=12)

--          Hash Cond: (((t1.id1)::text = (t2.id1)::text) AND ((t1.id2)::text = (t2.id2)::text))

--          ->  Seq Scan on t1  (cost=0.00..487.06 rows=14406 width=170)

--          ->  Hash  (cost=573.50..573.50 rows=1441 width=170)

--                ->  HashAggregate  (cost=559.09..573.50 rows=1441 width=170)

--                      Group Key: (t2.id1)::text, (t2.id2)::text

--                      ->  Seq Scan on t2  (cost=0.00..487.06 rows=14406 width=170)

DELETE FROM t1 WHERE (id1,id2) IN (SELECT id1,id2 FROM t2);


-- on version 13.x there is still one entry left

--  MED | 0000000000000010035567472 

 


SELECT * from t1;


 


-- sometimes after multiple tries: the bug disappears, after recreating t2 the bug recurs

--   - creating an index on t2 -> correct result 

--   - define t2 non temporary -> correct result

--   - analyse t2              -> correct result

-- postgres 12.5 uses the same query plan without any error


 


 


 


--    second service ----------------------------------------------------------------------

--    different query plan after analyse and the bug disappears

-- refill t1

DELETE FROM t1;

INSERT INTO t1 SELECT id1,id2 from tstdata;

ANALYSE t2;

EXPLAIN DELETE FROM t1 WHERE (id1,id2) IN (SELECT id1,id2 FROM t2);


--                                           QUERY PLAN                                          

-- ----------------------------------------------------------------------------------------------

--  Delete on t1  (cost=1506.92..2718.77 rows=7192 width=12)

--    ->  Hash Semi Join  (cost=1506.92..2718.77 rows=7192 width=12)

--          Hash Cond: (((t1.id1)::text = (t2.id1)::text) AND ((t1.id2)::text = (t2.id2)::text))

--          ->  Seq Scan on t1  (cost=0.00..972.70 rows=28770 width=170)

--          ->  Hash  (cost=808.57..808.57 rows=46557 width=36)

--                ->  Seq Scan on t2  (cost=0.00..808.57 rows=46557 width=36)

DELETE FROM t1 WHERE (id1,id2) IN (SELECT id1,id2 FROM t2);

SELECT * from t1;

 



 


Data, Command and Output  are attached to this mail.


I reduced the bug-inducing datsaset from initial 8 Mio entries to this example file. The bug is very sensitive to the exact data and disappears on the most changes.


 


I hope, that a added everything necessary for further analysis.


Best regards


 


Stephan Endres


 

Attachment Content-Type Size
unknown_filename text/html 9.5 KB
125.out application/octet-stream 1.8 KB
130.out application/octet-stream 1.9 KB
131.out application/octet-stream 1.9 KB
bug13.csv text/csv 1.3 MB
bug13.sql application/sql 1.9 KB

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2021-01-28 12:19:52 BUG #16842: pg_dump uses seek calls on pipe files: suggesting adding a flag to disable seek calls
Previous Message Ajay Chitale 2021-01-28 09:36:32 Re: BUG #16841: psql -- \d tablename , displays "Error : column c.relhasoids does not exit"