Documentation of EXCEPT ALL may have a bug

From: PG Doc comments form <noreply(at)postgresql(dot)org>
To: pgsql-docs(at)lists(dot)postgresql(dot)org
Cc: jao(at)geophile(dot)com
Subject: Documentation of EXCEPT ALL may have a bug
Date: 2018-02-09 17:48:31
Message-ID: 151819851166.9466.6740951677204542727@wrigleys.postgresql.org
Lists: pgsql-docs

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/10/static/queries-union.html
Description:

I believe that the documented behavior of EXCEPT ALL is not in agreement
with Postgres behavior (I'm using Postgres 9.5). The documentation says:

EXCEPT returns all rows that are in the result of query1 but not
in the result of query2. (This is sometimes called the difference
between two queries.) Again, duplicates are eliminated unless
EXCEPT ALL is used.

Here is a test script:

drop table if exists t;
drop table if exists u;

create table t(x int);
create table u(x int);

insert into t values (1), (2), (2), (3), (3);
insert into u values (1), (2);

select * from t except select * from u;
select * from t except all select * from u;

And here is the output:

DROP TABLE
DROP TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 5
INSERT 0 2

 x
---
 3
(1 row)

 x
---
 3
 3
 2
(3 rows)

The output from EXCEPT matches the documented behavior.

The output from EXCEPT ALL makes sense to me, but I think it is at
odds with the documentation: "EXCEPT returns all rows that are in the
result of query1 but not in the result of query2." This general
statement is then modified by the discussion of EXCEPT ALL. That first
sentence does not deal with duplicates in the input tables; it only
discusses set membership. Under that reading, each occurrence of (3) in
query1 is kept, because 3 does not appear in query2, and both
occurrences of (2) in query1 are eliminated by the one occurrence in
query2. EXCEPT then removes the duplicates (yielding [3]), so EXCEPT ALL
should yield [3, 3] rather than the observed [3, 3, 2]. I think this is
a fair interpretation based on the wording.
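
The literal reading described above can be sketched in Python as a plain
membership filter. This only models the documentation's wording, not what
Postgres does; the function name except_literal is hypothetical, and lists
of integers stand in for the single-column tables t and u.

```python
def except_literal(query1, query2, keep_duplicates=False):
    """Model the documented wording: keep rows of query1 that are not in query2.

    Hypothetical helper for illustration only; it is not how Postgres
    evaluates EXCEPT or EXCEPT ALL.
    """
    excluded = set(query2)
    # Pure set membership: one matching row in query2 removes every copy.
    kept = [row for row in query1 if row not in excluded]
    if keep_duplicates:
        # "EXCEPT ALL" under the literal reading: duplicates survive.
        return kept
    # Plain "EXCEPT": duplicates are eliminated (first-seen order is kept).
    return list(dict.fromkeys(kept))


t = [1, 2, 2, 3, 3]
u = [1, 2]
print(except_literal(t, u))                        # [3]
print(except_literal(t, u, keep_duplicates=True))  # [3, 3]
```

Under this model the row 2 never appears in the EXCEPT ALL result, which is
where the wording and the observed output part ways.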

To match the observed behavior, I think that the description of EXCEPT
ALL needs to be modified to something like this:

EXCEPT ALL returns those rows of query1 in excess of matching rows in
query2, as well as rows of query1 that have no match in query2.
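
The proposed wording amounts to a multiset difference: a row that occurs m
times in query1 and n times in query2 appears max(m - n, 0) times in the
result. A minimal Python sketch of that rule follows, using
collections.Counter as a stand-in for the tables. The function name
except_all is hypothetical, and the result is sorted only to make it easy
to compare; as in SQL, row order carries no meaning here.

```python
from collections import Counter


def except_all(query1, query2):
    """Model EXCEPT ALL as a multiset difference (illustrative sketch only)."""
    # Counter subtraction keeps only positive counts, so each row
    # ends up with max(m - n, 0) copies.
    remaining = Counter(query1) - Counter(query2)
    return sorted(remaining.elements())


t = [1, 2, 2, 3, 3]
u = [1, 2]
print(except_all(t, u))  # [2, 3, 3]
```

This reproduces the three rows returned by the test script: one 2 in excess
of the single match in u, plus both unmatched 3s.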
