Quick Links

Re: Limitting full join to one match

From:	Sergei Agalakov <sergei(dot)agalakov(at)gmail(dot)com>
To:	John W Higgins <wishdev(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Limitting full join to one match
Date:	2018-12-06 06:01:27
Message-ID:	0a683148-5330-bd53-c3dc-7fb953e73874@gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general pgsql-hackers

On 12/5/2018 8:30 PM, John W Higgins wrote:
>
>
> On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott
> <spam_from_pgsql_lists(at)chezphil(dot)org
> <mailto:spam_from_pgsql_lists(at)chezphil(dot)org>> wrote:
>
> Dear Experts,
>
> I have a couple of tables that I want to reconcile, finding rows
> that match and places where rows are missing from one table or the
> other:
>
> ...
>
> So my question is: how can I modify my query to output only two rows,
> like this:?
>
> +------------+--------+------------+--------+
> | date | amount | date | amount |
> +------------+--------+------------+--------+
> | 2018-01-01 | 10.00 | 2018-01-01 | 10.00 |
> | 2018-02-01 | 5.00 | | |
> | | | 2018-03-01 | 8.00 |
> | 2018-04-01 | 5.00 | 2018-04-01 | 5.00 |
> | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 1
> | 2018-05-01 | 20.00 | 2018-05-01 | 20.00 | 2
> +------------+--------+------------+--------+
>
>
> Evening Phil,
>
> Window functions are your friend here. I prefer views for this stuff -
> but subqueries would work just fine.
>
> create view a_rows as (select *,
> row_number() OVER (PARTITION BY date, amount)
> AS pos from a);
> create view b_rows as (select *,
> row_number() OVER (PARTITION BY date, amount)
> AS pos from b);
>
> select
> a_rows.date,
> a_rows.amount,
> a_rows.pos,
> b_rows.date,
> b_rows.amount,
> b_rows.pos
> from
> a_rows full join b_rows using (date,amount,pos);
>
> Example here - http://sqlfiddle.com/#!17/305d6/3
>
> John
>
>
> Any suggestions anyone?
>
>
> The best I have found so far is something involving EXCEPT ALL:
>
> db=> select * from a except all select * from b;
> db=> select * from b except all select * from a;
>
> That's not ideal, though, as what I ultimately want is something
> that lists everything with its status:
>
> +------------+--------+--------+
> | date | amount | status |
> +------------+--------+--------+
> | 2018-01-01 | 10.00 | OK |
> | 2018-02-01 | 5.00 | a_only |
> | 2018-03-01 | 8.00 | b_only |
> | 2018-04-01 | 5.00 | OK |
> | 2018-05-01 | 20.00 | OK |
> | 2018-05-01 | 20.00 | OK |
> +------------+--------+--------+
>
> That would be easy enough to achieve from the JOIN.
>
>
> Thanks, Phil.
>

This question is always asked time to time.
I have found an old article with so far the best solution for big tables.
https://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:2151582681236#15393095283923

On the same test data
create table a (date date, amount money);
create table b (date date, amount money);

insert into a values ('2018-01-01', 10);
insert into a values ('2018-02-01', 5);
insert into a values ('2018-04-01', 5);
insert into a values ('2018-05-01', 20);
insert into a values ('2018-05-01', 20);
insert into b values ('2018-01-01', 10);
insert into b values ('2018-03-01', 8);
insert into b values ('2018-04-01', 5);
insert into b values ('2018-05-01', 20);
insert into b values ('2018-05-01', 20);

select tt.date,
       tt.amount,
       count(tt.src1) CNT1,
       count(tt.src2) CNT2
from
(
select a.date,
        a.amount,
        1 src1,
        null::integer src2
   from a
union all
select b.date,
        b.amount,
        null::integer src1,
        2 src2
from b
) tt
group by tt.date, tt.amount;

date amount    cnt1    cnt2
2018-01-01    $10.00     1 1
2018-02-01    $5.00 1       0
2018-03-01    $8.00      0   1
2018-04-01    $5.00      1       1
2018-05-01    $20.00   2       2

It requires a sort, so you may want to increase work_mem before
execution, and then return it back like
SET work_mem = '512MB';
... run your query
RESET work_mem;

Regards,

Sergei Agalakov

In response to

Re: Limitting full join to one match at 2018-12-06 03:30:26 from John W Higgins

Browse pgsql-general by date

	From	Date	Subject
Next Message	Ron	2018-12-06 06:48:09	Re: Limitting full join to one match
Previous Message	John W Higgins	2018-12-06 03:30:26	Re: Limitting full join to one match

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Takahashi, Ryohei	2018-12-06 06:04:28	RE: Too many logs are written on Windows (LOG: could not reserve shared memory region (addr=%p) for child %p:)
Previous Message	Michael Paquier	2018-12-06 05:43:35	Re: Use durable_unlink for .ready and .done files for WAL segment removal