Re: Need help building this query

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Rihad'" <rihad(at)stream(dot)az>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Need help building this query
Date: 2012-06-21 18:40:35
Message-ID: 00e401cd4fdd$589b1e60$09d15b20$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

> -----Original Message-----
> From: pgsql-sql-owner(at)postgresql(dot)org [mailto:pgsql-sql-
> owner(at)postgresql(dot)org] On Behalf Of Rihad
> Sent: Thursday, June 21, 2012 1:49 PM
> To: pgsql-sql(at)postgresql(dot)org
> Subject: [SQL] Need help building this query
>
> Hi, folks. I currently need to join two tables that lack primary keys, and
> columns used to distinguish each record can be duplicated. I need to build
> statistics over the data in those tables. Consider this:
>
>
> TableA:
> row 1: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date row
2:
> foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date row 3: foo:
123,
> bar: 456, baz: 789, amount: 10.99, date_of_op: date
>
> TableB:
> row 1: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date
>
>
> Columns foo + bar + baz are used to distinguish a performed "operation":
> TableA.date_of_op isn't, because it can lag behind TableB.
>
> Not all different "operations" are in table B.
> Table B is just there so we know which "operations" are complete, so to
> speak (happening under external means and not under any of my control).
>
> Now, for each operation (foo+bar+baz) in table A, only *one* row should be
> matched in table B, because it only has one matching row there.
> The other two in TableA should be considered unmatched.
>
> Now the query should be able to get count(*) and sum(amount) every day
> for that day, considering that matched and unmatched operations should be
> counted separately. The report would look something like this:
>
> TableA.date_of_op TableB.date_of_op
> 2012-06-21 [empty] [count(*) and sum(amount)
> of all data in TableA for this day unmatched in TableB]
> 2012-06-21 2012-06-20 [count(*) and sum(amount) of
> all data in TableA matched in TableB for the 20-th]
> 2012-06-21 2012-06-19 [count(*) and sum(amount) of
> all data in TableA matched in TableB for the 19-th]
>
>
> Can this awkward thing be done in pure SQL, or I'd be better off using
> programming for this?
>
> Thanks, I hope I could explain this.
>

You seem to be describing a straight reconciliation between two tables. My
current means of doing this are programmatically but for the simple case
pure SQL should be doable. The main thing is that you have to distinguish
between "duplicate" records first and then match them up:

TableA Keys:

AA
AA
AA
AB
AB
AC

TableB Keys:
AA
AA
AB

First you use "ROW_NUMBER() OVER (PARTITION BY key)" to assign an integer
"sub-id" to every set of possible keys in both tables:

TableA-Sub:
AA-1
AA-2
AA-3
AB-1
AB-2
AC-1

TableB-Sub:
AA-1
AA-2
AB-1

Now, with these newly constructed key+sub-key values in place, you can
perform a simple LEFT (or possibly FULL) JOIN between tables A & B.

This makes no allowances for any of kind of desired date restriction on the
matching nor does it consider the eventual report that you wish to generate.
What this gives you is a listing of ALL rows in both tables with matched
records joined together into a single (NULL-less) row while unmatched
records will have one of the two resultant columns NULLed

SELECT tableA.subid_a, tableB.subid_b
FROM tableA FULL OUTER JOIN tableB ON (tableA.subid_a = tableB.subid_b)

Requires at least version 8.4

David J.

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Oliver d'Azevedo Christina 2012-06-21 18:50:30 Re: Need help building this query
Previous Message Rihad 2012-06-21 17:48:56 Need help building this query