BUG #14648: counts for queries using array unnesting is incorrect

From: sebastian(dot)calbaza(at)hgdata(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #14648: counts for queries using array unnesting is incorrect
Date: 2017-05-12 13:00:33
Message-ID: 20170512130033.1796.93117@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 14648
Logged by: Sebastian Calbaza
Email address: sebastian(dot)calbaza(at)hgdata(dot)com
PostgreSQL version: 9.6.2
Operating system: Ubuntu 14.04 AWS
Description:

Below is a query that we are using to calculate some counts:
* first version of the query lacks ```unnest(ids) as id,```, but second
one has it
* ```companies``` count value is incorrect for the second one, first
query has the correct value


```
mydb=# select count(company) as available,count(distinct
matchedCompany) as matchedCompanies,count(distinct company) as companies
from
(
SELECT F.urlx as company,
CB_RDPPKLCPQDVACOMSTEFANHGDATACOM_agg.url as matchedCompany
from
(select unnest(urls) as urlx from
hg_data_discovery_2017_04_10.GroupedFirmographics
where
(
(TRUE
AND TRUE
AND revenueRangeMin >= 1
AND employeesRangeMin >= 1
AND revenueRangeMax <= 1783792664
AND employeesRangeMax <= 4999000
)
OR
FALSE
)
) as I
inner join (
select unnest(urls) as urlx from
hg_data_discovery_2017_04_10.GroupedInstallsWithoutDateSignalScore where
productId IN (562) and signalScoreId IN (1,2,3)
) as F using(urlx)
left outer join CB_RDPPKLCPQDVACOMSTEFANHGDATACOM_agg on
CB_RDPPKLCPQDVACOMSTEFANHGDATACOM_agg.url=F.urlx where TRUE and TRUE
) as P
;
available | matchedcompanies | companies
-----------+------------------+-----------
496493 | 28503 | 495799
(1 row)

Time: 7974.053 ms
mydb=# select count(id) as people, count(company) as
available,count(distinct matchedCompany) as matchedCompanies,count(distinct
company) as companies from
(
SELECT unnest(ids) as id, F.urlx as company,
CB_RDPPKLCPQDVACOMSTEFANHGDATACOM_agg.url as matchedCompany
from
(select unnest(urls) as urlx from
hg_data_discovery_2017_04_10.GroupedFirmographics
where
(
(TRUE
AND TRUE
AND revenueRangeMin >= 1
AND employeesRangeMin >= 1
AND revenueRangeMax <= 1783792664
AND employeesRangeMax <= 4999000
)
OR
FALSE
)
) as I
inner join (
select unnest(urls) as urlx from
hg_data_discovery_2017_04_10.GroupedInstallsWithoutDateSignalScore where
productId IN (562) and signalScoreId IN (1,2,3)
) as F using(urlx)
left outer join CB_RDPPKLCPQDVACOMSTEFANHGDATACOM_agg on
CB_RDPPKLCPQDVACOMSTEFANHGDATACOM_agg.url=F.urlx where TRUE and TRUE
) as P
;
people | available | matchedcompanies | companies
--------+-----------+------------------+-----------
689905 | 689905 | 28503 | 28503

```

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message K S, Sandhya (Nokia - IN/Bangalore) 2017-05-12 13:42:25 Re: Crash observed during the start of the Postgres process
Previous Message Tom Lane 2017-05-12 03:59:56 Re: [HACKERS] Concurrent ALTER SEQUENCE RESTART Regression