RE: BUG #16031: Group by returns duplicate groups

From: David Raymond <David(dot)Raymond(at)tomtom(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: BUG #16031: Group by returns duplicate groups
Date: 2019-10-01 16:25:32
Message-ID: VI1PR07MB6029E7C9B73250E386088047879D0@VI1PR07MB6029.eurprd07.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

"But this is all kinds of messed up"
Sounds about right :)

Output below.

As asked earlier it's Windows 10, all the lc_* settings are "en-US", and the server encoding is UTF8.
Throughout the table there are records with pretty much every notation alphabet. Latin, Cyrillic, Greek, Arabic, Hebrew, Japanese, Mandarin, etc. etc. Which "shouldn't" matter, but I figured I'd mention it.

testing=> set enable_indexscan = off;
SET
Time: 0.536 ms
testing=> set enable_bitmapscan = off;
SET
Time: 0.225 ms
testing=> with sd as (select name, row_number() over (order by name) rnum
testing(> from big_table)
testing-> select rnum, name, encode(convert_to(name, 'SQL_ASCII'),'escape'),
testing-> name < 'DK' as lt, name = 'DK' as eq, name > 'DK' as gt
testing-> from sd
testing-> where rnum >= (select min(rnum) from sd where name='DK')
testing-> and rnum <= (select max(rnum) from sd where name='DK');
rnum | name | encode | lt | eq | gt
------------+-----------------------------------+-----------------------------------+----+----+----
27,900,023 | DK | DK | f | t | f
27,900,024 | DK | DK | f | t | f
27,900,025 | DK | DK | f | t | f
27,900,026 | DK | DK | f | t | f
27,900,027 | DK | DK | f | t | f
27,900,028 | DK | DK | f | t | f
27,900,029 | DK | DK | f | t | f
27,900,030 | DK | DK | f | t | f
27,900,031 | DK | DK | f | t | f
27,900,032 | Dk'bus Marine | Dk'bus Marine | f | f | t
27,900,033 | Dk's Auto's | Dk's Auto's | f | f | t
27,900,034 | Dk's Bar & Grill | Dk's Bar & Grill | f | f | t
27,900,035 | Dk's Barbers & Stylist | Dk's Barbers & Stylist | f | f | t
27,900,036 | Dk's Beach Boutique | Dk's Beach Boutique | f | f | t
27,900,037 | Dk's Cabinets & Countertops | Dk's Cabinets & Countertops | f | f | t
27,900,038 | Dk's Cleaning Service | Dk's Cleaning Service | f | f | t
27,900,039 | Dk's Clothing | Dk's Clothing | f | f | t
27,900,040 | Dk's Communications | Dk's Communications | f | f | t
27,900,041 | Dk's Dancewear & Fitnesswear | Dk's Dancewear & Fitnesswear | f | f | t
27,900,042 | Dk's Dancewear Boutique | Dk's Dancewear Boutique | f | f | t
27,900,043 | Dk's Discount Dance & Fitnesswear | Dk's Discount Dance & Fitnesswear | f | f | t
27,900,044 | DK's Drywall Service | DK's Drywall Service | f | f | t
27,900,045 | DK'S DUSTBUSTERS | DK'S DUSTBUSTERS | f | f | t
27,900,046 | Dk's Family Five Star Trophies | Dk's Family Five Star Trophies | f | f | t
27,900,047 | DK's Family Five Star Trophies | DK's Family Five Star Trophies | f | f | t
27,900,048 | Dk's Food Mart | Dk's Food Mart | f | f | t
27,900,049 | Dk'S Group Pte. Ltd. | Dk'S Group Pte. Ltd. | f | f | t
27,900,050 | Dk's Hair Designs | Dk's Hair Designs | f | f | t
27,900,051 | Dk's Hair Happenings | Dk's Hair Happenings | f | f | t
27,900,052 | Dk's Hair Supply | Dk's Hair Supply | f | f | t
27,900,053 | Dk's Home Decor | Dk's Home Decor | f | f | t
27,900,054 | DK's Informática | DK's Inform\303\241tica | f | f | t
27,900,055 | Dk's Janitorial | Dk's Janitorial | f | f | t
27,900,056 | DK's Liquors | DK's Liquors | f | f | t
27,900,057 | Dk's Market | Dk's Market | f | f | t
27,900,058 | Dk's Moda Masculina | Dk's Moda Masculina | f | f | t
27,900,059 | Dk's Nails And Spa | Dk's Nails And Spa | f | f | t
27,900,060 | DK's Pawn Shop | DK's Pawn Shop | f | f | t
27,900,061 | Dk's Pet Grooming | Dk's Pet Grooming | f | f | t
27,900,062 | DK's Quality Service | DK's Quality Service | f | f | t
27,900,063 | DK's Restoration | DK's Restoration | f | f | t
27,900,064 | Dk's Sports Center | Dk's Sports Center | f | f | t
27,900,065 | Dk's Statuary | Dk's Statuary | f | f | t
27,900,066 | Dk's Style Hut | Dk's Style Hut | f | f | t
27,900,067 | Dk's Temiskaming Shore Taxi | Dk's Temiskaming Shore Taxi | f | f | t
27,900,068 | Dk's Towing | Dk's Towing | f | f | t
27,900,069 | DK's Travel | DK's Travel | f | f | t
27,900,070 | Dk'Style | Dk'Style | f | f | t
27,900,071 | DK'Z Car Wash | DK'Z Car Wash | f | f | t
27,900,072 | Dk- | Dk- | t | f | f
27,900,073 | DK | DK | f | t | f
27,900,074 | DK | DK | f | t | f
27,900,075 | DK | DK | f | t | f
27,900,076 | DK | DK | f | t | f
27,900,077 | DK | DK | f | t | f
27,900,078 | DK | DK | f | t | f
27,900,079 | DK | DK | f | t | f
27,900,080 | DK | DK | f | t | f
27,900,081 | DK | DK | f | t | f
27,900,082 | DK | DK | f | t | f
27,900,083 | DK | DK | f | t | f
27,900,084 | DK | DK | f | t | f
27,900,085 | DK | DK | f | t | f
27,900,086 | DK | DK | f | t | f
27,900,087 | DK | DK | f | t | f
27,900,088 | DK | DK | f | t | f
(66 rows)

Time: 821796.036 ms (13:41.796)

testing=> explain (analyze, verbose, costs, buffers, timing, summary)
testing-> with sd as (select name, row_number() over (order by name) rnum
testing(> from big_table)
testing-> select rnum, name, encode(convert_to(name, 'SQL_ASCII'),'escape'),
testing-> name < 'DK' as lt, name = 'DK' as eq, name > 'DK' as gt
testing-> from sd
testing-> where rnum >= (select min(rnum) from sd where name='DK')
testing-> and rnum <= (select max(rnum) from sd where name='DK');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on sd (cost=30560557.18..33281469.69 rows=542825 width=559) (actual time=886056.646..898585.621 rows=66 loops=1)
Output: sd.rnum, sd.name, encode(convert_to((sd.name)::text, 'SQL_ASCII'::name), 'escape'::text), ((sd.name)::text < 'DK'::text), ((sd.name)::text = 'DK'::text), ((sd.name)::text > 'DK'::text)
Filter: ((sd.rnum >= $1) AND (sd.rnum <= $2))
Rows Removed by Filter: 108565020
Buffers: shared hit=482 read=1516916, temp read=3200588 written=2619846
CTE sd
-> WindowAgg (cost=23772525.03..25672414.07 rows=108565088 width=30) (actual time=719781.561..832442.189 rows=108565086 loops=1)
Output: big_table.name, row_number() OVER (?)
Buffers: shared hit=482 read=1516916, temp read=2033663 written=2036384
-> Sort (cost=23772525.03..24043937.75 rows=108565088 width=22) (actual time=719781.549..793477.044 rows=108565086 loops=1)
Output: big_table.name
Sort Key: big_table.name
Sort Method: external merge Disk: 3453888kB
Buffers: shared hit=482 read=1516916, temp read=2033663 written=2036384
-> Seq Scan on name_stuff.big_table (cost=0.00..2603048.88 rows=108565088 width=22) (actual time=0.010..105238.261 rows=108565086 loops=1)
Output: big_table.name
Buffers: shared hit=482 read=1516916
InitPlan 2 (returns $1)
-> Aggregate (cost=2444071.54..2444071.55 rows=1 width=8) (actual time=148300.335..148300.336 rows=1 loops=1)
Output: min(sd_1.rnum)
Buffers: temp read=431489 written=583461
-> CTE Scan on sd sd_1 (cost=0.00..2442714.48 rows=542825 width=8) (actual time=34105.882..148300.327 rows=25 loops=1)
Output: sd_1.name, sd_1.rnum
Filter: ((sd_1.name)::text = 'DK'::text)
Rows Removed by Filter: 108565061
Buffers: temp read=431489 written=583461
InitPlan 3 (returns $2)
-> Aggregate (cost=2444071.54..2444071.55 rows=1 width=8) (actual time=14707.032..14707.032 rows=1 loops=1)
Output: max(sd_2.rnum)
Buffers: temp read=583462
-> CTE Scan on sd sd_2 (cost=0.00..2442714.48 rows=542825 width=8) (actual time=3729.712..14707.025 rows=25 loops=1)
Output: sd_2.name, sd_2.rnum
Filter: ((sd_2.name)::text = 'DK'::text)
Rows Removed by Filter: 108565061
Buffers: temp read=583462
Planning Time: 0.099 ms
Execution Time: 899881.036 ms
(37 rows)

Time: 899900.240 ms (14:59.900)
testing=>

-----Original Message-----
From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Sent: Tuesday, October 1, 2019 11:21 AM
To: David Raymond <David(dot)Raymond(at)tomtom(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>; pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16031: Group by returns duplicate groups

But this is all kinds of messed up:

...

Let's see some more data from that. Do this query:

with sd as (select name, row_number() over (order by name) rnum
from big_table)
select rnum, name, encode(convert_to(name, 'SQL_ASCII'),'escape'),
name < 'DK' as lt, name = 'DK' as eq, name > 'DK' as gt
from sd
where rnum >= (select min(rnum) from sd where name='DK')
and rnum <= (select max(rnum) from sd where name='DK');

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Raymond 2019-10-01 18:34:38 RE: BUG #16031: Group by returns duplicate groups
Previous Message PG Bug reporting form 2019-10-01 16:06:26 BUG #16032: pg_basebackup when running on Windows doesn't clean up on failure correctly