BUG #15592: Memory overuse with subquery containing unnest() and set operations (11.x regression)

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: amdmi3(at)amdmi3(dot)ru
Subject: BUG #15592: Memory overuse with subquery containing unnest() and set operations (11.x regression)
Date: 2019-01-14 14:55:20
Message-ID: 15592-cc5729f9ba02e31c@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15592
Logged by: Dmitry Marakasov
Email address: amdmi3(at)amdmi3(dot)ru
PostgreSQL version: 11.1
Operating system: FreeBSD 12.0
Description:

Shamefully I wasn't aware of array && operator, so I've been using the
following construct to check for array intersection of two columns of text[]
type:

SELECT ... WHERE EXISTS(SELECT unnest(first) INTERSECT SELECT
unnest(second));

After upgrading from PostgreSQL 10.x to 11.x this query began to eat
inadequate amounts of memory (several gigabytes per hundred thousands rows).
I've narrowed it down to this simple case which demonstrates the problem:

CREATE TABLE test AS
SELECT generate_series(1,500000) AS id, '{a,b}'::text[] AS first,
'{a}'::text[] AS second;

SELECT DISTINCT EXISTS(SELECT unnest(first) INTERSECT SELECT unnest(second))
FROM test;

This query makes worker process eat up about 2GBs of memory. Memory usage
grows linearly during SELECT execution, afterwards the memory is freed.
Memory overuse doesn't happen on PostgreSQL 10, so this may be a regression
in 11. My original case was quite silly and fixable with using an array
operator, but I suspect that this problem may fire in legitimate cases too,
so I thought it's worth reporting it.

Additional info:

Execution plan (it's the same on 10 and 11):

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=36527.18..36527.29 rows=2 width=1) (actual
time=5005.132..5005.132 rows=1 loops=1)
Output: ((SubPlan 1))
Group Key: (SubPlan 1)
Buffers: shared hit=2048 read=4125 dirtied=4125 written=4093
-> Seq Scan on public.test (cost=0.00..35215.42 rows=524705 width=1)
(actual time=0.053..4875.718 rows=500000 loops=1)
Output: (SubPlan 1)
Buffers: shared hit=2048 read=4125 dirtied=4125 written=4093
SubPlan 1
-> HashSetOp Intersect (cost=0.00..4.54 rows=100 width=36)
(actual time=0.005..0.005 rows=1 loops=500000)
Output: "*SELECT* 1".unnest, (0)
-> Append (cost=0.00..4.04 rows=200 width=36) (actual
time=0.001..0.004 rows=3 loops=500000)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..1.52
rows=100 width=36) (actual time=0.001..0.002 rows=2 loops=500000)
Output: "*SELECT* 1".unnest, 0
-> ProjectSet (cost=0.00..0.52 rows=100
width=32) (actual time=0.001..0.001 rows=2 loops=500000)
Output: unnest(test.first)
-> Result (cost=0.00..0.01 rows=1
width=0) (actual time=0.000..0.000 rows=1 loops=500000)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..1.52
rows=100 width=36) (actual time=0.001..0.001 rows=1 loops=500000)
Output: "*SELECT* 2".unnest, 1
-> ProjectSet (cost=0.00..0.52 rows=100
width=32) (actual time=0.001..0.001 rows=1 loops=500000)
Output: unnest(test.second)
-> Result (cost=0.00..0.01 rows=1
width=0) (actual time=0.000..0.000 rows=1 loops=500000)
Planning Time: 0.185 ms
Execution Time: 5120.463 ms

I've also managed to get MemoryContextStats(TopMemoryContext) (from my
production query, no the simplified one above), here it is:
https://gist.github.com/AMDmi3/b55b2f3f6acf06cf6eab59fc043c7a0f and this
line looks suspicious:

> 344753 more child contexts containing 2824216576 total in 344753 blocks;
2735944032 free (5 chunks); 88272544 used

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2019-01-14 15:59:26 Re: psql and readline comments
Previous Message Magnus Hagander 2019-01-14 14:09:46 Re: BUG #15591: pg_receivewal does not honor replication slots