Re: IndexJoin memory problem using spgist and boxes

From: Anton Dignös <dignoes(at)inf(dot)unibz(dot)it>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: IndexJoin memory problem using spgist and boxes
Date: 2018-03-18 19:56:31
Message-ID: CALNdv1gfKVidXz3EQBY5EcEAnNTxELM1LM6q+2rd3szKxwXhPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

attached is the patch that uses two memory contexts.
One for calling the inner consistent function,
and a new one for keeping the traversal memory of the inner consistent function.

I run some test to compare the memory footprints. I report the total maximum
memory usage (sum of all children) of the per-query memory context
that is the parent of all memory
contexts used. The test datasets are described below.

TEST | HEAD | previous patch | patch V2
------+-------+----------------+----------
T1 | 3.4GB | 98kB | 81kB
T2 | 3.4GB | 17MB | 17MB
T3 | 3.5GB | 17MB | 17MB
T4 | 7GB | 17MB | 17MB
T5 | 8GB | 34MB | 25MB

T1: 1M x 1M tuples with relatively few overlaps
T2: as T1, but with 1 tuple in the outer relation for which the entire
index reports matches
T3: as T1, but with 10 tuples in the outer relation for which the
entire index reports matches
T4: as T3, but the outer relation has double the number of tuples
T5: as T4, but the inner relation has double the number of tuples

TEST dataset creation queries (executed incrementally)

T1:
-- create table r with 1M tuples
CREATE TABLE r AS SELECT 1 i, box(point(generate_series,
generate_series), point(generate_series+10, generate_series+10)) FROM
generate_series(1, 1000000);
-- create table s with 1M tuples
CREATE TABLE s AS SELECT 1 i, box(point(generate_series,
generate_series), point(generate_series+10, generate_series+10)) FROM
generate_series(1, 1000000) ORDER BY random(); -- random sort just
speeds up index creation
CREATE INDEX s_idx ON s USING spgist(box);

T2:
-- inserts a tuple for which the entire index is a match
INSERT INTO r VALUES (2, box(point(1, 1), point(1000000, 1000000)));

T3:
-- inserts 9 more tuples as in T2.

T4:
-- doubles the outer relations
INSERT INTO r SELECT * FROM r;

T5:
-- doubles the inner relation
INSERT INTO s SELECT * FROM s;

The new patch is a bit better in terms of memory by using the two
memory contexts.
I also checked the query times using explain analyze, both patches
have approximately the same runtime,
but run 5-6 times faster than HEAD.

Best regards,
Anton

Attachment Content-Type Size
spgist-idxscan-mem-fix-V2.patch application/octet-stream 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2018-03-18 20:42:14 jsonb nesting level edge case
Previous Message David Fetter 2018-03-18 19:36:53 Re: Implementing SQL ASSERTION