Re: [POC] FETCH limited by bytes.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org, mkellycs(at)gmail(dot)com, ashutosh(dot)bapat(at)enterprisedb(dot)com
Subject: Re: [POC] FETCH limited by bytes.
Date: 2015-01-27 09:24:03
Message-ID: 20150127.182403.130437292.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for the comment.

The automatic way to determin the fetch_size looks become too
much for the purpose. An example of non-automatic way is a new
foreign table option like 'fetch_size' but this exposes the
inside too much... Which do you think is preferable?

Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <24503(dot)1421943472(at)sss(dot)pgh(dot)pa(dot)us>
> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> > Hello, as the discuttion on async fetching on postgres_fdw, FETCH
> > with data-size limitation would be useful to get memory usage
> > stability of postgres_fdw.
>
> > Is such a feature and syntax could be allowed to be added?
>
> This seems like a lot of work, and frankly an incredibly ugly API,
> for a benefit that is entirely hypothetical. Have you got numbers
> showing any actual performance win for postgres_fdw?

The API is a rush work to make the path for the new parameter
(but, yes, I did too much for the purpose that use from
postgres_fdw..) and it can be any saner syntax but it's not the
time to do so yet.

The data-size limitation, any size to limit, would give
significant gain especially for small sized rows.

This patch began from the fact that it runs about twice faster
when fetch size = 10000 than 100.

http://www.postgresql.org/message-id/20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp

I took exec times to get 1M rows from localhost via postgres_fdw
and it showed the following numbers.

=# SELECT a from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.75s
100 60 6.2s 6000 (0.006)
10000 60 2.7s 600000 (0.6 )
33333 60 2.2s 1999980 (2.0 )
66666 60 2.4s 3999960 (4.0 )

=# SELECT a, b, c from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 204 12 s 20400 (0.02 )
1000 204 10 s 204000 (0.2 )
10000 204 5.8s 2040000 (2 )
20000 204 5.9s 4080000 (4 )

=# SELECT a, b, d from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 1356 17 s 135600 (0.136)
1000 1356 15 s 1356000 (1.356)
1475 1356 13 s 2000100 (2.0 )
2950 1356 13 s 4000200 (4.0 )

The definitions of the environment are the following.

CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1 OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280) FROM generate_series(0, 999999) a);

The "avg row size" is alloced_mem/fetch_size and the alloced_mem
is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
tup->t_len) for all stored tuples in the receiver side,
fetch_more_data() in postgres_fdw.

They are about 50% gain for the smaller tuple size and 25% for
the larger. They looks to be optimal at where alloced_mem is
around 2MB by the reason unknown to me. Anyway the difference
seems to be significant.

> Even if we wanted to do something like this, I strongly object to
> measuring size by heap_compute_data_size. That's not a number that users
> would normally have any direct knowledge of; nor does it have anything
> at all to do with the claimed use-case, where what you'd really need to
> measure is bytes transmitted down the wire. (The difference is not small:
> for instance, toasted values would likely still be toasted at the point
> where you're measuring.)

Sure. Finally, the attached patch #1 which does the following
things.

- Sender limits the number of tuples using the sum of the net
length of the column values to be sent, not including protocol
overhead. It is calculated in the added function
slot_compute_attr_size(), using raw length for compressed
values.

- postgres_fdw calculates fetch limit bytes by the following
formula,

MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);

The result of the patch is as follows. MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 30000.

fetch_size, avg row size(*1), time, max alloced_mem/fetch(Mbytes)
(auto) 60 2.4s 1080000 ( 1.08)
(auto) 204 7.3s 536400 ( 0.54)
(auto) 1356 15 s 430236 ( 0.43)

This is meaningfully fast but the patch looks too big and the
meaning of the new parameter is hard to understand..:(

On the other hand the cause of the displacements of alloced_mem
shown above is per-tuple overhead, the sum of which is unknown
before execution. The second patch makes FETCH accept the tuple
overhead bytes. The result seems pretty good, but I think this
might be too spcialized to this usage.

MAX_FETCH_SIZE = 30000 and MAX_FETCH_MEM = 2MiB,
max_fetch_size, avg row size(*1), time, max alloced_mem/fetch(MiBytes)
30000 60 2.3s 1080000 ( 1.0)
9932 204 5.7s 1787760 ( 1.7)
1376 1356 13 s 1847484 ( 1.8)

MAX_FETCH_SIZE = 25000 and MAX_FETCH_MEM = 1MiB,
max_fetch_size, avg row size(*1), time, max alloced_mem/fetch(MiBytes)
25000 60 2.4s 900000 ( 0.86)
4358 204 6.6s 816840 ( 0.78)
634 1356 16 s 844488 ( 0.81)

MAX_FETCH_SIZE = 10000 and MAX_FETCH_MEM = 0.5MiB,
max_fetch_size, avg row size(*1), time, max alloced_mem/fetch(MiBytes)
10000 60 2.8s 360000 ( 0.35)
2376 204 7.8s 427680 ( 0.41)
332 1356 17 s 442224 ( 0.42)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Size-limitation-feature-of-FETCH-v1.patch text/x-patch 37.6 KB
0002-Make-FETCH-can-accept-per-tuple-memory-overhead.patch text/x-patch 30.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Giuseppe Broccolo 2015-01-27 09:25:46 Re: [RFC] Incremental backup v3: incremental PoC
Previous Message Kyotaro HORIGUCHI 2015-01-27 09:16:17 Re: alter user/role CURRENT_USER