Re: [POC] FETCH limited by bytes.

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org, mkellycs(at)gmail(dot)com, ashutosh(dot)bapat(at)enterprisedb(dot)com
Subject: Re: [POC] FETCH limited by bytes.
Date: 2015-02-04 23:06:02
Message-ID: CADkLM=eTpKYX5VOfjLr0VvfXhEZbC2UeakN=P6MXMg7S86Cdqw@mail.gmail.com
Lists: pgsql-hackers

I applied this patch to REL9_4_STABLE and was able to connect to a foreign
database (Redshift, actually).

The basic outline of the test is below; names have been changed to protect
my employment.

create extension if not exists postgres_fdw;

create server redshift_server foreign data wrapper postgres_fdw
options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
fetch_size '150' );

create user mapping for public server redshift_server options ( user
'redacted_user', password 'comeonyouarekiddingright' );

create foreign table redshift_tab150 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema' );

create foreign table redshift_tab151 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema', fetch_size '151' );
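
As a quick sanity check (standard catalogs, not part of the patch itself),
the stored options can be inspected before running the test queries;
fetch_size should appear in srvoptions for the server and in ftoptions only
for redshift_tab151:

-- server-level options, including the default fetch_size of 150
select srvname, srvoptions
from pg_foreign_server
where srvname = 'redshift_server';

-- table-level options; only redshift_tab151 carries the override
select ftrelid::regclass as table_name, ftoptions
from pg_foreign_table
where ftrelid in ('redshift_tab150'::regclass, 'redshift_tab151'::regclass);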

-- I don't expect the fdw to push the aggregate; this is just a test to see
-- what query shows up in stv_inflight.
select count(*) from redshift_tab150;

-- same caveat; this one should show the per-table fetch_size override.
select count(*) from redshift_tab151;

For those of you who aren't familiar with Redshift: it's a columnar
database that seems to be a fork of postgres 8.cough. You can connect to it
with modern libpq programs (psql, psycopg2, etc.).
Redshift has a table, stv_inflight, which serves about the same purpose as
pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the foreign data wrapper's default fetch
size didn't perform so well.
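
The check on the Redshift side can be done with something like the
following (a sketch; the column names are taken from the stv_inflight
documentation and may differ by Redshift version):

-- run on the Redshift cluster while the foreign scan is in flight;
-- the text column should show the FETCH statement issued by postgres_fdw
select pid, starttime, text
from stv_inflight
order by starttime desc;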

I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normally highly unhelpful, but in this case it confirms
that tables created in redshift_server do, by default, use the fetch_size
option given during server creation.

I was also able to confirm that the second query showed "FETCH 151 FROM c1"
as the query, which shows that per-table overrides also work.

I'd be happy to stop here, but Kyotaro might feel differently. With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.

But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.
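
With just this patch, that sizing can already be done by hand. A sketch,
assuming the foreign table has been ANALYZEd locally (so pg_stats carries
avg_width for its columns), that it lives in the public schema, and a
budget of roughly 2MB per fetch:

-- estimate how many rows fit in a ~2MB budget, from local statistics
select ceil(2 * 1024 * 1024 / sum(avg_width)::numeric) as suggested_fetch_size
from pg_stats
where schemaname = 'public'
  and tablename = 'redshift_tab150';

-- apply the derived value; ADD because this table had no per-table
-- fetch_size yet (the option itself exists only with the patch applied)
alter foreign table redshift_tab150
  options (add fetch_size '10000');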

Thanks to Kyotaro and Bruce Momjian for their help.

On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Hmm, somehow I removed some recipients, especially the
> list. Sorry for the duplicate.
>
> -----
> Sorry, I've been back. Thank you for the comment.
>
> > Do you have any insight into where I would pass the custom row fetches
> > from the table struct to the scan struct?
>
> Yeah, it's one simple way to tune it, if the user knows the
> appropriate value.
>
> > Last year I was working on a patch to postgres_fdw where the fetch_size
> > could be set at the table level and the server level.
> >
> > I was able to get the settings parsed and they would show up in
> > pg_foreign_table and pg_foreign_server. Unfortunately, I'm not very
> > familiar with how foreign data wrappers work, so I wasn't able to figure
> > out how to get these custom values passed from the PgFdwRelationInfo
> > struct into the query's PgFdwScanState struct.
>
> To answer directly: the state that needs to be shared among several
> stages is held within fdw_private. Your new variable
> fpinfo->fetch_size can be read in postgresGetForeignPlan, which
> creates another fdw_private; you can pass your value into the
> ForeignScan plan there and have it hold the value. Finally, you will
> get it back in postgresBeginForeignScan and can set it in
> PgFdwScanState.
>
> > I bring this up only because it might be a simpler solution, in that the
> > table designer could set the fetch size very high for narrow tables, and
> > lower or default for wider tables. It's also a very clean syntax, just
> > another option on the table and/or server creation.
> >
> > My incomplete patch is attached.
>
> However, the fetch_size is not needed by the planner (so far), so we
> can simply read the options in postgresBeginForeignScan() and set them
> in PgFdwScanState. This runs once per execution.
>
> Finally, the attached patch will work as you intended.
>
> What do you think about this?
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
> > On Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI <
> > horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >
> > > Thank you for the comment.
> > >
> > > The automatic way to determine the fetch_size looks like too
> > > much for the purpose. An example of a non-automatic way is a new
> > > foreign table option like 'fetch_size', but this exposes the
> > > internals too much... Which do you think is preferable?
> > >
> > > Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <24503(dot)1421943472(at)sss(dot)pgh(dot)pa(dot)us>
> > > > Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> > > > > Hello, as in the discussion on async fetching in postgres_fdw,
> > > > > FETCH with a data-size limitation would be useful for the memory
> > > > > usage stability of postgres_fdw.
> > > >
> > > > > Could such a feature and syntax be allowed to be added?
> > > >
> > > > This seems like a lot of work, and frankly an incredibly ugly API,
> > > > for a benefit that is entirely hypothetical. Have you got numbers
> > > > showing any actual performance win for postgres_fdw?
> > >
> > > The API is rushed work to make a path for the new parameter
> > > (but, yes, I did too much for the purpose of use from
> > > postgres_fdw..) and it could have any saner syntax, but it's not
> > > the time to settle that yet.
> > >
> > > The data-size limitation, whatever the limit, would give a
> > > significant gain, especially for small rows.
> > >
> > > This patch began from the fact that it runs about twice as fast
> > > with fetch size = 10000 as with 100.
> > >
> > >
> > >
> > > http://www.postgresql.org/message-id/20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp
> > >
> > > I took exec times to get 1M rows from localhost via postgres_fdw
> > > and it showed the following numbers.
> > >
> > > =# SELECT a from ft1;
> > > fetch_size  avg row size(*1)  time   alloced_mem/fetch(Mbytes)(*1)
> > > (local)                       0.75s
> > > 100         60                6.2s      6000 (0.006)
> > > 10000       60                2.7s    600000 (0.6  )
> > > 33333       60                2.2s   1999980 (2.0  )
> > > 66666       60                2.4s   3999960 (4.0  )
> > >
> > > =# SELECT a, b, c from ft1;
> > > fetch_size  avg row size(*1)  time   alloced_mem/fetch(Mbytes)(*1)
> > > (local)                       0.8s
> > > 100         204               12 s     20400 (0.02 )
> > > 1000        204               10 s    204000 (0.2  )
> > > 10000       204               5.8s   2040000 (2    )
> > > 20000       204               5.9s   4080000 (4    )
> > >
> > > =# SELECT a, b, d from ft1;
> > > fetch_size  avg row size(*1)  time   alloced_mem/fetch(Mbytes)(*1)
> > > (local)                       0.8s
> > > 100         1356              17 s    135600 (0.136)
> > > 1000        1356              15 s   1356000 (1.356)
> > > 1475        1356              13 s   2000100 (2.0  )
> > > 2950        1356              13 s   4000200 (4.0  )
> > >
> > > The definitions of the environment are the following.
> > >
> > > CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw
> > >   OPTIONS (host 'localhost', dbname 'postgres');
> > > CREATE USER MAPPING FOR PUBLIC SERVER sv1;
> > > CREATE TABLE lt1 (a int, b timestamp, c text, d text);
> > > CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text)
> > >   SERVER sv1 OPTIONS (table_name 'lt1');
> > > INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280)
> > >   FROM generate_series(0, 999999) a);
> > >
> > > The "avg row size" is alloced_mem/fetch_size and the alloced_mem
> > > is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
> > > tup->t_len) for all stored tuples in the receiver side,
> > > fetch_more_data() in postgres_fdw.
> > >
> > > That is about a 50% gain for the smaller tuple size and 25% for
> > > the larger. They look to be optimal where alloced_mem is around
> > > 2MB, for reasons unknown to me. Anyway, the difference seems to be
> > > significant.
> > >
> > > > Even if we wanted to do something like this, I strongly object to
> > > > measuring size by heap_compute_data_size. That's not a number that
> > > > users would normally have any direct knowledge of; nor does it have
> > > > anything at all to do with the claimed use-case, where what you'd
> > > > really need to measure is bytes transmitted down the wire. (The
> > > > difference is not small: for instance, toasted values would likely
> > > > still be toasted at the point where you're measuring.)
> > >
> > > Sure. Finally, the attached patch #1 does the following things.
> > >
> > > - The sender limits the number of tuples using the sum of the net
> > > lengths of the column values to be sent, not including protocol
> > > overhead. The length is calculated in the added function
> > > slot_compute_attr_size(), using the raw length for compressed
> > > values.
> > >
> > > - postgres_fdw calculates the fetch limit in bytes by the
> > > following formula,
> > >
> > > MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);
> > >
> > > The result of the patch is as follows. MAX_FETCH_MEM = 2MiB and
> > > MAX_FETCH_SIZE = 30000.
> > >
> > > fetch_size  avg row size(*1)  time  max alloced_mem/fetch(Mbytes)
> > > (auto)      60                2.4s  1080000 ( 1.08)
> > > (auto)      204               7.3s   536400 ( 0.54)
> > > (auto)      1356              15 s   430236 ( 0.43)
> > >
> > > This is meaningfully fast, but the patch looks too big and the
> > > meaning of the new parameter is hard to understand.. :(
> > >
> > >
> > > On the other hand, the cause of the differences in alloced_mem
> > > shown above is per-tuple overhead, the sum of which is unknown
> > > before execution. The second patch makes FETCH accept the tuple
> > > overhead bytes. The result seems pretty good, but I think this
> > > might be too specialized to this usage.
> > >
> > > MAX_FETCH_SIZE = 30000 and MAX_FETCH_MEM = 2MiB,
> > > max_fetch_size  avg row size(*1)  time  max alloced_mem/fetch(MiBytes)
> > > 30000           60                2.3s  1080000 ( 1.0)
> > > 9932            204               5.7s  1787760 ( 1.7)
> > > 1376            1356              13 s  1847484 ( 1.8)
> > >
> > > MAX_FETCH_SIZE = 25000 and MAX_FETCH_MEM = 1MiB,
> > > max_fetch_size  avg row size(*1)  time  max alloced_mem/fetch(MiBytes)
> > > 25000           60                2.4s   900000 ( 0.86)
> > > 4358            204               6.6s   816840 ( 0.78)
> > > 634             1356              16 s   844488 ( 0.81)
> > >
> > > MAX_FETCH_SIZE = 10000 and MAX_FETCH_MEM = 0.5MiB,
> > > max_fetch_size  avg row size(*1)  time  max alloced_mem/fetch(MiBytes)
> > > 10000           60                2.8s   360000 ( 0.35)
> > > 2376            204               7.8s   427680 ( 0.41)
> > > 332             1356              17 s   442224 ( 0.42)
> > >
> > > regards,
> > >
> > > --
> > > Kyotaro Horiguchi
> > > NTT Open Source Software Center
> > >
> > >
> > > --
> > > Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> > > To make changes to your subscription:
> > > http://www.postgresql.org/mailpref/pgsql-hackers
> > >
> > >
>
