Re: Async execution of postgres_fdw.

From: Matt Kelly <mkellycs(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: ashutosh(dot)bapat(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-21 05:22:34
Message-ID: CA+KcUkg4cvDLf4v0M9_rVv_ZuAsG1oDHPj_YvczJa6w2nSkwNQ@mail.gmail.com
Lists: pgsql-hackers

I'm trying to compare v5 and v6 on my laptop right now. Apparently my
laptop is quite a bit faster than your machine, because the tests complete
in roughly 3.3 seconds.

I added more data and didn't see anything other than noise. (Then again,
the queries were dominated by the disk sort, so I should retry with a
larger work_mem.) I'll try it again when I have more time to play with it.
I suspect the benefits would be more clear over a network.

Larger than the default work_mem, yes, but I think one of the prime use
cases for the fdw is warehouse-style situations (PostgresXL-style use
cases). In those cases, work_mem might reasonably be set to 1GB. Then
even with 10KB rows you could fetch 100,000 of them at a time and still
fit within work_mem. A simpler change would be to vary the fetch size
with respect to work_mem.

Half-baked idea: I know it's the wrong time in the execution phase, but if
you are using remote estimates for cost, there should also be a row width
estimate, which I believe comes from pg_statistic and its mean column
width.

It's actually a pity that there is no way to set fetch sizes based on
"give me as many tuples as will fit in less than x amount of memory",
because that is almost always exactly what you want. Even when writing
application code, I've never actually wanted precisely 10,000 rows; I've
always wanted "a reasonable size chunk that fits into memory" and then
backed my way into how many rows that was. If we were to extend FETCH to
support syntax like FETCH FORWARD '10MB' FROM ...; then we would
eliminate the need to estimate the value on the fly.

The async stuff, however, is a huge improvement over the last time I
played with the fdw. The two active postgres processes were easily
consuming a core and a half of CPU. I don't think it's worth tying these
two things together, though; they are probably better off as two separate
discussions and separate patches.

- Matt Kelly

*Just sanity checking myself: shutting down the server, applying the
other patch, running 'make clean install' in postgres_fdw, and then
restarting the server should be sufficient to make sure it's running the
new code, because that is all linked at runtime, right?
