Re: SQL/MED - file_fdw

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Noah Misch" <noah(at)leadboat(dot)com>
Cc: "Itagaki Takahiro" <itagaki(dot)takahiro(at)gmail(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, <hanada(at)metrosystems(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SQL/MED - file_fdw
Date: 2011-02-13 18:41:11
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sat, Feb 12, 2011 at 03:42:17PM -0600, Kevin Grittner wrote:

>> Do you see any reason that COPY FROM should be significantly
>> *faster* with the patch?
> No. Up to, say, 0.5% wouldn't be too surprising, but 8.4% is
> surprising.
> What is the uncertainty of that figure?

With a few more samples, it's not that high. It's hard to dodge
around the maintenance tasks on this machine to get good numbers, so
I can't really just set something up to run overnight to get numbers
in which I can have complete confidence, but (without putting
statistical probabilities around it) I feel very safe in saying
there isn't a performance *degradation* with the patch. I got four
restores of the 90GB data with the patch and four without. I
made sure it was during windows without any maintenance running, did
a fresh initdb for each run, and made sure that the disk areas were
the same for each run. The times for each version were pretty
tightly clustered except for each having one (slow) outlier.

If you ignore the outlier for each, there is *no overlap* between
the two sets -- the slowest of the non-outlier patched times is
faster than the fastest non-patched time.

With the patch, compared to without -- best time is 9.8% faster,
average time without the outliers is 6.9% faster, average time
including outliers is 5.3% faster, outlier is 0.8% faster.

Even with just four samples each, since I was careful to minimize
distorting factors, that seems like plenty to have confidence that
there is no performance *degradation* from the patch. If we want to
claim some particular performance *gain* from it, I would need to
arrange a dedicated machine and script maybe 100 runs each way to be
willing to offer a number for public consumption.

Without the patch:

real 17m24.171s
real 16m52.892s
real 16m40.624s
real 16m41.700s

With the patch:

real 15m56.249s
real 15m47.001s
real 15m3.018s
real 17m16.157s
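
For anyone who wants to recheck the percentages against the raw timings, here is a minimal Python sketch. It assumes the first block of times above is the unpatched runs and the second block the patched runs (which is what the prose implies), and it treats the slowest run in each set as that set's outlier. The helper names (to_seconds, pct_faster, without_outlier) are made up for illustration only.

```python
import re
from statistics import mean


def to_seconds(real_time):
    """Convert a `time`-style string such as '15m3.018s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", real_time)
    if match is None:
        raise ValueError(f"unrecognized time format: {real_time!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Assumption: first block of timings = unpatched, second block = patched.
unpatched = [to_seconds(t) for t in
             ("17m24.171s", "16m52.892s", "16m40.624s", "16m41.700s")]
patched = [to_seconds(t) for t in
           ("15m56.249s", "15m47.001s", "15m3.018s", "17m16.157s")]


def pct_faster(new, old):
    """Percentage by which `new` is faster (smaller) than `old`."""
    return 100.0 * (old - new) / old


def without_outlier(times):
    """Drop the single slowest run, treated here as the outlier."""
    return sorted(times)[:-1]


results = {
    "best time": pct_faster(min(patched), min(unpatched)),
    "average without outliers": pct_faster(
        mean(without_outlier(patched)), mean(without_outlier(unpatched))),
    "average including outliers": pct_faster(mean(patched), mean(unpatched)),
    "outlier": pct_faster(max(patched), max(unpatched)),
}

# "No overlap": the slowest non-outlier patched run still beats the
# fastest unpatched run.
no_overlap = max(without_outlier(patched)) < min(unpatched)

for label, value in results.items():
    print(f"{label}: {value:.1f}% faster with the patch")
print(f"no overlap once outliers are ignored: {no_overlap}")
```

With only four runs per side this is just arithmetic on the samples, not a statistical estimate of the gain.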

Since you said that a cursory test, or no test at all, should be
good enough given the low risk of performance regression, I didn't
book a machine and script a large test run, but if anyone feels
that's justified, I can arrange something.

