Re: Review: Revise parallel pg_restore's scheduling heuristic

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Review: Revise parallel pg_restore's scheduling heuristic
Date: 2009-07-19 11:28:35
Message-ID: 603c8f070907190428x38509630i26528c1db8cc5e23@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jul 18, 2009 at 4:41 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
>> Performance tests to follow in a day or two.
>
> I'm looking to beg another week or so on this to run more tests.  What
> I can have by the end of today is pretty limited, mostly because I
> decided it made the most sense to test this with big complex
> databases, and it just takes a fair amount of time to throw around
> that much data.  (This patch didn't seem likely to make a significant
> difference on smaller databases.)

No worries. We have a limited number of people who can test these
kinds of things and who have volunteered to serve as reviewers (two
that I'm aware of, and one of those I haven't heard from lately...).
So we'll be patient. :-)

> My current plan is to test this on a web server class machine and a
> distributed application class machine.  Both database types have over
> 300 tables, with widely ranging row counts, widths, and
> index counts.
>
> It would be hard to schedule the requisite time on our biggest web
> machines, but I assume an 8 core 64GB machine would give meaningful
> results.  Any sense what numbers of parallel jobs I should use for
> tests?  I would be tempted to try 1 (with the -1 switch), 8, 12, and
> 16 -- maybe keep going if 16 beats 12.  My plan here would be to have
> the dump on one machine, and run pg_restore there, and push it to a
> database on another machine through the LAN on a 1Gb connection.
> (This seems most likely to be what we'd be doing in real life.)  I
> would run each test with the CVS trunk tip with and without the patch
> applied.  The database is currently 1.1TB.
>
> The application machine would have 2 cores and about 4GB RAM.  I'm
> tempted to use Milwaukee County's database there, as it has the most
> rows per table, even though some of the counties doing a lot of
> document scanning now have bigger databases in terms of disk space.
> It's 89GB. I'd probably try job counts starting at one and going up by
> one until performance starts to drop off.  (At one I would use the -1
> switch.)
>
> In all cases I was planning on using a "conversion" postgresql.conf
> file, turning off fsync, archiving, statistics, etc.
>
> Does this sound like a sane approach to testing whether this patch
> actually improves performance?  Any suggestions before I start this,
> to ensure most meaningful results?
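For concreteness, a "conversion" postgresql.conf along the lines you
describe might carry overrides like these (the specific values are
illustrative, not taken from your setup):

```
# Illustrative bulk-restore overrides; values are assumptions,
# not Kevin's actual configuration.
fsync = off
full_page_writes = off
archive_mode = off
autovacuum = off
track_counts = off              # "statistics"
maintenance_work_mem = 1GB      # faster index builds
checkpoint_segments = 64        # fewer checkpoints during the load
```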

This all sounds reasonable to me, although it might be worth testing
with default settings too.
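For what it's worth, the job-count matrix you describe could be driven
by a small script along these lines; the dump path, connection options,
and the decision to print rather than run the commands are placeholders,
not part of your plan:

```shell
#!/bin/sh
# Sketch of the proposed test matrix: a single-transaction baseline
# (pg_restore's -1 switch), then parallel runs at -j 8, 12, and 16.
DUMP=/backups/county.dump       # placeholder path
CONN="-h dbhost -d county"      # placeholder connection options

CMDS=""
for JOBS in 1 8 12 16; do
    if [ "$JOBS" -eq 1 ]; then
        CMD="pg_restore --single-transaction $CONN $DUMP"
    else
        CMD="pg_restore -j $JOBS $CONN $DUMP"
    fi
    CMDS="$CMDS$CMD
"
done
# Print the matrix rather than executing it; the real runs would be
# timed individually (and take hours against a 1.1TB dump).
printf '%s' "$CMDS"
```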

...Robert
