Re: Review: Revise parallel pg_restore's scheduling heuristic

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Review: Revise parallel pg_restore's scheduling heuristic
Date: 2009-07-19 11:28:35
Message-ID: 603c8f070907190428x38509630i26528c1db8cc5e23@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jul 18, 2009 at 4:41 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
>> Performance tests to follow in a day or two.
>
> I'm looking to beg another week or so on this to run more tests.  What
> I can have by the end of today is pretty limited, mostly because I
> decided it made the most sense to test this with big complex
> databases, and it just takes a fair amount of time to throw around
> that much data.  (This patch didn't seem likely to make a significant
> difference on smaller databases.)

No worries. We have a limited number of people who can test these
kinds of things and who have volunteered to serve as reviewers (two
that I'm aware of, and one of those I haven't heard from lately...).
So we'll be patient. :-)

> My current plan is to test this on a web server class machine and a
> distributed application class machine.  Both database types have over
> 300 tables, with widely ranging row counts, widths, and
> index counts.
>
> It would be hard to schedule the requisite time on our biggest web
> machines, but I assume an 8 core 64GB machine would give meaningful
> results.  Any sense what numbers of parallel jobs I should use for
> tests?  I would be tempted to try 1 (with the -1 switch), 8, 12, and
> 16 -- maybe keep going if 16 beats 12.  My plan here would be to have
> the dump on one machine, and run pg_restore there, and push it to a
> database on another machine through the LAN on a 1Gb connection.
> (This seems most likely to be what we'd be doing in real life.)  I
> would run each test with the CVS trunk tip with and without the patch
> applied.  The database is currently 1.1TB.
>
> The application machine would have 2 cores and about 4GB RAM.  I'm
> tempted to use Milwaukee County's database there, as it has the most
> rows per table, even though some of the counties doing a lot of
> document scanning now have bigger databases in terms of disk space.
> It's 89GB. I'd probably try job counts starting at one and going up by
> one until performance starts to drop off.  (At one I would use the -1
> switch.)
>
> In all cases I was planning on using a "conversion" postgresql.conf
> file, turning off fsync, archiving, statistics, etc.
>
> Does this sound like a sane approach to testing whether this patch
> actually improves performance?  Any suggestions before I start this,
> to ensure most meaningful results?
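For concreteness, a "conversion" postgresql.conf along the lines you
describe might carry overrides like these (the specific values are
illustrative, not taken from your setup):

```
# Illustrative bulk-restore overrides; values are assumptions,
# not Kevin's actual configuration.
fsync = off
full_page_writes = off
archive_mode = off
autovacuum = off
track_counts = off              # "statistics"
maintenance_work_mem = 1GB      # faster index builds
checkpoint_segments = 64        # fewer checkpoints during the load
```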

This all sounds reasonable to me, although it might be worth testing
with default settings too.
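For what it's worth, the job-count matrix you describe could be driven
by a small script along these lines; the dump path, connection options,
and the decision to print rather than run the commands are placeholders,
not part of your plan:

```shell
#!/bin/sh
# Sketch of the proposed test matrix: a single-transaction baseline
# (pg_restore's -1 switch), then parallel runs at -j 8, 12, and 16.
DUMP=/backups/county.dump       # placeholder path
CONN="-h dbhost -d county"      # placeholder connection options

CMDS=""
for JOBS in 1 8 12 16; do
    if [ "$JOBS" -eq 1 ]; then
        CMD="pg_restore --single-transaction $CONN $DUMP"
    else
        CMD="pg_restore -j $JOBS $CONN $DUMP"
    fi
    CMDS="$CMDS$CMD
"
done
# Print the matrix rather than executing it; the real runs would be
# timed individually (and take hours against a 1.1TB dump).
printf '%s' "$CMDS"
```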

...Robert
