Re: Performance considerations for very heavy INSERT traffic

From: Brandon Black <blblack(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance considerations for very heavy INSERT traffic
Date: 2005-09-13 03:44:35
Message-ID: 84621a605091220442775249b@mail.gmail.com
Lists: pgsql-performance

On 12 Sep 2005 23:07:49 -0400, Greg Stark <gsstark(at)mit(dot)edu> wrote:

> The WAL parameters like commit_delay and commit_siblings are a bit of a
> mystery. Nobody has done any extensive testing of them. It would be quite
> helpful if you find anything conclusive and post it. It would also be
> surprising if they had a very large effect. They almost got chopped
> recently because they weren't believed to be useful.
>
> You might also ponder whether you need to be issuing a commit for every
> datum. If you only need to commit periodically you can get much better
> throughput. I suppose that's the same as commit_siblings. It would be
> interesting to know if you can get those parameters to perform as well as
> batching up records yourself.

Ideally I'd like to commit the data separately, as the data could contain
errors which abort the transaction, but it may come down to batching it and
coding things such that I can catch and discard the offending row and retry
the transaction if it fails (which should be fairly rare, I would hope). I
was hoping that the commit_delay/commit_siblings stuff would let me keep
simple per-row transaction failure isolation while getting some of the
benefits of batching things up, as you've said. I have seen performance
gains with it set at 100ms and 3-6 siblings with 8 backends running, but I
haven't been able to tune those values extensively; they were mostly random
guesses that happened to work. My cycles of performance testing take a
while, at least a day or two per change being tested, and even then the
differences can be hard to see due to variability in the testing load (it's
not really a static test load, but a window on reality). On top of that,
given how long each run takes, I've succumbed more than once to the
temptation of tweaking more than one thing per performance run, which
really muddies the results.
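To make the batching-with-fallback idea concrete, here is a rough sketch
(Python with psycopg2 and a made-up table are purely illustrative
assumptions; the real loader isn't necessarily built this way):

    # Sketch only: commit rows in batches; if a batch fails, roll back and
    # retry it row-by-row so a single bad datum can be discarded instead of
    # aborting the whole batch. Table and column names are hypothetical.
    import psycopg2

    def insert_batch(conn, rows, batch_size=500):
        cur = conn.cursor()
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            try:
                cur.executemany(
                    "INSERT INTO samples (id, ts, value) VALUES (%s, %s, %s)",
                    batch)
                conn.commit()
            except psycopg2.Error:
                conn.rollback()
                # Fall back to one transaction per row; skip the offender.
                for row in batch:
                    try:
                        cur.execute(
                            "INSERT INTO samples (id, ts, value)"
                            " VALUES (%s, %s, %s)",
                            row)
                        conn.commit()
                    except psycopg2.Error:
                        conn.rollback()

The batch size would be the knob that trades commit overhead against how
much work gets redone when a bad row turns up.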

> > Increasing shared_buffers seems to always help, even out to half of the
> > dev box's ram (2G).
>
> Half should be a pessimal setting. It means virtually everything is
> buffered twice. Once in the kernel and once in Postgres. Effectively
> halving your memory. If that's really helping try raising it even
> further, to something like 90% of your memory. But the conventional dogma
> is that shared buffers should never be more than about 10k no matter how
> much memory you have. Let the kernel do the bulk of the buffering.
>
> That said it's even more mysterious in your situation. Why would a large
> shared buffers help insert-only performance much at all? My guess is that
> it's letting the vacuums complete quicker. Perhaps try raising work_mem?

I find it odd as well. After reading the standard advice on shared_buffers,
I had only intended to raise it slightly. But seeing ever-increasing
performance gains, I just kept tuning it upwards, all the way to the 2G
limit, and saw noticeable gains every time. During at least some of the
test cycles there was no deleting or vacuuming going on, just insert
traffic. I guessed that shared_buffers' caching strategy must be
considerably more effective than the OS cache at something, but I don't
know what exactly. The only important read traffic that comes to mind is
the index, which is both constantly updated and constantly checked for
primary key uniqueness violations.
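
For reference, the knobs being discussed map onto postgresql.conf roughly
as follows (illustrative values only, in 8.0's raw units; they are not a
recommendation for any particular box):

    # Illustrative postgresql.conf settings -- just the knobs discussed above.
    shared_buffers  = 262144    # 8KB pages, i.e. roughly the 2G tried here
    commit_delay    = 100000    # microseconds, i.e. the ~100ms tried here
    commit_siblings = 5         # open concurrent transactions required
                                # before the delay kicks in
    work_mem        = 16384     # KB per sort/hash; Greg's suggestion to raise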

All of these replies here on the list (and a private one or two) have given
me a long list of things to try, and I feel confident that at least some of
them will gain me enough performance to comfortably deploy this application
on somewhat reasonable hardware. Thanks to everyone on the list for all the
suggestions; they have been very helpful in pointing me toward approaches I
hadn't thought of before.

When I finally get all of this sorted out and working reasonably well, I'll
be sure to come back and report which techniques and settings did and
didn't work for this workload.

-- Brandon
