Re: Very long execution time of "select nextval('..');"

From: mljv(at)planwerk6(dot)de
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Very long execution time of "select nextval('..');"
Date: 2008-01-28 11:19:11
Message-ID: 200801281219.11836.mljv@planwerk6.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Greg,

first fo all: thanks a lot. i think i understood most of your comments, but -
of course - have now more questions than before :-)

Am Montag, 28. Januar 2008 01:07 schrieb Greg Smith:
> On Sun, 27 Jan 2008, mljv(at)planwerk6(dot)de wrote:
> > ok, at the moment i got some traffic and my load is at 1.5. But now with
> > logging the timestamp I have seen that the long durations are quite
> > regular at intervals of 10 minutes.
>
> Sure sounds like checkpoints. You should turn on the checkpoint warning
> feature so it always triggers and see if the long queries completely just
> after the checkpoints finish.

i read und tried to understand all about checkpoints.
i have set
checkpoint_warning=3600
and restartet Postgresql but i dont see any checkpoint warnings inside my logs
with
log_min_messages = info
log_min_error_statement = warning

so i think a checkpoint warning should pop up frequently as i have
checkpoint_segments = 10
checkpoint_timeout = 300
so at least each five minutes a warning should be shown, but it isn't.

> Notes on that and what you can do to
> possibly improve checkpoint behavior are at
> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm The
> early parts of that mostly refer to 8.2 but 8.1 is basically the same in
> this area.

so this what i have done to tune checkpoints. Please let me know if it sounds
reasonable. It list all parameters which in my opinion affect write
performance and which dont have their default value

fsync = on
wal_buffers = 16
bgwriter_delay = 200
bgwriter_lru_percent = 10.0
bgwriter_lru_maxpages = 100
bgwriter_all_percent = 5
bgwriter_all_maxpages = 200

checkpoint_timeout = 300
checkpoint_segments = 10
checkpoint_warning = 3600

shared_buffers = 30000
work_mem = 10240
maintenance_work_mem = 163840
max_fsm_pages = 500000
effective_cache_size = 180000
random_page_cost = 3

autovacuum = on
autovacuum_naptime = 3000

stats_start_collector = on
stats_command_string = off
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on

but after a restart i still got lots of queries above 500ms (my new value for
log_min_duration) and still some of them are so simple as select nextval.
(Logs are shortend for your convinience):

11:53:48 duration: 14473.594 ms EXECUTE <unnamed> [PREPARE: select nextval
11:53:48 duration: 12527.846 ms EXECUTE <unnamed> [PREPARE: select nextval
11:53:48 duration: 12488.325 ms EXECUTE <unnamed> [PREPARE: select nextval
11:53:48 duration: 10300.745 ms EXECUTE <unnamed> [PREPARE: select nextval
11:53:48 duration: 4638.686 ms EXECUTE <unnamed> [PREPARE: select nextval
11:53:48 duration: 4012.015 ms EXECUTE <unnamed> [PREPARE: select nextval
11:53:48 duration: 2087.131 ms EXECUTE <unnamed> [PREPARE: select nextval
11:53:48 duration: 11669.099 ms EXECUTE <unnamed> [PREPARE: select nextval
11:54:14 duration: 2007.563 ms EXECUTE <unnamed> [PREPARE: select nextval

> > but what in hell can make nextval take so long? even if checkpointing is
> > badly configured.
>
> You're in a situation where your amount of RAM far exceeds your disk I/O
> capabilities. Brutally bad checkpoints are easy to encounter in that
> setup. Linux by default will use 10% of RAM to hold writes. At
> checkpoint time, that entire Linux buffer cache has to be cleared of
> database writes on top of what's written by the checkpoint itself. How
> long do you think it takes to write >800MB of database data with a
> significant random-access component to it when your disk is a simple
> RAID-1? 20 seconds is not out of the question.

ok. But is it something you really recommend? I fear changing such values as i
do not understand their impact.

> You may want to significantly reduce the size of the Linux write buffer
> and see if that helps.
> http://www.westnet.com/~gsmith/content/linux-pdflush.htm goes over theory
> and suggestions here.
>
> > I always thought that nextval is one of the fastest operations.
> > So if it takes 500 ms, fine. things like this can always happen, but 20
> > seconds sounds more like a hardware failure. But i can't see any.
>
> Just about everything gets blocked behind the worse checkpoint spikes.
> The thing that kind of bothers me about your case is that I'd expect other
> queries would also be blocked and you'd have a whole set of >250ms ones
> lined up just after the checkpoint is done. That you're only reporting
> issues with nextval makes me wonder if there isn't some other locking
> driving the main behavior, perhaps something that just gets worse at
> checkpoint time rather than being directly caused by it.

i have many queries far above 250ms. I just choosed to report nextval
statements only because other statements are rather complicated and i would
just get an answer about "explain analyze" which is not my problem.

I appreciate your help very much. thanks a lot.

kind regards,
janning

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Paul Houselander 2008-01-28 11:37:03 Error after upgrade
Previous Message Harald Fuchs 2008-01-28 10:34:32 Re: A select DISTINCT query? - followup Q