Re: ExclusiveLock on extension of relation with huge shared_buffers

From: Borodin Vladimir <root(at)simply(dot)name>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ExclusiveLock on extension of relation with huge shared_buffers
Date: 2014-10-24 08:40:42
Message-ID: 7DC73570-1A80-470D-8B7B-46E2713C41D4@simply.name
Lists: pgsql-hackers

I have taken some backtraces (attached to this letter) of two processes with the following command:
pid=17981; while true; do date; gdb -batch -e back /usr/pgsql-9.4/bin/postgres $pid; echo; echo; echo; echo; sleep 0.1; done

Process 17981 was holding the lock for a long time - http://pastie.org/9671931.
And process 13886 was waiting for the lock (at a different time and blocked by a different process, actually, but I don't think that really matters) - http://pastie.org/9671939.

As far as I can see, 17981 is actually waiting for an LWLock on BufFreelistLock in the StrategyGetBuffer function (freelist.c:134) while holding the exclusive lock on relation extension. I will try increasing NUM_BUFFER_PARTITIONS (it also gave us some performance boost under read-only load) and will post the result in this thread.
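For reference, a rough paraphrase of the code path involved (not the exact 9.4 source, just a sketch of freelist.c around that line):

/* paraphrased sketch of StrategyGetBuffer() in 9.4 freelist.c */
volatile BufferDesc *
StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
{
    ...
    /* 17981 sits in this acquire while still holding the relation
     * extension lock it took in RelationGetBufferForTuple() */
    LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);

    /* then: try the freelist, otherwise run the clock sweep over all
     * of shared_buffers looking for a victim with usage_count == 0 */
    ...
}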

And sorry for the noise. This letter is mostly to let you know that you don't need to read my first long message too carefully.

On 23 Oct 2014, at 21:09, Borodin Vladimir <root(at)simply(dot)name> wrote:

> Hi all.
>
> Some time ago I wrote [0] on pgsql-performance@ about having problems with ExclusiveLock on relation extension while checkpointing on a system with large shared_buffers. At that time the problem was fixed by making shared_buffers much smaller.
>
> Right now I am testing 9.4beta3 with large shared_buffers (100 GB of 128 GB RAM) and huge pages under a write (insert-only) load, and I am having the same issue. The reason I do it is that such a configuration shows much better performance (roughly 2.5x better) on our read-only load profile than 4 GB shared_buffers with all other RAM left for the page cache.
>
> The symptoms are well described in the first letter, but I will repeat them briefly. Most of the CPU time is spent in system, and pg_locks shows that many backends are waiting because of ExclusiveLock on extension of some relation. In the log I can see lots of lines like these:
>
> < 2014-10-23 18:03:39.287 MSK >LOG: process 5426 acquired ExclusiveLock on extension of relation 111669 of database 110866 after 3421.279 ms
> < 2014-10-23 18:03:39.501 MSK >LOG: process 2904 still waiting for ExclusiveLock on extension of relation 111714 of database 110866 after 1000.053 ms
>
> I started to dig to find the root of the problem. The CPU spike does not happen at the start of a checkpoint or at the end; it happens one or several times while the checkpoint is in progress. Since it is not a production environment, I have written a SystemTap script that logs all checkpoint events and tracks the time spent doing something under the ExclusiveLock. The script is attached to the letter (checkpoint_markers.stp). Also attached are the output of this script (stap.out) and the output of dstat (dstat.log) showing system resource consumption.
>
> From stap.out you can see that there are some strange spikes in the time spent between LockRelationForExtension and UnlockRelationForExtension from the same pid and for the same relation. And from dstat.log you can see that at the same time there are spikes of CPU spent in system; "perf top" says that most of the time is spent in _spin_lock_irq during these periods.
>
> I have looked at src/backend/access/heap/hio.c:RelationGetBufferForTuple and further down to the smgr* functions to see what is happening under this lock, but I haven't found any potential problems except one: relation extension is done by a backend under the exclusive lock, and the relation is extended by only one page at a time. Aren't there any plans for yet another background process that extends relations after some threshold is reached?
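> To illustrate what happens under this lock, the extension path looks roughly like this (a paraphrased sketch of RelationGetBufferForTuple in hio.c, not the exact source):
>
> if (needLock)
>     LockRelationForExtension(relation, ExclusiveLock);
>
> /* P_NEW extends the relation by exactly one block; allocating the
>  * buffer for it goes through BufferAlloc() -> StrategyGetBuffer() */
> buffer = ReadBufferBI(relation, P_NEW, bistate);
>
> LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
>
> if (needLock)
>     UnlockRelationForExtension(relation, ExclusiveLock);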
>
> I’m pretty sure that I have missed something (for example, I still can’t explain why this is not happening on a system with small shared_buffers). What kind of debugging can I turn on, or where else should I look? Any ideas would be really appreciated. Below are some details about the system.
>
>
>
>
> I am running PostgreSQL 9.4beta3 installed from RPM packages (compiled with --enable-debug and --enable-dtrace) on RHEL 6.5. The host is running with the following CPU (32 cores) and memory:
>
> root(at)xdb01d ~ # fgrep -m1 'model name' /proc/cpuinfo
> model name : Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
> root(at)xdb01d ~ # free -m
>                    total       used       free     shared    buffers     cached
> Mem:              129027     128293        734          0         21      17345
> -/+ buffers/cache:           110927      18100
> Swap:              16378          0      16378
> root(at)xdb01d ~ #
>
> PGDATA lives on a RAID6 array of 8 SSD disks with ext4 (pg_xlog is on a separate RAID10 array mounted with nobarrier). Right now PGDATA takes a bit more than 100 GB.
>
> When a checkpoint starts, I/O load on the system becomes much higher than before the checkpoint. That is happening because bgwriter doesn’t really do the job (although the bgwriter settings are really aggressive):
>
> xdb01d/postgres # select * from pg_stat_bgwriter;
> -[ RECORD 1 ]---------+------------------------------
> checkpoints_timed | 0
> checkpoints_req | 19
> checkpoint_write_time | 22769904
> checkpoint_sync_time | 199006
> buffers_checkpoint | 85116980
> buffers_clean | 92815
> maxwritten_clean | 80
> buffers_backend | 3743898
> buffers_backend_fsync | 0
> buffers_alloc | 13366074
> stats_reset | 2014-10-23 10:59:17.123677+04
>
> Time: 10.451 ms
> xdb01d/postgres #
>
> < 2014-10-23 18:54:13.913 MSK >LOG: checkpoint starting: xlog
> < 2014-10-23 19:23:36.432 MSK >LOG: checkpoint complete: wrote 4586752 buffers (35.0%); 0 transaction log file(s) added, 0 removed, 1881 recycled; write=1761.595 s, sync=0.722 s, total=1762.518 s; sync files=172, longest=0.162 s, average=0.004 s
>
> < 2014-10-23 19:28:47.526 MSK >LOG: checkpoint starting: xlog
> < 2014-10-23 19:59:29.860 MSK >LOG: checkpoint complete: wrote 4735580 buffers (36.1%); 0 transaction log file(s) added, 0 removed, 2138 recycled; write=1841.799 s, sync=0.328 s, total=1842.334 s; sync files=179, longest=0.124 s, average=0.001 s
>
> The configuration for postgres is also in attached file.
>
>
> [0] http://www.postgresql.org/message-id/466D72D2-68EC-4FF4-93B8-43B687E7B705@simply.name
>
> <checkpoint_markers.stp>
> <stap.out>
> <dstat.log>
> <postgresql.conf>
>
> P.S. Since this letter is mostly a question about why a relation is extended by only one page, I’m sending it to the hackers list rather than performance. Of course it can be moved to the performance list if someone finds that more appropriate.
>
> --
> Vladimir
>
>
>
>

--
Vladimir
