Re: Performance issue after upgrading from 9.4 to 9.6

From: Naytro Naytro <naytro(at)googlemail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance issue after upgrading from 9.4 to 9.6
Date: 2017-03-09 22:38:46
Message-ID: CAHgVxQEZfP+UK8O1wqq+tn-yDsbKqBAhRMG8EhNCS+z_fXLLHw@mail.gmail.com
Lists: pgsql-hackers

2017-03-09 20:19 GMT+01:00 Andres Freund <andres(at)anarazel(dot)de>:

> Hi,
>
> On 2017-03-09 13:47:35 +0100, Naytro Naytro wrote:
> > We are having some performance issues after we upgraded to the newest
> > version of PostgreSQL; before that everything was fast and smooth.
> >
> > The upgrade was done with pg_upgrade from 9.4 directly to 9.6.1. We have
> > since upgraded to 9.6.2 with no improvement.
> >
> > Some information about our setup: FreeBSD and Solaris (SmartOS), a simple
> > master-slave setup using streaming replication.
>
> Which node is on which of those, and where is the high load?
>
>
The high load is only on the slaves: FreeBSD (master + slave) and Solaris
(slaves only).

>
> > Problem:
> > Very high system CPU when the master is streaming replication data; CPU
> > goes up to 77%. Only one process is generating this load: the postgresql
> > startup process. When I attached truss to this process I saw a lot of
> > read calls with almost the same number of errors (EAGAIN).
>
> Hm. Just to clarify: The load is on the *receiving* side, in the startup
> process? Because the load doesn't quite look that way...
>
>
Yes

>
> > read(6,0x7fffffffa0c7,1) ERR#35 'Resource temporarily unavailable'
> >
> > Descriptor 6 is a pipe
>
> That's presumably the latch's internal pipe. Could you redo that
> truss/strace with timestamps attached? Does truss show signals
> received? The above profile would e.g. make a lot more sense if not. Is
> the wal receiver sending signals?
>
>
Truss from Solaris: http://pastebin.com/WajedZ8Y and from FreeBSD:
http://pastebin.com/DB5iT8na (FreeBSD truss should show signals by default).

DTrace from Solaris: http://pastebin.com/u03uVKbr
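
For reference when reading those traces: the repeated one-byte reads that
end in EAGAIN look like a non-blocking drain of the latch self-pipe. A
minimal sketch of that pattern (illustrative names only, not the actual
latch.c code):

#include <errno.h>
#include <unistd.h>

/*
 * Drain a non-blocking self-pipe: read the wakeup bytes one at a time
 * until the pipe is empty.  In the traces above fd would be descriptor 6.
 */
static void
drain_self_pipe(int fd)
{
    char        buf;
    ssize_t     rc;

    for (;;)
    {
        rc = read(fd, &buf, 1);
        if (rc == 1)
            continue;       /* consumed one wakeup byte, keep draining */
        if (rc < 0 && errno == EINTR)
            continue;       /* interrupted by a signal, retry */
        /* empty pipe (EAGAIN, the ERR#35 lines in truss), EOF or error */
        break;
    }
}

If the latch is being set very often, every wakeup costs at least one
read() that fails with EAGAIN, which would fit the syscall-heavy profile
in the traces.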

>
> > The read call tries to read one byte over and over. I looked at the
> > source code and I think this file is responsible for this behavior:
> > src/backend/storage/ipc/latch.c. There was no such file in 9.4.
>
> It was "just" moved (and expanded); it used to be at
> src/backend/port/unix_latch.c.
>
> There normally shouldn't be that much "latch traffic" in the startup
> process; we'd expect it to block from within WaitForWALToBecomeAvailable().
>
> Hm. Any chance you've configured a recovery_min_apply_delay? Although
> I'd expect more timestamp calls in that case.
>
>
No, we don't have this option configured
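
For context on where that pipe traffic could come from, here is a rough
sketch of the "set" side, assuming the walreceiver wakes the startup
process through its latch (function and variable names are illustrative,
not the real 9.6 code):

#include <errno.h>
#include <unistd.h>

static int  self_pipe_write_fd = -1;    /* write end; the read end is fd 6 */

/*
 * Called when the walreceiver tells the startup process that new WAL is
 * available: queue one wakeup byte so the wait on the self-pipe's read
 * end returns and recovery can continue applying WAL.
 */
static void
wake_startup_process(void)
{
    char        dummy = 0;

    while (write(self_pipe_write_fd, &dummy, 1) < 0)
    {
        if (errno == EINTR)
            continue;       /* interrupted by a signal, retry the write */
        break;              /* EAGAIN: pipe full, a wakeup is already pending */
    }
}

If that happens for every small chunk of WAL received, the startup process
would wake up, drain the pipe and go back to sleep over and over, which
could explain the system CPU on the slaves.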

>
> Greetings,
>
> Andres Freund
>
