Re: 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

From: Vladimir Borodin <root(at)simply(dot)name>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgbouncer-general(at)lists(dot)pgfoundry(dot)org
Subject: Re: 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6
Date: 2016-05-31 09:06:03
Message-ID: B43C9DA5-C9C2-45B5-99BB-6A20C3DEC5A1@simply.name
Lists: pgsql-hackers pgsql-performance


> On 27 May 2016, at 19:57, Vladimir Borodin <root(at)simply(dot)name> wrote:
>
> -performance
> +hackers
>
>> On 25 May 2016, at 17:33, Vladimir Borodin <root(at)simply(dot)name> wrote:
>>
>> Hi all.
>>
>> We have found that queries through PgBouncer 1.7.2 (with transaction pooling) to a local PostgreSQL are almost two times slower in 9.5.3 than in 9.4.8 on RHEL 6 hosts (all packages are updated to the latest versions). Meanwhile, the problem can’t be reproduced on, e.g., Ubuntu 14.04 (also fully updated).
>>
>> Here is what the results look like for 9.4, 9.5 and 9.6. All are built from yesterday's latest commits in
>> * REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
>> * REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
>> * master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).
>>
>> All of them are built on the host where testing is done (with stock gcc versions). Sysctls, pgbouncer config and everything else we could find are the same, postgres configs are default, PGDATA is in tmpfs. All numbers are reproducible and stable between runs.
>>
>> In short:
>>
>> OS            PostgreSQL version  TPS    Avg. latency
>> RHEL 6        9.4                 44898  1.425 ms
>> RHEL 6        9.5                 26199  2.443 ms
>> RHEL 6        9.6                 43027  1.487 ms
>> Ubuntu 14.04  9.4                 67458  0.949 ms
>> Ubuntu 14.04  9.5                 64065  0.999 ms
>> Ubuntu 14.04  9.6                 64350  0.995 ms
>
> The results above are not really fair: pgbouncer.ini was a bit different on the Ubuntu host (application_name_add_host was disabled). Here are the correct results with exactly the same configuration:
>
> OS            PostgreSQL version  TPS    Avg. latency
> RHEL 6        9.4                 44898  1.425 ms
> RHEL 6        9.5                 26199  2.443 ms
> RHEL 6        9.6                 43027  1.487 ms
> Ubuntu 14.04  9.4                 45971  1.392 ms
> Ubuntu 14.04  9.5                 40282  1.589 ms
> Ubuntu 14.04  9.6                 45410  1.409 ms
>
> It can be seen that there is a regression for 9.5 on Ubuntu as well, though not as significant. We first thought that the cause was 38628db8d8caff21eb6cf8d775c0b2d04cf07b9b (Add memory barriers for PgBackendStatus.st_changecount protocol), but in that case the regression should also be seen in 9.6.
>
> There was also a bunch of changes in FE/BE communication (like 387da18874afa17156ee3af63766f17efb53c4b9 or 98a64d0bd713cb89e61bef6432befc4b7b5da59e), and that may explain the regression in 9.5 and the normal results in 9.6. Probably the right way to find the answer is to bisect. I’ll do it, but if more diagnostic information would help, feel free to ask for it.

Yep, bisect confirms that the first bad commit in REL9_5_STABLE is 387da18874afa17156ee3af63766f17efb53c4b9. Full output is attached.
And bisect for master branch confirms that the situation became much better after 98a64d0bd713cb89e61bef6432befc4b7b5da59e. Output is also attached.
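
For anyone who wants to repeat the bisect, here is a minimal sketch of a predicate that could be wired into `git bisect run`. It is not the script that produced the attached output. The names `parse_tps` and `bisect_exit_code` and the 35000 TPS threshold are assumptions chosen for illustration (the threshold simply sits between the good and bad RHEL numbers quoted above). It relies only on the pgbench summary format shown in this thread and on the `git bisect run` convention that exit code 0 means good and 1 means bad.

```python
import re

# Matches the final pgbench summary line as printed in this thread, e.g.
# "tps = 26199.803363 (excluding connections establishing)"
TPS_RE = re.compile(
    r"^tps = ([0-9.]+) \(excluding connections establishing\)",
    re.MULTILINE,
)


def parse_tps(pgbench_output: str) -> float:
    """Extract the 'excluding connections establishing' TPS figure."""
    match = TPS_RE.search(pgbench_output)
    if match is None:
        raise ValueError("no 'tps = ... (excluding ...)' line found")
    return float(match.group(1))


def bisect_exit_code(tps: float, threshold: float = 35000.0) -> int:
    """Map a TPS figure to a `git bisect run` exit code.

    0 marks the commit as good, 1 marks it as bad. The threshold is a
    hypothetical value, not one taken from the original test setup.
    """
    return 0 if tps >= threshold else 1
```

A wrapper script would build and start the server for the commit under test, run pgbench through pgbouncer, and exit with `bisect_exit_code(parse_tps(output))`. For the master-branch bisect, where the goal is to find the commit that made things better, the two return values would need to be swapped.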

On Ubuntu the performance degradation is ~15% and on RHEL it is ~100%. I don’t know the cause of the different numbers on RHEL and Ubuntu, but there is certainly a regression when pgbouncer is connected to postgres through localhost. When I connect pgbouncer to postgres through a unix socket, performance is consistently bad on all postgres versions.
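
The size of the regression depends on whether it is computed from TPS or from latency. As a rough cross-check, the sketch below recomputes the relative change from 9.4 to 9.5 using only the figures from the corrected table quoted above. The bisect runs themselves may have produced somewhat different numbers, and the helper name `pct_change` is just an illustrative choice.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# (TPS, average latency in ms) for 9.4 and 9.5, from the corrected table.
results = {
    "RHEL 6": {"9.4": (44898, 1.425), "9.5": (26199, 2.443)},
    "Ubuntu 14.04": {"9.4": (45971, 1.392), "9.5": (40282, 1.589)},
}

for os_name, by_version in results.items():
    tps_old, lat_old = by_version["9.4"]
    tps_new, lat_new = by_version["9.5"]
    print(
        f"{os_name}: TPS {pct_change(tps_old, tps_new):+.1f}%, "
        f"latency {pct_change(lat_old, lat_new):+.1f}%"
    )
```

Measured by latency, 9.5 comes out roughly 71% slower on RHEL 6 and roughly 14% slower on Ubuntu 14.04; measured by TPS, the drop is about 42% and 12% respectively.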

Both servers are for testing, but I can easily provide SSH access only to the Ubuntu host if necessary. I can also gather more diagnostics if needed.

>
>>
>> You can see that the difference between major versions on Ubuntu is not significant, but on RHEL 9.5 is 70% slower than 9.4 and 9.6.
>>
>> Below are more details.
>>
>> RHEL 6:
>>
>> postgres(at)pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 2693962
>> latency average: 1.425 ms
>> tps = 44897.461518 (including connections establishing)
>> tps = 44898.763258 (excluding connections establishing)
>> postgres(at)pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 1572014
>> latency average: 2.443 ms
>> tps = 26198.928627 (including connections establishing)
>> tps = 26199.803363 (excluding connections establishing)
>> postgres(at)pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 2581645
>> latency average: 1.487 ms
>> tps = 43025.676995 (including connections establishing)
>> tps = 43027.038275 (excluding connections establishing)
>> postgres(at)pgload05g ~ $
>>
>> Ubuntu 14.04 (the same hardware):
>>
>> postgres(at)pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 2758348
>> latency average: 1.392 ms
>> tps = 45970.634737 (including connections establishing)
>> tps = 45971.531098 (excluding connections establishing)
>> postgres(at)pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 2417009
>> latency average: 1.589 ms
>> tps = 40282.003641 (including connections establishing)
>> tps = 40282.855938 (excluding connections establishing)
>> postgres(at)pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 2724666
>> latency average: 1.409 ms
>> tps = 45409.308603 (including connections establishing)
>> tps = 45410.152406 (excluding connections establishing)
>> postgres(at)pgloadpublic02:~$
>>
>> In both tests (RHEL and Ubuntu) the bottleneck is the performance of a single CPU core, which is 100% consumed by PgBouncer. If pgbench connects to postgres directly, I get the following (expected) numbers:
>>
>> postgres(at)pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5432'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 10010710
>> latency average: 0.384 ms
>> tps = 166835.937859 (including connections establishing)
>> tps = 166849.730224 (excluding connections establishing)
>> postgres(at)pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5433'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 13373890
>> latency average: 0.287 ms
>> tps = 222888.311289 (including connections establishing)
>> tps = 222951.470125 (excluding connections establishing)
>> postgres(at)pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5434'
>> transaction type: SELECT only
>> scaling factor: 100
>> query mode: simple
>> number of clients: 64
>> number of threads: 64
>> duration: 60 s
>> number of transactions actually processed: 12989816
>> latency average: 0.296 ms
>> tps = 216487.458399 (including connections establishing)
>> tps = 216548.069976 (excluding connections establishing)
>> postgres(at)pgload05g ~ $
>>
>> Compilation options look almost the same:
>> # RHEL 6
>> CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -O2
>> # Ubuntu
>> CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2
>>
>> Attached are a simple script to deploy the testing environment (PgBouncer should be installed) and the pgbouncer config. I can provide any other information needed, such as backtraces or perf reports.
>>
>> <pgbouncer.ini>
>> <deploy.sh>
>>
>> --
>> May the force be with you…
>> https://simply.name

--
May the force be with you…
https://simply.name
