Re: Postgres with pthread

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Postgres with pthread
Date: 2017-12-21 13:25:02
Message-ID: 8c9212eb-cb6f-1cfd-9fce-84ec01390b20@postgrespro.ru
Lists: pgsql-hackers

I continue experiments with my pthread prototype.
The latest results are as follows:

1. I have eliminated all (I hope) calls to non-reentrant functions
(getopt, setlocale, setitimer, localtime, ...), so the parallel tests
now pass.
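
To illustrate the kind of change involved, here is a minimal
standalone sketch (not code from the prototype): plain localtime()
returns a pointer to static storage shared by all threads, while the
reentrant localtime_r() writes into a caller-provided buffer:

#include <stdio.h>
#include <time.h>

static void
format_timestamp(time_t t, char *buf, size_t buflen)
{
    struct tm tm_buf;

    /* Thread-safe: the result is written to tm_buf on the caller's
     * stack, not to the static buffer plain localtime() would use. */
    localtime_r(&t, &tm_buf);
    strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", &tm_buf);
}

int
main(void)
{
    char buf[64];

    format_timestamp(time(NULL), buf, sizeof(buf));
    printf("%s\n", buf);
    return 0;
}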

2. I have implemented deallocation of the top memory context (at thread
exit) and cleanup of all open file descriptors.
I had to replace several places where malloc was used with top_malloc:
allocation in the top context.
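
One way to hook such cleanup to thread exit is a pthread TLS key whose
destructor runs automatically when the owning thread terminates. A
minimal standalone sketch, with illustrative names rather than the
prototype's actual ones:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the per-thread top memory context. */
typedef struct TopContext
{
    int dummy;
} TopContext;

static pthread_key_t top_ctx_key;
static pthread_once_t top_ctx_once = PTHREAD_ONCE_INIT;

/* Runs automatically when a thread holding a non-NULL value exits. */
static void
top_ctx_destructor(void *arg)
{
    printf("releasing top context of exiting thread\n");
    free(arg);
}

static void
top_ctx_key_init(void)
{
    pthread_key_create(&top_ctx_key, top_ctx_destructor);
}

/* Lazily create this thread's context; freed automatically at exit. */
static TopContext *
get_top_context(void)
{
    TopContext *ctx;

    pthread_once(&top_ctx_once, top_ctx_key_init);
    ctx = pthread_getspecific(top_ctx_key);
    if (ctx == NULL)
    {
        ctx = calloc(1, sizeof(TopContext));
        pthread_setspecific(top_ctx_key, ctx);
    }
    return ctx;
}

static void *
backend_thread(void *arg)
{
    (void) get_top_context();
    return NULL;                /* destructor fires here */
}

int
main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, backend_thread, NULL);
    pthread_join(t, NULL);
    return 0;
}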

3. My prototype now passes all regression tests, but error handling is
still far from complete.

4. I have experimented with replacing the synchronization primitives
used in Postgres with their pthread analogues.
Unfortunately, this had almost no influence on performance.
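
For reference, such a substitution looks roughly like the following
sketch (illustrative names, not the prototype's code): an LWLock-style
shared/exclusive lock mapped onto pthread_rwlock_t.

#include <pthread.h>

/* Hypothetical LWLock-style wrapper around pthread_rwlock_t; the
 * real LWLock API is richer (tranches, wait events, etc.). */
typedef struct PThreadLWLock
{
    pthread_rwlock_t rwlock;
} PThreadLWLock;

static void
PThreadLWLockInit(PThreadLWLock *lock)
{
    pthread_rwlock_init(&lock->rwlock, NULL);
}

/* Shared mode: many readers may hold the lock concurrently. */
static void
PThreadLWLockAcquireShared(PThreadLWLock *lock)
{
    pthread_rwlock_rdlock(&lock->rwlock);
}

/* Exclusive mode: a single writer excludes everyone else. */
static void
PThreadLWLockAcquireExclusive(PThreadLWLock *lock)
{
    pthread_rwlock_wrlock(&lock->rwlock);
}

static void
PThreadLWLockRelease(PThreadLWLock *lock)
{
    pthread_rwlock_unlock(&lock->rwlock);
}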

5. Handling a large number of connections.
The maximal number of Postgres connections is almost the same: 100k.
But the memory footprint in the pthread case was significantly
smaller: 18GB vs 38GB.
And the difference in performance was much larger: 60k TPS vs 600k TPS.
Compare this with the performance for 10k clients: 1300k TPS.
These numbers are from a read-only pgbench test (-S); since pgbench
doesn't allow more than 1000 clients to be specified, I spawned
several pgbench instances with 1000 connections each.

Why is handling a large number of connections important?
It allows applications to access Postgres directly, without pgbouncer
or any other external connection pooling tool.
An application can then use prepared statements, which can almost
double the speed of simple queries.
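
For context, a prepared statement lets the server parse and plan a
query once and skip that work on every subsequent execution. A minimal
libpq sketch of this pattern (the connection string and table are
illustrative):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=postgres");
    PGresult   *res;
    int         i;

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* Parse and plan the query once. */
    res = PQprepare(conn, "get_abalance",
                    "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                    1, NULL);
    PQclear(res);

    for (i = 1; i <= 1000; i++)
    {
        char        aid[32];
        const char *params[1];

        snprintf(aid, sizeof(aid), "%d", i);
        params[0] = aid;

        /* Execute the already-planned statement: no parse/plan cost. */
        res = PQexecPrepared(conn, "get_abalance", 1, params,
                             NULL, NULL, 0);
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}

The same effect can be measured with pgbench -M prepared versus the
default simple query mode.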

Unfortunately, Postgres sessions are not lightweight. Each backend
maintains its own private catalog and relation caches, prepared
statement cache, and so on.
For a real database these caches can occupy several megabytes of
memory, and warming them can take a significant amount of time.
So if we really want to support a large number of connections, we
should rewrite these caches to be global (shared).
That would save a lot of memory, but it adds synchronization overhead.
Also, on NUMA systems private caches may be more efficient than one
global cache.
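
To make the tradeoff concrete, here is a hypothetical sketch (the
names and structure are illustrative, not prototype code) of a cache
shared by all threads: every lookup pays for a lock that today's
private, per-backend caches get for free.

#include <pthread.h>
#include <string.h>

#define CACHE_SIZE 1024

typedef struct CacheEntry
{
    unsigned int key;
    void        *value;
} CacheEntry;

typedef struct SharedCache
{
    pthread_rwlock_t lock;      /* the added synchronization cost */
    CacheEntry       entries[CACHE_SIZE];
} SharedCache;

static void
shared_cache_init(SharedCache *cache)
{
    pthread_rwlock_init(&cache->lock, NULL);
    memset(cache->entries, 0, sizeof(cache->entries));
}

static void *
shared_cache_lookup(SharedCache *cache, unsigned int key)
{
    void       *value = NULL;
    CacheEntry *e;

    pthread_rwlock_rdlock(&cache->lock);    /* readers can share */
    e = &cache->entries[key % CACHE_SIZE];
    if (e->key == key)
        value = e->value;
    pthread_rwlock_unlock(&cache->lock);
    return value;
}

static void
shared_cache_insert(SharedCache *cache, unsigned int key, void *value)
{
    CacheEntry *e;

    pthread_rwlock_wrlock(&cache->lock);    /* writers exclude everyone */
    e = &cache->entries[key % CACHE_SIZE];
    e->key = key;
    e->value = value;
    pthread_rwlock_unlock(&cache->lock);
}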

My prototype can be found at:
git://github.com/postgrespro/postgresql.pthreads.git

--

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
