Re: Let's make PostgreSQL multi-threaded

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: James Addison <jay(at)jp-hosting(dot)net>
Cc: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Let's make PostgreSQL multi-threaded
Date: 2023-06-15 07:12:32
Message-ID: 36f61a71-3bbb-b7b0-0d99-db5e69715af7@garret.ru
Lists: pgsql-hackers

On 15.06.2023 1:23 AM, James Addison wrote:
> On Tue, 13 Jun 2023 at 07:55, Konstantin Knizhnik<knizhnik(at)garret(dot)ru> wrote:
>>
>>
>> On 12.06.2023 3:23 PM, Pavel Borisov wrote:
>>> Is the following true or not?
>>>
>>> 1. If we switch processes to threads but leave the amount of session
>>> local variables unchanged, there would be hardly any performance gain.
>>> 2. If we move some backend's local variables into shared memory then
>>> the performance gain would be very near to what we get with threads
>>> having equal amount of session-local variables.
>>>
>>> In other words, the overall goal in principle is to gain from less
>>> memory copying wherever it doesn't add the burden of locks for
>>> concurrent variables access?
>>>
>>> Regards,
>>> Pavel Borisov,
>>> Supabase
>>>
>>>
>> IMHO both statements are not true.
>> Switching to threads will cause less context switch overhead (because
>> all threads share the same memory space and so preserve the TLB).
>> How big will this advantage be? In my prototype I got ~10%. But maybe
>> it is possible to find workloads where it is larger.
> Hi Konstantin - do you have code/links that you can share for the
> prototype and benchmarks used to gather those results?

Sorry, I have already shared the link:
https://github.com/postgrespro/postgresql.pthreads/

As you can see, the last commit was 6 years ago, when I stopped working
on this project.
Why? I have already tried to explain it:
- The benefits of switching to threads were not so large. Maybe I just
failed to find a proper workload, but it was a more or less expected
result, because most of the code was not changed: it uses the same sync
primitives, the same local catalog/relation caches, ...
To take full advantage of the multithreading model it is necessary to
rewrite many components, especially those related to interprocess
communication.
But maintaining such a fork of Postgres and keeping it synchronized with
mainline requires too much effort, and I was not able to do it myself.

There are three different but related directions for improving current
Postgres:
1. Replacing processes with threads
2. Built-in connection pooler
3. Lightweight backends (shared catalog/relation/prepared statements caches)

The motivations for such changes are also similar:
1. Increase Postgres scalability
2. Reduce memory consumption
3. Make Postgres a better fit for cloud and serverless requirements

I am not sure now which one should be addressed first, or whether they
can be done together.

Replacing static variables with thread-local ones is the first and maybe
the easiest step.
It requires more or less mechanical changes. A more challenging task is
replacing private per-backend data structures
with shared ones (caches, file descriptors, ...).
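
To illustrate what I mean by the "mechanical" step, here is a minimal
sketch. It is not code from the prototype. The names session_counter,
shared_counter, worker and run_workers are hypothetical and exist only
for this example. It assumes a C11 compiler with _Thread_local and POSIX
threads. A variable that used to be a plain static (private to each
backend process) is marked thread-local so each thread keeps its own
copy, while a plain static becomes shared by all threads and needs a
lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4
#define INCREMENTS  1000

/* Hypothetical stand-in for a per-backend static variable.  With one
 * process per backend a plain "static int" was private automatically;
 * with threads it has to be marked thread-local to stay private. */
static _Thread_local int session_counter = 0;

/* A plain static is now visible to every thread, so it needs a lock. */
static int shared_counter = 0;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int *result = arg;

    for (int i = 0; i < INCREMENTS; i++) {
        session_counter++;                  /* private copy, no lock */

        pthread_mutex_lock(&shared_lock);   /* shared state, locked */
        shared_counter++;
        pthread_mutex_unlock(&shared_lock);
    }
    *result = session_counter;              /* report this thread's copy */
    return NULL;
}

/* Starts NUM_WORKERS threads, stores each thread's final thread-local
 * value in per_thread[], and returns the final shared counter. */
int run_workers(int per_thread[NUM_WORKERS])
{
    pthread_t threads[NUM_WORKERS];
    int rc;

    shared_counter = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_create(&threads[i], NULL, worker, &per_thread[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
    }
    (void) rc;
    return shared_counter;
}
```

Built with -pthread and called from a main(), run_workers() should leave
INCREMENTS in every per_thread[] slot and NUM_WORKERS * INCREMENTS in the
shared counter. The thread-local part is the easy, mechanical change. The
locked shared counter hints at why moving private per-backend structures
into shared ones is the harder part.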
