Re: Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

From: "Mengxing Liu" <liu-mx15(at)mails(dot)tsinghua(dot)edu(dot)cn>
To:
Cc: "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
Date: 2017-06-03 06:51:54
Message-ID: 24761d91.21818.15c6cb97cfe.Coremail.liu-mx15@mails.tsinghua.edu.cn
Lists: pgsql-hackers

> -----Original Messages-----
> From: "Kevin Grittner" <kgrittn(at)gmail(dot)com>
> Sent Time: 2017-06-03 01:44:16 (Saturday)
> To: "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>
> Cc: "Mengxing Liu" <liu-mx15(at)mails(dot)tsinghua(dot)edu(dot)cn>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
> Subject: Re: Re: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
>
> > Mengxing Liu wrote:
>
> >> The CPU utilization of CheckForSerializableConflictOut/In is
> >> 0.71%/0.69%.
>
> How many cores were on the system used for this test? The paper
> specifically said that they didn't see performance degradation on
> the PostgreSQL implementation until 32 concurrent connections with
> 24 or more cores. The goal here is to fix *scaling* problems. Be
> sure you are testing at an appropriate scale. (The more sockets,
> cores, and clients, the better, I think.)
>
>

I used 15 cores for the server and 50 clients.
I also tried 30 cores, but the CPU utilization was only about 45%~70%.
How can we tell where the bottleneck is? Is it disk I/O or lock contention?

> On Fri, Jun 2, 2017 at 10:08 AM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
>
> > Kevin mentioned during PGCon that there's a paper by some group in
> > Sydney that developed a benchmark on which this scalability
> > problem showed up very prominently.
>
> https://pdfs.semanticscholar.org/6c4a/e427e6f392b7dec782b7a60516f0283af1f5.pdf
>
> "[...] we see a much better scalability of pgSSI than the
> corresponding implementations on InnoDB. Still, the overhead of
> pgSSI versus standard SI increases significantly with higher MPL
> than one would normally expect, reaching around 50% with MPL 128."
>

Actually, I implemented the benchmark described in the paper first, and I reported the result in a previous email.
The problem is that a transaction with a longer conflict list is more likely to be aborted, so the conflict lists cannot grow very long.
I modified the benchmark to use only update-only and read-only transactions to get rid of this problem. This way, a dangerous structure is never generated.
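
The reasoning above can be illustrated with a small sketch. It is a toy model, not PostgreSQL code: it assumes every transaction is concurrent with every other, and that an update-only transaction writes without reading anything that another transaction writes. The names `rw_edges` and `pivots` and the sample workloads are hypothetical. Under those assumptions rw-conflict edges only ever point from a reader to a writer, so no transaction can have both an incoming and an outgoing edge, which is what a dangerous structure needs.

```python
from itertools import product


def rw_edges(transactions):
    """Return rw-conflict edges (reader, writer) between concurrent transactions.

    An edge exists when `reader` read an item that `writer` wrote.
    Assumption: all transactions overlap in time.
    """
    edges = set()
    for (r_name, r), (w_name, w) in product(transactions.items(), repeat=2):
        if r_name != w_name and r["reads"] & w["writes"]:
            edges.add((r_name, w_name))
    return edges


def pivots(edges):
    """Transactions with both an incoming and an outgoing rw-conflict edge."""
    has_out = {src for src, _ in edges}
    has_in = {dst for _, dst in edges}
    return has_out & has_in


# Modified benchmark (hypothetical data): read-only and update-only transactions.
modified = {
    "R1": {"reads": {"x", "y"}, "writes": set()},
    "R2": {"reads": {"y", "z"}, "writes": set()},
    "U1": {"reads": set(), "writes": {"x"}},
    "U2": {"reads": set(), "writes": {"y", "z"}},
}

# Mixed read/write transactions (hypothetical data) for comparison.
mixed = {
    "T1": {"reads": {"x"}, "writes": {"y"}},
    "T2": {"reads": {"y"}, "writes": {"z"}},
    "T3": {"reads": {"z"}, "writes": {"x"}},
}

no_pivots = pivots(rw_edges(modified))   # readers only have out-edges, writers only in-edges
all_pivots = pivots(rw_edges(mixed))     # each transaction both reads and writes
```

In the modified workload the conflict lists can still grow long (each reader accumulates out-conflicts, each writer in-conflicts), but no pivot appears, so nothing is aborted for serialization failure in this model.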

> "Our profiling showed that PostgreSQL spend 2.3% of the overall
> runtime in traversing these list, plus 10% of its runtime waiting on
> the corresponding kernel mutexes."
>

Does "traversing these list" mean the function "RWConflictExists"?
And does "10% waiting on the corresponding kernel mutexes" mean the locks taken in the functions CheckForSerializableConflictIn/Out?

> If you cannot duplicate their results, you might want to contact the
> authors for more details on their testing methodology.
>

With 30 cores for the server and 90 clients, RWConflictExists consumes 1.0% of CPU cycles, and CheckForSerializableConflictIn/Out take 5% in total.
But the total CPU utilization of PG is only about 50%, so the results seem to match the paper.
If we can solve this problem, we can use this benchmark for future work.
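
One possible reading of why these numbers match the paper is sketched below. It assumes the profiled percentages are fractions of all available CPU cycles, so they can be rescaled by the roughly 50% utilization to get shares of the time PostgreSQL was actually busy. The helper name `share_of_busy_time` is hypothetical, and this interpretation is an assumption rather than something stated in the measurements.

```python
def share_of_busy_time(pct_of_total, utilization):
    """Rescale a percentage of total CPU cycles to a percentage of busy cycles.

    pct_of_total: profiled share, in percent of all available cycles
    utilization:  fraction of available cycles actually used (0 < utilization <= 1)
    """
    return pct_of_total / utilization


# Figures reported above: 1.0% and 5% at about 50% utilization.
traversal_share = share_of_busy_time(1.0, 0.50)  # compare with 2.3% in the paper
check_share = share_of_busy_time(5.0, 0.50)      # compare with 10% in the paper
```

Under that assumption the rescaled shares are about 2% and 10%, close to the 2.3% and 10% quoted from the paper, although the paper's 10% refers to time waiting on kernel mutexes rather than CPU cycles spent.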

Sincerely

--
Mengxing Liu
