| From: | "Mengxing Liu" <liu-mx15(at)mails(dot)tsinghua(dot)edu(dot)cn> | 
|---|---|
| To: | "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>, kgrittn <kgrittn(at)gmail(dot)com> | 
| Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions | 
| Date: | 2017-06-02 08:16:18 | 
| Message-ID: | 2697c71d.20f0a.15c67e064f8.Coremail.liu-mx15@mails.tsinghua.edu.cn | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi, Alvaro and Kevin.
> Anyway, this is just my analysis. 
> So I want to hack the PG and count the conflict lists' size of transactions. That would be more accurate.
In the last week, I hacked the PG to add an additional thread to count RWConflict list lengths. 
And tune the benchmark to get more conflict. But the result is still not good.
> 
> > 
> > Yeah, you need a workload that generates a longer conflict list -- if
> > you can make the tool generate a conflict list with a configurable
> > length, that's even better (say, 100 conflicts vs. 1000 conflicts).
> > Then we can see how the conflict list processing scales.
> > 
> 
> Yes, I tried to increase the read set to make more conflicts. 
> However the abort ratio will also increase. The CPU cycles consumed by conflict tracking are still less than 1%.
> According to the design of PG, a transaction will be aborted if there is a rw-antidependency. 
> In this case, a transaction with a longer conflict list, is more possible to be aborted.
> That means, the conflict list doesn't have too many chances to grow too long. 
> 
To solve this problem, I use just two kinds of transactions: Read-only transactions and Update-only transactions.
In this case, no transaction would  have an In-RWconflict and an Out-RWconflict at the same time.  
Thus transactions would not be aborted by conflict checking. 
Specifically, The benchmark is like this:
The table has 10K rows. 
Read-only transactions read 1K rows and Update-only transactions update 20 random rows of the table. 
In this benchmark, about 91% lists are shorter than 10; 
lengths of 6% conflict lists are between 10 and 20. Only 2% lists are longer than 20. The CPU utilization of CheckForSerializableConflictOut/In is 0.71%/0.69%.
I tried to increase the write set. As a result, conflict list become longer. But the total CPU utilization is decreased (about 50%).
CPU is not the bottleneck anymore. I'm not familiar with other part of PG. Is it caused by LOCK? Is there any chance to get rid of this problem?
BTW, I found that the email is not very convenient, especially when I  have some problem and want to discuss with you.
Would you mind scheduling a meeting every week by Skye, or other instant message software you like?
I would not take you too much time. Maybe one hour is enough.   
Sincerely.
--
Mengxing Liu
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Hao Lee | 2017-06-02 08:40:56 | Do we need the gcc feature "__builtin_expect" to promote the branches prediction? | 
| Previous Message | Pavel Stehule | 2017-06-02 08:15:34 | Re: proposal: PLpgSQL parallel assignemnt |