Re: [WIP] Zipfian distribution in pgbench

From: Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Zipfian distribution in pgbench
Date: 2017-07-10 06:49:48
Message-ID: 46958A39-D273-456D-A2D6-E6655BA2B4DC@postgrespro.ru
Lists: pgsql-hackers

Hello, Fabien!

> Your description is not very precise. What version of Postgres is used? If there is a decline, compared to which version? Is there a link to these results?

The benchmark was done on master (v10). I am attaching an image with the results.

> Indeed, the function computation is over expensive, and the numerical precision of the implementation is doubtful.
>
> If there is no better way to compute this function, ISTM that it should be summed in reverse order to accumulate small values first, from (1/n)^s + ... + (1/2)^ s. As 1/1 == 1, the corresponding term is 1, no point in calling pow for this one, so it could be:
>
> double ans = 0.0;
> for (i = n; i >= 2; i--)
> ans += pow(1. / i, theta);
> return 1.0 + ans;

You are right, it is better to sum in reverse order.

> If the functions when actually used is likely to be called with different parameters, then some caching beyond the last value would seem in order. Maybe a small fixed size array?
>
> However, it should be somehow thread safe, which does not seem to be the case with the current implementation. Maybe a per-thread cache? Or use a lock only to update a shared cache? At least it should avoid locking to read values…

Yes, I forgot about thread-safety. I will implement a per-thread cache with a small fixed-size array.

> Given the explanations, the random draw mostly hits values at the beginning of the interval, so when the number of client goes higher one just get locking contention on the updated row?

Yes, exactly.

> ISTM that also having the tps achieved with a flat distribution would allow to check this hypothesis.

On Workload A with a uniform distribution, PostgreSQL shows better results than MongoDB and MySQL (see attachment). You can also notice that for a small number of clients, the type of distribution does not affect tps on MySQL.

It is also important to mention that Postgres ran with synchronous_commit=off, to match the durability of MongoDB's writeConcern=1&journaled=false. In this mode there is a possibility of losing the changes of the last second. If we ran Postgres with maximum durability, MongoDB would lag far behind.
---
Thanks and Regards,
Alik Khilazhev
Postgres Professional:
http://www.postgrespro.com
The Russian Postgres Company

