From: | Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru> |
---|---|
To: | Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [WIP] Zipfian distribution in pgbench |
Date: | 2017-07-14 13:04:28 |
Message-ID: | 9C096568-3DAE-4CE7-9376-19F3DAEE2EB1@postgrespro.ru |
Lists: | pgsql-hackers |

> On 13 Jul 2017, at 19:14, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> Documentation says that the closer theta is from 0 the flatter the distribution
> but the implementation requires at least 1, including strange error messages:
>
> zipfian parameter must be greater than 1.000000 (not 1.000000)
>
> Could theta be allowed between 0 & 1 ? I've tried forcing with theta = 0.1
> and it worked well, so I'm not sure that I understand the restriction.
> I also tried with theta=0.001 but it seemed less good.

The algorithm works with theta less than 1. The only problem is that theta cannot be 1, because of the following line of code:

cell->alpha = 1. / (1 - theta);

That's why I added the restriction. I see two possible solutions:

1) Exclude 1 and allow every other value in the range (0, +∞).

2) Or simply increase or decrease theta by a very small amount when it equals 1.
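The two options can be sketched in C as follows. This is an illustration only: zipf_theta_valid and zipf_theta_nudge are hypothetical helpers, not pgbench functions, and the error wording is invented. Option 1 reports a specific message for theta == 1 instead of quoting a range bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Option 1 (hypothetical): accept any theta in (0, +inf) except exactly 1. */
bool zipf_theta_valid(double theta, char *errbuf, size_t errlen)
{
    if (!(theta > 0.0))     /* written this way so NaN is rejected too */
    {
        snprintf(errbuf, errlen,
                 "zipfian parameter must be greater than 0 (not %g)", theta);
        return false;
    }
    if (theta == 1.0)       /* alpha = 1 / (1 - theta) would divide by zero */
    {
        snprintf(errbuf, errlen, "zipfian parameter must not be 1");
        return false;
    }
    return true;
}

/* Option 2 (hypothetical): move theta off the singular point by epsilon. */
double zipf_theta_nudge(double theta, double epsilon)
{
    assert(epsilon > 0.0);
    return theta == 1.0 ? theta + epsilon : theta;
}
```

Option 2 is simpler but silently changes the user's parameter. Note also that values very close to 1 still give a huge alpha, so a small exclusion band around 1 may be safer in practice than rejecting only the exact value (an assumption, not something measured here).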

> I have also tried to check the distribution wrt the explanations, with the attached scripts, n=100, theta=1.000001/1.5/3.0: It does not seem to work, there is repeatable +15% bias on i=3 and repeatable -3% to -30% bias for values in i=10-100, this for different values of theta (1.000001,1.5, 3.0).
>
> If you try the script, beware to set parameters (theta, n) consistently.

I ran the scripts you attached with different values of theta and different numbers of outcomes (not n; n stays at 100), and found that for theta = 0.1 and a large number of outcomes the result is very close to a Zipfian distribution: with 100 000 outcomes the bias is -6% to 8% over the whole range, and with 1 000 000 outcomes it is -2% to 2%.

By number of outcomes (NOO) I mean the number of times random_zipfian was called. For example:

pgbench -f compte_bench.sql -t 100000

So I think it works, but less well for small numbers of outcomes. We also need to find the optimal theta for better results.

—

Thanks and Regards,

Alik Khilazhev

Postgres Professional:

http://www.postgrespro.com

The Russian Postgres Company
