Re: [WIP] Zipfian distribution in pgbench

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alik Khilazhev <a(dot)khilazhev(at)postgrespro(dot)ru>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] Zipfian distribution in pgbench
Date: 2017-08-05 01:02:02
Message-ID: CAH2-WznyizQ1B3g0L8CyN7r=GmO_7Ft=SUYWUYCoYyesJ6v_mQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 21, 2017 at 4:51 AM, Alik Khilazhev
<a(dot)khilazhev(at)postgrespro(dot)ru> wrote:
> (Latest version of pgbench Zipfian patch)

While I'm +1 on this idea, I think that it would also be nice if there
was an option to make functions like random_zipfian() actually return
a value that has undergone perfect hashing. When this option is used,
any given value that the function returns would actually be taken from
a random mapping to some other value in the same range. So, you can
potentially get a Zipfian distribution without the locality.

While I think that Zipfian distributions are common, it's probably
less common for the popular items to be clustered together within the
primary key, unless the PK has multiple attributes, and the first is
the "popular attribute". For example, there would definitely be a lot
of interest among contiguous items within an index on "(artist,
album)" where "artist" is a popular artist, which is certainly quite
possible. But, if "album" is the primary key, and it's a SERIAL PK,
then there will only be weak temporal locality within the PK, and only
because today's fads are more popular than the fads from previous
years.

Just another idea, that doesn't have to hold this patch up.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-08-05 01:29:31 Re: Subscription code improvements
Previous Message Tom Lane 2017-08-04 22:41:12 Draft release notes up for review