Re: pgbench-ycsb

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: a(dot)bykov(at)postgrespro(dot)ru, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench-ycsb
Date: 2018-07-22 10:22:45
Message-ID: CA+q6zcXRqgTD-uG1snC4LiNAsBoujFHbnyZ3OL8q+1QPJso_mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Sat, 21 Jul 2018 at 22:41, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>
> >> Could you provide a link to the specification?
> >>
> >> I cannot find something simple, and I was kind of hoping to avoid diving
> >> into the source code of the java tool on github:-) In particular, I'm
> >> looking for a description of the expected underlying schema and its size
> >> (scale) parameters.
> >
> > There are the description files for different workloads, like [1], (with the
> > custom amount of records, of course) and the schema [2]. Would this
> > information be enough?
> >
> > [1]: https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
> > [2]: https://github.com/brianfrankcooper/YCSB/blob/master/jdbc/src/main/resources/sql/create_table.sql
>
> The second link is a start.
>
> I notice that the submitted patch transactions do not apply to this
> schema, which is significantly different from the pgbench TPC-B (like)
> benchmark.
>
> The YCSB schema is key -> fields[0-9], all of them TEXT, somehow expected
> to be 100 bytes each, and update is expected to update one of these
> fields.
>
> This suggest that maybe a -i extension would be in order. Possibly
>
> pgbench -i -s 1 --layout={tpcb,ycsb} (or schema ?)
>
> where "tpcb" would be the default?
>
> I'm sceptical about using a textual primary key as it corresponds more to
> NoSQL limitations than to an actual design choice. I'd be okay with INT8
> as a pkey.
>
> I find the YSCB tablename "usertable" especially unhelpful. Maybe
> "pgbench_ycsb"?

Just to clarify - if I understand Anthony correctly, this proposal is not about
implementing exactly YCSB as it is, but more about using zipfian distribution
for an id in the regular pgbench table structure in conjunction with read/write
balance to simulate something similar to it.

And probably instead of implementing the exact YCSB workload inside pgbench, it
makes more sense to add PostgreSQL Jsonb as one of the options into the
framework itself (I was in the middle of it few years ago, but then was
distracted by some interesting benchmarking results).

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrei Korigodski 2018-07-22 11:59:59 Re: pgbench: improve --help and --version parsing
Previous Message Noah Misch 2018-07-22 07:29:04 Re: Non-portable shell code in pg_upgrade tap tests