Re: pgbench: Skipping the creating primary keys after initialization

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench: Skipping the creating primary keys after initialization
Date: 2017-08-03 09:39:52
Message-ID: alpine.DEB.2.20.1708031038020.19721@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello,

> My motivation of this proposal is same as what Robert has. I
> understand that ad-hoc option can solve only the part of big problem
> and it could be cause of mess. However It seems me that the script
> especially for table initialization will not be flexible than we
> expected. I mean, even if we provide some meta commands for table
> initialization or data loading, these meta commands work for only
> pgbench tables (i.g., pgbench_accounts, pgbench_branches and so on).
> If we want to create other tables and load data to them as we want we
> can do that using psql -f. So an alternative ways is having a flexible
> style option for example --custom-initialize = { [load, create_pkey,
> create_fkey, vacuum], ... }. That would solve this in a better way.

Personnaly, I could be fine with a limited number of long options to
adjust pgbench initialization to various needs, eg --use-hash-index,
--skip-whetever-index, etc.

The flexible --custom-init idea outlined above looks nice as well.

As for a more generic solution, the easy part are the "CREATE" stuff and
the transaction script stuff (existing pgbench scripts).

For the CREATE stuff, the script language is SQL, the command to use it is
"psql"...

The real and hard part is to fill tables with meaningful pseudo-random
test data which do not violate constraints for any non trivial schema
involving foreign keys and various unique constraints.

The solution for this is SQL for trivial cases, think of:

"INSERT INTO Foo() SELECT ... FROM generate_series(...);"

For instance the pgbench initialization is really close to:

psql -Dscale=10 <<EOF
CREATE TABLE ... ;
INSERT INTO pgbench_account(...)
SELECT ... FROM generate_series(1, 100000 * :scale) AS i;
INSERT ...
CREATE INDEX ...;
VACUUM FULL ANALYZE;
EOF

And all existing options could probably be implemented easilly with the
recently added conditional (\if).

So my 0.02€ is that if something is to be done, I would suggest to turn
the creation and initialization stuff into a standard "psql" script that
could be called from pgbench instead of integrating much more ad-hoc stuff
into pgbench.

Note that non trivial schema initialization requires more general
programming, so I do not believe in doing a lot at pgbench or psql levels.
The best I could come with is a data generator which takes as input the
schema with added directives on how to generate the various attributes
(tool named "datafiller", that some people use:-).

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2017-08-03 09:55:21 Re: Update comments in nodeModifyTable.c
Previous Message Michael Paquier 2017-08-03 09:37:32 Re: Clarification in pg10's pgupgrade.html step 10 (upgrading standby servers)