Re: pgbench - allow to create partitioned tables

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to create partitioned tables
Date: 2019-09-11 12:38:13
Message-ID: alpine.DEB.2.21.1909100818130.1895@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Amit,

Thanks for the feedback.

> + case 11: /* partitions */
> + initialization_option_set = true;
> + partitions = atoi(optarg);
> + if (partitions < 0)
> + {
> + fprintf(stderr, "invalid number of partitions: \"%s\"\n",
> + optarg);
> + exit(1);
> + }
> + break;
>
> Is there a reason why we treat "partitions = 0" as a valid value?

Yes. It is an explicit "do not create partitioned tables", which differ
from 1 which says "create a partitionned table with just one partition".

> Also, shouldn't we keep some max limit for this parameter as well?

I do not think so. If someone wants to test how terrible it is to use
100000 partitions, we should not prevent it.

> Forex. how realistic it will be if the user gives the value of
> partitions the same or greater than the number of rows in
> pgbench_accounts table?

Although I agree that it does not make much sense, for testing purposes
why not, to test overheads in critical cases for instance.

> I understand it is not sensible to give such a value, but I guess the
> API should behave sanely in such cases as well.

Yep, it should work.

> I am not sure what will be the good max value for it, but I
> think there should be one.

I disagree. Pgbench is a tool for testing performance for given
parameters. If postgres accepts a parameter there is no reason why pgbench
should reject it.

> @@ -3625,6 +3644,7 @@ initCreateTables(PGconn *con)
> const char *bigcols; /* column decls if accountIDs are 64 bits */
> int declare_fillfactor;
> };
> +
> static const struct ddlinfo DDLs[] = {
>
> Spurious line change.

Indeed.

> *
> + " --partitions=NUM partition account table in NUM parts
> (defaults: 0)\n"
> + " --partition-method=(range|hash)\n"
> + " partition account table with this
> method (default: range)\n"
>
> Refer complete table name like pgbench_accounts instead of just account.
> It will be clear and in sync with what we display in some other options
> like --skip-some-updates.

Ok.

> *
> + " --partitions=NUM partition account table in NUM parts
> (defaults: 0)\n"
>
> /defaults/default.

Ok.

> I think we should print the information about partitions in
> printResults. It can help users while analyzing results.

Hmmm. Why not, with some hocus-pocus to get the information out of
pg_catalog, and trying to fail gracefully so that if pgbench is run
against a no partitioning-support version.

> *
> +enum { PART_NONE, PART_RANGE, PART_HASH }
> + partition_method = PART_NONE;
> +
>
> I think it is better to follow the style of QueryMode enum by using
> typedef here, that will make look code in sync with nearby code.

Hmmm. Why not. This means inventing a used-once type name for
partition_method. My great creativity lead to partition_method_t.

> *
> - int i;
>
> fprintf(stderr, "creating tables...\n");
>
> - for (i = 0; i < lengthof(DDLs); i++)
> + for (int i = 0; i < lengthof(DDLs); i++)
>
> This is unnecessary change as far as this patch is concerned. I
> understand there is no problem in writing either way, but let's not
> change the coding pattern here as part of this patch.

The reason I did that is that I had a stupid bug in a development version
which was due to an accidental reuse of this index, which would have been
prevented by this declaration style. Removed anyway.

> + if (partitions >= 1)
> + {
> + int64 part_size = (naccounts * (int64) scale + partitions - 1) / partitions;
> + char ff[64];
> + ff[0] = '\0';
> + append_fillfactor(ff, sizeof(ff));
> +
> + fprintf(stderr, "creating %d partitions...\n", partitions);
> +
> + for (int p = 1; p <= partitions; p++)
> + {
> + char query[256];
> +
> + if (partition_method == PART_RANGE)
> + {
>
> part_size can be defined inside "if (partition_method == PART_RANGE)"
> as it is used here.

I just wanted to avoid recomputing the value in the loop, but indeed it
may be computed needlessly. Moved.

> In general, this part of the code can use some comments.

Ok.

Attached an updated version.

--
Fabien.

Attachment Content-Type Size
pgbench-init-partitioned-5.patch text/x-diff 12.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera from 2ndQuadrant 2019-09-11 12:51:40 Re: logical decoding : exceeded maxAllocatedDescs for .spill files
Previous Message Anastasia Lubennikova 2019-09-11 12:37:59 Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.