Re: pgbench - allow to create partitioned tables

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Asif Rehman <asifr(dot)rehman(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench - allow to create partitioned tables
Date: 2019-09-17 13:33:34
Message-ID: alpine.DEB.2.21.1909171507570.16377@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Amit,

>>> Why can't we change it as attached?
>>
>> I think that your version works, but I do not like much the condition for
>> the normal case which is implicitely assumed. The solution I took has 3
>> clear-cut cases: 1 error against a server without partition support,
>> detect multiple pgbench_accounts table -- argh, and then the normal
>> expected case, whether partitioned or not. Your solution has 4 cases
>> because of the last implicit zero-row select that relies on default, which
>> would need some explanations.
>
> Why?

Hmmm. This is a coding-philosophy question:-)

To be nice to the code reader?

You have several if cases, but the last one is to keep the default *which
means something*. ISTM that the default is kept in two cases: when there
is a pgbench_accounts without partitioning, and when no pgbench_accounts
was found, in which case the defaults are plain false. I could be okay of
the default say "we do not know", but for me having all cases explicitely
covered in one place helps understand the behavior of a code.

> Here, we are fetching the partitioning information. If it exists, then
> we remember that to display for later, otherwise, the default should
> apply.

Yep, but the default is also kept if nothing is found, whereas the left
join solution would give one row when found and empty when not found,
which for me are quite distinct cases.

> Oh no, I am not generally against using left join, but here it appears
> like using it without much need. If nothing else, it will consume
> more cycles to fetch one extra row when we can avoid it.

As pointed out, the left join allows to distinguish "not found" from "not
partitioned" logically, even if no explicit use of that is done
afterwards.

> Irrespective of whether we use left join or not, I think the below
> change from my patch is important.
> - /* only print partitioning information if some partitioning was detected */
> - if (partition_method != PART_NONE && partition_method != PART_UNKNOWN)
> + /* print partitioning information only if there exists any partition */
> + if (partitions > 0)
>
> Basically, it would be good if we just rely on 'partitions' to decide
> whether we have partitions or not.

Could be, although I was thinking of telling the user that we do not know
on unknown. I'll think about this one.

>> In the case at hand, I find that getting one row in all cases pretty
>> elegant because there is just one code for handling them all.
>
> Hmm, I would be fine if you can show some other place in code where
> such a method is used

No problem:-) Although there are no other catalog queries in "pgbench",
there are plenty in "psql" and "pg_dump", and also in some other commands,
and they often rely on "LEFT" joins:

sh> grep LEFT src/bin/psql/*.c | wc -l # 58
sh> grep LEFT src/bin/pg_dump/*.c | wc -l # 54

Note that there are no "RIGHT" nor "FULL" joins…

>>> What is the need of using regress_pgbench_tap_1_ts in this test?
>>
>> I wanted to check that tablespace options work appropriately with
>> partition tables, as I changed the create table stuff significantly, and
>> just using "pg_default" is kind of cheating.
>
> I think your change will be tested if there is a '--tablespace'
> option.

Yes. There is just one, really.

> Even if you want to test win non-default tablespace, then also, adding
> additional test would make more sense rather than changing existing one
> which is testing a valid thing.

Tom tends to think that there are already too many tests, so I try to keep
them as compact/combined as possible. Moreover, the spirit of this test is
to cover "all possible options", so it made also sense to add the new
options there, and it achieves both coverage and testing my changes with
an explicit tablespace.

> Also, there is an existing way to create tablespace location in
> "src/bin/pg_checksums/t/002_actions". I think we can use the same. I
> don't find any problem with your way, but why having multiple ways of
> doing same thing in code. We need to test this on windows also once as
> this involves some path creation which might vary, although I don't
> think there should be any problem in that especially if we use existing
> way.

Ok, I'll look at the pg_checksums way to do that.

>>> - 'pgbench scale 1 initialization');
>>> + 'pgbench scale 1 initialization with options');
>>>
>>> Similar to the above, it is not clear to me why we need to change this?
>>
>> Because I noticed that it had the same description as the previous one,
>> so I made the test name distinct and more precise, while I was adding
>> options on it.

Hmmm. Keeping the same name is really a copy paste error, and I wanted to
avoid a distinct commit for more than very minor thing.

> Good observation, but better be done separately. I think in general
> the more unrelated changes are present in patch, the more time it
> takes to review.

Then let's keep the same name.

> One more comment:
> +typedef enum { PART_NONE, PART_RANGE, PART_HASH, PART_UNKNOWN }
> + partition_method_t;
>
> See, if we can eliminate PART_UNKNOWN. I don't see much use of same.
> It is used at one place where we can set PART_NONE without much loss.
> Having lesser invalid values makes code easier to follow.

Discussed in other mail.

--
Fabien.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-09-17 13:34:20 Re: [proposal] de-TOAST'ing using a iterator
Previous Message Mahendra Singh 2019-09-17 13:15:06 Re: range test for hash index?