Re: PATCH: Using BRIN indexes for sorted output

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Zhihong Yu <zyu(at)yugabyte(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: Using BRIN indexes for sorted output
Date: 2023-02-23 15:22:37
Message-ID: 30f37aae-57fa-e7ab-9444-7b82969ed68f@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/23/23 15:19, Matthias van de Meent wrote:
> On Sat, 18 Feb 2023 at 16:55, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> cfbot complained there's one more place triggering a compiler warning on
>> 32-bit systems, so here's a version fixing that.
>>
>> I've also added a copy of the regression tests but using the indexam
>> stats added in 0001. This is just a copy of the already existing
>> regression tests, just with enable_indexam_stats=true - this should
>> catch some of the issues that went mostly undetected in the earlier
>> patch versions.
>
>
> Comments on 0001, mostly comments and patch design:
>
>> +range_minval_cmp(const void *a, const void *b, void *arg)
>> [...]
>> +range_maxval_cmp(const void *a, const void *b, void *arg)
>> [...]
>> +range_values_cmp(const void *a, const void *b, void *arg)
>
> Can the arguments of these functions be modified into the types they
> are expected to receive? If not, could you add a comment on why that's
> not possible?
> I don't think it's good practise to "just" use void* for arguments to
> cast them to more concrete types in the next lines without an
> immediate explanation.
>

The reason is that that's what qsort() expects. If you change that to
actual data types, you'll get compile-time warnings. I agree this may
need better comments, though.

>> + * Statistics calculated by index AM (e.g. BRIN for ranges, etc.).
>
> Could you please expand on this? We do have GIST support for ranges, too.
>

Expand in what way? This is meant to be AM-specific, so if GiST wants to
collect some additional stats, it's free to do so - perhaps some of the
ideas from the stats collected for BRIN would be applicable, but it's
also bound to the index structure.

>> + * brin_minmax_stats
>> + * Calculate custom statistics for a BRIN minmax index.
>> + *
>> + * At the moment this calculates:
>> + *
>> + * - number of summarized/not-summarized and all/has nulls ranges
>
> I think statistics gathering of an index should be done at the AM
> level, not attribute level. The docs currently suggest that the user
> builds one BRIN index with 16 columns instead of 16 BRIN indexes with
> one column each, which would make the statistics gathering use 16x
> more IO if the scanned data cannot be reused.
>

Why? The row sample is collected only once and used for building all the
index AM stats - it doesn't really matter if we analyze 16 single-column
indexes or 1 index with 16 columns. Yes, we'll need to scan more
indexes, but the with 16 columns the summaries will be larger so the
total amount of I/O will be almost the same I think.

Or maybe I don't understand what I/O you're talking about?

> It is possible to build BRIN indexes on more than one column with more
> than one opclass family like `USING brin (id int8_minmax_ops, id
> int8_bloom_ops)`. This would mean various duplicate statistics fields,
> no?
> It seems to me that it's more useful to do the null- and n_summarized
> on the index level instead of duplicating that inside the opclass.

I don't think it's worth it. The amount of data this would save is tiny,
and it'd only apply to cases where the index includes the same attribute
multiple times, and that's pretty rare I think. I don't think it's worth
the extra complexity.

>
>> + for (heapBlk = 0; heapBlk < nblocks; heapBlk += pagesPerRange)
>
> I am not familiar with the frequency of max-sized relations, but this
> would go into an infinite loop for pagesPerRange values >1 for
> max-sized relations due to BlockNumber wraparound. I think there
> should be some additional overflow checks here.
>

Good point, but that's a pre-existing issue. We do this same loop in a
number of places.

>> +/*
>> + * get_attstaindexam
>> + *
>> + * Given the table and attribute number of a column, get the index AM
>> + * statistics. Return NULL if no data available.
>> + *
>
> Shouldn't this be "given the index and attribute number" instead of
> "the table and attribute number"?
> I think we need to be compatible with indexes on expression here, so
> that we don't fail to create (or use) statistics for an index `USING
> brin ( (int8range(my_min_column, my_max_column, '[]'))
> range_inclusion_ops)` when we implement stats for range_inclusion_ops.
>
>> + * Alternative to brin_minmax_match_tuples_to_ranges2, leveraging ordering
>
> Does this function still exist?
>

Yes, but only in 0003 - it's a "brute-force" algorithm used as a
cross-check the result of the optimized algorithm in 0001. You're right
it should not be referenced in the comment.

>
> I'm planning on reviewing the other patches, and noticed that a lot of
> the patches are marked WIP. Could you share a status on those, because
> currently that status is unknown: Are these patches you don't plan on
> including, or are these patches only (or mostly) included for
> debugging?
>

I think the WIP label is a bit misleading, I used it mostly to mark
patches that are not meant to be committed on their own. A quick overview:

0002-wip-introduce-debug_brin_stats-20230218-2.patch
0003-wip-introduce-debug_brin_cross_check-20230218-2.patch

Not meant for commit, used mostly during development/debugging, by
adding debug logging and/or cross-check to validate the results.

I think it's fine to ignore those during review.

0005-wip-brinsort-explain-stats-20230218-2.patch

This needs more work. It does what it's meant to do (show info about
the brinsort node), but I'm not very happy with the formatting etc.

Review and suggestions welcome.

0006-wip-multiple-watermark-steps-20230218-2.patch
0007-wip-adjust-watermark-step-20230218-2.patch
0008-wip-adaptive-watermark-step-20230218-2.patch

Ultimately this should be merged into 0004 (which does the actual
brinsort), I think 0006 is the way to go, but I kept all the patches
to show the incremental evolution (and allow comparisons).

0006 adds a GUC to specify how many ranges to add in each round,
0005 adjusts that based on statistics during planning and 0006 does
that adaptively during execution.

0009-wip-add-brinsort-regression-tests-20230218-2.patch
0010-wip-add-brinsort-amstats-regression-tests-20230218-2.patch

These need to move to the earlier parts. The tests are rather
expensive so we'll need to reduce it somehow.

0011-wip-test-generator-script-20230218-2.patch

Not intended for commit, I only included it to keep it as part of the
patch series.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2023-02-23 15:26:35 Re: Add LZ4 compression in pg_dump
Previous Message Tom Lane 2023-02-23 14:52:15 Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)