Re: amcheck support for BRIN indexes

From: Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com>
To: Álvaro Herrera <alvherre(at)kurilemu(dot)de>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>
Subject: Re: amcheck support for BRIN indexes
Date: 2025-08-12 21:19:58
Message-ID: CAE7r3MKUOGJ0v5-b5fYaF6sxKZvr0J-YXHTJf8u8GUr1tTcvNg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thank you for looking into it!

On Tue, Aug 5, 2025 at 2:21 PM Álvaro Herrera <alvherre(at)kurilemu(dot)de> wrote:
>
> On 2025-Jul-08, Tomas Vondra wrote:
>
> > On 7/8/25 14:40, Arseniy Mukhin wrote:
>
> > > Thank you for the feedback! I agree with the benefits. Speaking of
> > > (с), it seems most of the time to be really trivial to build such a
> > > ScanKey, but not every opclass supports '=' operator. amcheck should
> > > handle these cases somehow then. I see two options here. The first is
> > > to not provide 'heap all indexed' check for such opclasses, which is
> > > sad because even one core opclass (box_inclusion_ops) doesn't support
> > > '=' operator, postgis brin opclasses don't support it too AFAICS. The
> > > second option is to let the user define which operator to use during
> > > the check, which, I think, makes user experience much worse in this
> > > case. So both options look not good from the user POV as for me, so I
> > > don't know. What do you think about it?
> > >
> > > And should I revert the patchset to the consistent function version then?
> >
> > Yeah, that's a good point. The various opclasses may support different
> > operators, and we don't know which "strategy" to fill into the scan key.
> > Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber,
> > and so on.
>
> Hmm, maybe we can make the operator argument to the function an optional
> argument. Then, if it's not given, use equality for the cases where
> that works; if equality doesn't work for the column in that opclass,
> throw an error to request an operator. That way we support the most
> common case in the easy way, and for the other cases the user has to
> work a little harder -- but I think it's not too bad.

Yes, the operator list is an optional argument now. Like you said, if
it's not passed to the function call, the equality operator is used.

I realized that solving the problem with opclasses without equality
operator by letting user to define operator list has several
drawbacks:

It's not very convenient to call automatically? Because the calls are
different from index to index. You can't just call
brin_index_check('index', true, true) on everything. Maybe I'm wrong,
but it seems like amcheck is a tool that is often used to periodically
check the health of Postgres clusters (and there can be many of them),
so users probably don't want to get into the details of each index.

Also, it seems like we don't want the user to define the operator to
check. We want them to pass in the "correct" operator if there is no
equality operator. So there's no choice, we just want users to figure
out what the correct operator is and pass it in. But we already know
what the "correct" operator is. Maybe we should just implement an
opclass <-> "correct" operator mapping on the database side? We also
need opclass developers to be able to add such a mapping if they want
their opclass to be supported by amcheck. Then during the check we can
look up into the mapping and use the operators. I was thinking about a
new catalog table or maybe adding it to BrinOpcInfo? Probably there is
a better way to do it? If the mapping doesn't have an operator for
opclass - no problem, we can skip the consistentFn call for such
columns and maybe log a message about it. This way we don't have all
these problems with operator list argument and with false positives
when a user fails to realize what the "correct" operator is.

> I think you should have tests with indexes on more than one column.

There is one multicolumn index test, I added another one where the
list of operators is passed. Also added tests with invalid operator
list.

>
> This syntax looks terrible
> SELECT brin_index_check('brintest_idx'::REGCLASS, true, true, '{"@>"}');
>
> the thingy at the end looks like '90s modem line noise. Can we maybe
> use a variadic argument, so that if you have multiple indexed columns
> you specify the operators in separate args somehow and avoid the quoting
> and array decoration? I imagine something like
>
> SELECT brin_index_check('brintest_idx'::REGCLASS, true, true, '@>', '=');
> or whatever. (To think about: if I want '=' to be omitted, but I have
> the second column using a type that doesn't support the '=', what's a
> good syntax to use?)
>

Good idea! It looks better. I changed the operator list to variadic
argument. For now brin_index_check() expects the user to define all or
nothing. I think if we want to allow the user to omit operators for
some columns, then we need to know the indexes of the columns for
which the passed operators are intended. Something like this maybe:

brin_index_check('brintest_idx', true, true, 1, '@>', 3, '@>');

>
> Regarding 0003: I think the new function should be just
> CheckIndexCheckXMin(Relation idxrel, Snapshot snap)
> and make the caller responsible for the snapshot handling. Otherwise
> you end up in the weird situation in 0004 where you have to do
> UnregisterSnapshot(RegisterSnapshotAndDoStuff())
> instead of the more ordinary
> RegisterSnapshot()
> CheckIndexCheckXMin()
> UnregisterSnapshot()
>

Done. BTW gist amcheck also needs 0003 [0]. Maybe we can move it to
the separate thread and commit, so both patches can use it? What do
you think, Andrey?

>
> You need an amcheck.sgml update for the new function.

Done. The operator list is the most controversial part. It was not
easy to figure out how to describe the criteria of the "correct"
operator, I hope it's not very confusing.

And I have a question, is it somehow helpful that 'index structure
check' and 'heap all indexed' are in the different patches? It makes
it a bit more difficult to update the patchset, so if it's not useful
for reviewers I would probably merge it in the next version.

So here is a new version.

Thank you!

[0] https://www.postgresql.org/message-id/flat/5FC1B5B6-FB35-44A2-AB62-632F14E958C5(at)yandex-team(dot)ru

Best regards,
Arseniy Mukhin

Attachment Content-Type Size
v10-0001-brin-refactoring.patch text/x-patch 4.1 KB
v10-0004-amcheck-brin_index_check-heap-all-indexed.patch text/x-patch 36.0 KB
v10-0003-amcheck-common_verify-snapshot-indcheckxmin-chec.patch text/x-patch 3.9 KB
v10-0002-amcheck-brin_index_check-index-structure-check.patch text/x-patch 48.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2025-08-12 21:22:20 Re: index prefetching
Previous Message Sami Imseih 2025-08-12 21:16:48 Re: Improve LWLock tranche name visibility across backends