Re: PATCH: Using BRIN indexes for sorted output

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PATCH: Using BRIN indexes for sorted output
Date: 2022-11-16 23:52:35
Message-ID: 14568731-ac91-ce2d-1477-0e2504c72775@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/16/22 22:52, Greg Stark wrote:
> Fwiw tuplesort does do something like what you want for the top-k
> case. At least it used to last I looked -- not sure if it went out
> with the tapesort ...
> > For top-k it inserts new tuples into the heap data structure and then
> pops the top element out of the hash. That keeps a fixed number of
> elements in the heap. It's always inserting and removing at the same
> time. I don't think it would be very hard to add a tuplesort interface
> to access that behaviour.
>

Bounded sorts are still there, implemented using a heap (which is what
you're talking about, I think). I actually looked at it some time ago,
and it didn't look like a particularly good match for the general case
(without explicit LIMIT). Bounded sorts require specifying number of
tuples, and then discard the remaining tuples. But you don't know how
many tuples you'll actually find until the next minval - you have to
keep them all.

Maybe we could feed the tuples into a (sorted) heap incrementally, and
consume tuples until the next minval value. I'm not against exploring
that idea, but it certainly requires more work than just slapping some
interface to existing code.

> For something like BRIN you would sort the ranges by minvalue then
> insert all the tuples for each range. Before inserting tuples for a
> new range you would first pop out all the tuples that are < the
> minvalue for the new range.
>

Well, yeah. That's pretty much exactly what the last version of this
patch (from October 23) does.

> I'm not sure how you handle degenerate BRIN indexes that behave
> terribly. Like, if many BRIN ranges covered the entire key range.
> Perhaps there would be a clever way to spill the overflow and switch
> to quicksort for the spilled tuples without wasting lots of work
> already done and without being too inefficient.

In two ways:

1) Don't have such BRIN index - if it has many degraded ranges, it's
bound to perform poorly even for WHERE conditions. We've lived with this
until now, I don't think this makes the issue any worse.

2) Improving statistics for BRIN indexes - until now the BRIN costing is
very crude, we have almost no information about how wide the ranges are,
how much they overlap, etc. The 0001 part (discussed in a thread [1])
aims to provide much better statistics. Yes, the costing still doesn't
use that information very much.

regards

[1] https://commitfest.postgresql.org/40/3952/

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-11-17 00:22:27 Re: About displaying NestLoopParam
Previous Message Peter Geoghegan 2022-11-16 23:37:40 Re: Standardizing how pg_waldump presents recovery conflict XID cutoffs