Re: Trying out read streams in pgvector (an extension)

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Melanie Plageman <melanieplageman(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Trying out read streams in pgvector (an extension)
Date:	2025-12-09 22:38:10
Message-ID:	CAH2-Wz=b4fLaR0Ljjcnp3gvMqtRifD7ArM8KZd4JMgjVv9mtdQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Dec 8, 2025 at 10:47 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Yielding just because you've scanned N index pages/tuples/whatever is
> harder to think about. The stream shouldn't get far ahead unless it's
> recently been useful for I/O concurrency (though optimal distance
> heuristics are an open problem), but in this case a single invocation
> of the block number callback can call ReadBuffer() an arbitrary number
> of times, filtering out all the index tuples as it rampages through
> the whole index IIUC. I see why you might want to yield periodically
> if you can, but I also wonder how much that can really help if you
> still have to pick up where you left off next time.

I think of it as a necessary precaution against pathological behavior
where the amount of memory used to cache matching tuples/TIDs gets out
of hand. There's no specific reason to expect that to happen (or no
good reason). But I'm pretty sure that it'll prove necessary to pay
non-zero attention to how much work has been done since the last time
we returned a tuple (when there's a tuple available to return).
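
A minimal sketch of that precaution, for illustration only. It assumes a scan that caches matches in a fixed-size array and gives up control once it has examined a budgeted number of items, but only if at least one match is ready to return. Every name here (ScanState, scan_batch, item_matches, CACHE_CAPACITY, WORK_BUDGET) is hypothetical; none of this is PostgreSQL or pgvector code, and plain ints stand in for index tuples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_CAPACITY 64   /* hypothetical cap on cached matches */
#define WORK_BUDGET    128  /* hypothetical: items examined before yielding */

/* Hypothetical scan state; ints stand in for index tuples/TIDs. */
typedef struct ScanState
{
    const int  *items;      /* input being scanned */
    size_t      nitems;     /* number of input items */
    size_t      pos;        /* where the next batch resumes */
    int         cache[CACHE_CAPACITY];  /* matches found by the last batch */
    size_t      ncached;    /* number of valid entries in cache */
} ScanState;

/* Hypothetical filter standing in for "does this index tuple match?" */
static bool
item_matches(int item)
{
    return item % 7 == 0;
}

/*
 * Scan forward from scan->pos, caching matches.  Stop when the input is
 * exhausted, when the cache is full (bounding memory use), or when the
 * work budget is spent and at least one match is available to return.
 * Returns the number of matches cached by this call.
 */
static size_t
scan_batch(ScanState *scan)
{
    size_t      work = 0;   /* items examined since the last yield */

    assert(scan != NULL);
    scan->ncached = 0;

    while (scan->pos < scan->nitems && scan->ncached < CACHE_CAPACITY)
    {
        int         item = scan->items[scan->pos++];

        work++;
        if (item_matches(item))
            scan->cache[scan->ncached++] = item;

        /* Yield only if the caller gets something to return. */
        if (work >= WORK_BUDGET && scan->ncached > 0)
            break;
    }

    return scan->ncached;
}
```

The caller would drain scan->cache and call scan_batch() again, resuming at scan->pos. When nothing matches, the budget alone never stops the scan in this sketch, so it keeps going until the first match or the end of the input.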

> I guess it
> depends on the distribution of matches.

To be clear, I haven't done any kind of modelling of the problems in
this area. Once I do that (in 2026), I'll be able to say more about
the requirements. Maybe Tomas could take a look sooner?

Right now my focus is on getting the basic interfaces/API revisions in
better shape. And avoiding regressions while doing so.

--
Peter Geoghegan

In response to

Re: Trying out read streams in pgvector (an extension) at 2025-12-09 03:47:08 from Thomas Munro
