Re: Next Steps with Hash Indexes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Next Steps with Hash Indexes
Date: 2021-10-27 12:16:54
Message-ID: CAA4eK1JWMj61LyfBrbrcE1yce02xV7=V8ysQS1qEQXAoRE4VWg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 27, 2021 at 4:55 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
>
> On Wed, 27 Oct 2021 at 12:58, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Oct 27, 2021 at 2:32 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Oct 5, 2021 at 6:50 AM Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com> wrote:
> > > > With unique data, starting at 1 and monotonically ascending, hash
> > > > indexes will grow very nicely from 0 to 10E7 rows without causing >1
> > > > overflow block to be allocated for any bucket. This keeps the search
> > > > time for such data to just 2 blocks (bucket plus, if present, 1
> > > > overflow block). The small number of overflow blocks is because of the
> > > > regular and smooth way that splits occur, which works very nicely
> > > > without significant extra latency.
> > >
> > > It is my impression that with non-unique data things degrade rather
> > > badly.
> > >
> >
> > But we will hold the bucket lock only for unique-index in which case
> > there shouldn't be non-unique data in the index.
>
> Even in unique indexes there might be many duplicate index entries: A
> frequently updated row, to which HOT cannot apply, whose row versions
> are waiting for vacuum (which is waiting for that one long-running
> transaction to commit) will have many entries in each index.
>
> Sure, it generally won't hit 10E7 duplicates, but we can hit large
> numbers of duplicates fast on a frequently updated row. Updating one
> row 1000 times between two runs of VACUUM is not at all impossible,
> and although I don't think it happens all the time, I do think it can
> happen often enough on e.g. an HTAP system to make it a noteworthy
> test case.
>

I think it makes sense to test such cases and see the behavior w.r.t. overflow buckets.
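The toy model below sketches why such a test case matters. It rests on several stated assumptions. The bucket count is fixed, so no splits are modeled. Each page holds a hypothetical 300 tuples. BLAKE2b stands in for the hash function; it is not PostgreSQL's `hash_any`. The model compares ascending unique keys with the same keys plus 1000 dead versions of one frequently updated row. Every version of a key has the same hash value, so bucket splitting can never spread them out. That bucket's overflow chain therefore grows roughly as duplicates divided by page capacity.

```python
import hashlib
from collections import Counter
from math import ceil

# Hypothetical page capacity; the real value depends on page size and index tuple width.
TUPLES_PER_PAGE = 300


def bucket_of(key: int, n_buckets: int) -> int:
    """Stand-in hash (BLAKE2b, NOT PostgreSQL's hash function) reduced to a bucket number."""
    digest = hashlib.blake2b(key.to_bytes(8, "little"), digest_size=4).digest()
    return int.from_bytes(digest, "little") % n_buckets


def overflow_pages(index_entries, n_buckets: int) -> Counter:
    """Overflow pages each bucket needs once its primary page is full (no splits modeled)."""
    per_bucket = Counter(bucket_of(k, n_buckets) for k in index_entries)
    return Counter({b: ceil(n / TUPLES_PER_PAGE) - 1 for b, n in per_bucket.items()})


N_ROWS = 100_000
# Size the bucket array for roughly 75% primary-page fill, as if splits had kept up.
N_BUCKETS = ceil(N_ROWS / (0.75 * TUPLES_PER_PAGE))

unique = list(range(1, N_ROWS + 1))
# Same table, but row 42 was updated 1000 times without HOT and VACUUM has not
# yet removed the old versions, so the unique index holds 1001 entries for key 42.
with_dead_versions = unique + [42] * 1000

print("unique keys, worst bucket:", max(overflow_pages(unique, N_BUCKETS).values()))
print("hot row, its bucket:",
      overflow_pages(with_dead_versions, N_BUCKETS)[bucket_of(42, N_BUCKETS)])
```

A real test would build an actual hash index and inspect it, for example with the `pageinspect` hash functions. This model only shows the shape of the effect. The unique keys stay within about one overflow page per bucket. The hot key's bucket needs at least three overflow pages, however many buckets exist.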

--
With Regards,
Amit Kapila.
