From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | John Naylor <johncnaylorls(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Batch TIDs lookup in ambulkdelete |
Date: | 2025-06-06 22:58:02 |
Message-ID: | CAD21AoCGzTu66ZjatOR2yxu6nVkpK7=XBh2reZXiJ819rT-38Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Jun 1, 2025 at 11:01 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Thu, May 1, 2025 at 5:36 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > HEAD PATCHED DIFF
> > case-1: 3,021 ms 2.818 ms 93.29%
> > case-2: 5, 697 ms 5.545 ms 97.34%
> > case-3: 2,833 ms 2.790 ms 98.48%
> > case-4: 2,564 ms 2.279 ms 88.86%
> > case-5: 4,657 ms 4.706 ms 101.04%
>
> 1 and 4 look significant -- do the other cases have reproducible
> differences or is it just noise?
These results are the average of 3 times executions, so they are
reproducible differences.
>
> > Here are the summary of each attached patch:
> >
> > 0001: Introduce TIdStoreIsMemberMulti() where we can do IsMember check
> > for multiple TIDs in one function call. If the given TIDs are sorted
> > (at least in block number), we can save radix tree lookup for the same
> > page entry.
>
> My only comment is that TidStoreIsMember() is now unused in core (or
> maybe just the tests?). It seems like we could just change the API for
> it rather than introduce a new function?
Good point, changed in the latest patch I posted[1].
>
> > 0003: Use batch TIDs lookup in btree index bulk-deletion.
> >
> > In patch 0003, we implement batch TID lookups for both each
> > deduplicated index tuple and remaining all regular index tuples, which
> > seems to be the most straightforward approach.
>
> Seems like a good approach. btvacuumpage() needs to sort if there is a
> mix of posting tuples and regular index tuples. Was that covered by
> any of the tests above?
Good point, I think this case was not covered. I've measured the
performance with the following queries:
# Case-6:
create unlogged table test (c int) with (autovacuum_enabled = off);
insert into test select (2 * c - 1) from generate_series(1, ${NROWS}) c;
insert into test select c from generate_series(1, ${NROWS}) c;
create index on test (c);
select pg_relation_size('test') / 8192 as relpages \gset
delete from test where c < (${NROWS} * 0.3)::int
vacuum test;
And here is the result:
HEAD PATCHED DIFF
case-6: 3,320 ms 3,617 ms 108.94%
I'll consider how to deal with overheads of sorting TIDs.
> > While further
> > optimizations are possible, such as performing batch TID lookups for
> > all index tuples on a single page, these could introduce additional
> > overhead from sorting and re-sorting TIDs. Moreover, considering that
> > radix tree lookups are relatively inexpensive, the benefits of sorting
> > TIDs and using TidStoreIsMemberMulti() might be minimal. Nevertheless,
> > these potential optimizations warrant further evaluation to determine
> > their actual impact on performance.
>
> My guess is that always sorting by TID and than back by index tuple
> offset is too much overhead to be worth it, but I'm not sure.
Agreed. Given the above test results, it's unlikely always sorting the
array helps speedups.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2025-06-06 23:27:46 | Re: Batch TIDs lookup in ambulkdelete |
Previous Message | Jeff Davis | 2025-06-06 22:47:16 | Re: CREATE DATABASE command for non-libc providers |