Re: optimize lookups in snapshot [sub]xip arrays

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimize lookups in snapshot [sub]xip arrays
Date: 2022-07-14 18:09:38
Message-ID: 20220714180938.GA3125784@nathanxps13
Lists: pgsql-hackers

Hi Bharath,

Thanks for taking a look.

On Thu, Jul 14, 2022 at 03:10:56PM +0530, Bharath Rupireddy wrote:
> Aren't these snapshot arrays always sorted? I see the following code:
>
> /* sort so we can bsearch() */
> qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);
>
> /* sort so we can bsearch() later */
> qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator);

AFAICT these arrays are sorted in limited cases, such as
pg_current_snapshot() and logical replication. GetSnapshotData() does not
appear to sort them, so I don't think we can always assume they are sorted.
In the previous thread, Tomas analyzed simply sorting the arrays [0] and
found that it provided much less improvement compared to the hash table
approach, so I have not seriously considered it here.
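
For reference, here is a minimal standalone sketch (not the in-tree code) of
that qsort()+bsearch() pattern; TransactionId and the comparator are
simplified stand-ins for illustration:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

/* stand-in for xidComparator: plain unsigned comparison */
static int
xid_cmp(const void *a, const void *b)
{
    TransactionId xa = *(const TransactionId *) a;
    TransactionId xb = *(const TransactionId *) b;

    return (xa > xb) - (xa < xb);
}

static bool
xid_in_sorted_xip(TransactionId xid, TransactionId *xip, uint32_t xcnt)
{
    /* valid only if xip was previously qsort()'d with the same comparator */
    return bsearch(&xid, xip, xcnt,
                   sizeof(TransactionId), xid_cmp) != NULL;
}

Of course, this only helps if the array really was sorted beforehand, which
is exactly the qsort() cost that GetSnapshotData() snapshots don't pay today.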

> If the ordering isn't an invariant of these snapshot arrays, can we
> also use the hash table mechanism for all of the snapshot arrays
> infrastructure rather than qsort+bsearch in a few places and hash
> table for others?

Unless there is demonstrable benefit in doing so for the few places that
sort the arrays, I'm skeptical it's worth the complexity.  This patch is
targeted at XidInMVCCSnapshot(), which we can demonstrate has a clear impact
on TPS for some workloads.

> + * The current value worked well in testing, but it's still mostly a guessed-at
> + * number that might need updating in the future.
> + */
> +#define XIP_HASH_MIN_ELEMENTS (128)
> +
>
> Do you see a regression with a hash table for all the cases? Why can't
> we just build a hash table irrespective of these limits and use it for
> all the purposes instead of making it complex with different
> approaches if we don't have measurable differences in the performance
> or throughput?

I performed the same tests as before with a variety of
XIP_HASH_MIN_ELEMENTS values.  Here are the results (each numbered column
is the cutoff used):

writers   HEAD      1     16     32     64    128
--------------------------------------------------
     16    659    698    678    659    665    664
     32    645    661    688    657    649    663
     64    659    656    653    649    663    692
    128    641    636    639    679    643    716
    256    619    641    619    643    653    610
    512    530    609    582    602    605    702
    768    469    610    608    551    571    582
   1000    367    610    538    557    556    577

I was surprised to see that there really wasn't a regression at the low end,
but keep in mind that this is a rather large machine and a specialized
workload for generating snapshots with long [sub]xip arrays. That being
said, there really wasn't any improvement at the low end, either. If we
always built a hash table, we'd be introducing more overhead and memory
usage in return for approximately zero benefit. My intent was to only take
on the overhead in cases where we believe it might have a positive impact,
which is why I picked the somewhat conservative value of 128. If the
overhead isn't a concern, it might be feasible to always make [sub]xip a
hash table.
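
To make that concrete, here is a self-contained toy sketch of the two ideas
(not the patch, which builds a real hash table via xip_hash_create() in
TopTransactionContext; XipSetSketch and friends are made-up names): short
arrays keep the plain linear scan, and long arrays get a small
open-addressing set that is built lazily on the first probe.

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

#define XIP_HASH_MIN_ELEMENTS 128       /* same cutoff as tested above */

typedef struct XipSetSketch
{
    uint32_t    mask;       /* table size - 1; size is a power of two */
    TransactionId *slots;   /* 0 (InvalidTransactionId) marks an empty slot */
} XipSetSketch;

/* Build an open-addressing set over the array, kept under 50% full. */
static XipSetSketch *
xip_set_build(const TransactionId *xip, uint32_t xcnt)
{
    uint32_t    size = 4;
    XipSetSketch *set = malloc(sizeof(XipSetSketch));

    while (size < xcnt * 2)
        size <<= 1;
    set->mask = size - 1;
    set->slots = calloc(size, sizeof(TransactionId));

    for (uint32_t i = 0; i < xcnt; i++)
    {
        /* multiplicative hash; xip never contains InvalidTransactionId (0) */
        uint32_t    pos = (xip[i] * 2654435761u) & set->mask;

        while (set->slots[pos] != 0)    /* linear probing */
            pos = (pos + 1) & set->mask;
        set->slots[pos] = xip[i];
    }
    return set;
}

static bool
xid_in_xip_sketch(TransactionId xid, const TransactionId *xip, uint32_t xcnt,
                  XipSetSketch **xiph)
{
    /* Short array: a linear scan is cheap and allocates nothing. */
    if (xcnt < XIP_HASH_MIN_ELEMENTS)
    {
        for (uint32_t i = 0; i < xcnt; i++)
        {
            if (xip[i] == xid)
                return true;
        }
        return false;
    }

    /* Long array: build the set on the first probe, then reuse it. */
    if (*xiph == NULL)
        *xiph = xip_set_build(xip, xcnt);

    for (uint32_t pos = (xid * 2654435761u) & (*xiph)->mask;
         (*xiph)->slots[pos] != 0;
         pos = (pos + 1) & (*xiph)->mask)
    {
        if ((*xiph)->slots[pos] == xid)
            return true;
    }
    return false;
}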

> +static inline bool
> +XidInXip(TransactionId xid, TransactionId *xip, uint32 xcnt,
> + xip_hash_hash **xiph)
>
> + /* Make sure the hash table is built. */
> + if (*xiph == NULL)
> + {
> + *xiph = xip_hash_create(TopTransactionContext, xcnt, NULL);
> +
> + for (int i = 0; i < xcnt; i++)
>
> Why create a hash table on the first search? Why can't it be built
> while inserting or creating these snapshots? Basically, instead of the
> array, can these snapshot structures be hash tables by themselves? I
> know this requires a good amount of code refactoring, but worth
> considering IMO as it removes bsearch thus might improve the
> performance further.

The idea is to avoid the overhead unless something actually needs to
inspect these arrays.  Building the hash table when the snapshot is taken
would add cost to every snapshot, including the many whose [sub]xip arrays
are never searched, whereas building it lazily on the first lookup only
penalizes the paths that stand to benefit.
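
To illustrate with the toy sketch above (so assuming those definitions), the
cached pointer starts out NULL and nothing is allocated until the first
probe, so a snapshot whose arrays are never inspected never pays for the
table:

#include <stdio.h>

int
main(void)
{
    TransactionId xip[200];
    XipSetSketch *xiph = NULL;  /* no hash table yet */

    for (uint32_t i = 0; i < 200; i++)
        xip[i] = 100 + 2 * i;   /* pretend these are in-progress xids */

    /* If nothing ever probes the snapshot, xiph stays NULL forever. */

    /* The first lookup builds the set; the second just reuses it. */
    printf("%d\n", xid_in_xip_sketch(150, xip, 200, &xiph));    /* 1 */
    printf("%d\n", xid_in_xip_sketch(151, xip, 200, &xiph));    /* 0 */
    return 0;
}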

[0] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e%402ndquadrant.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
