Quick Links

Re: Hash Indexes

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Hash Indexes
Date:	2016-09-15 05:48:57
Message-ID:	CAA4eK1KMS3WP0EG13xhyrB_30qKj53vtA2VHty7qOyaAaV6bxQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Sep 15, 2016 at 4:44 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Tue, May 10, 2016 at 5:09 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>>
>>
>>
>> Although, I don't think it is a very good idea to take any performance
>> data with WIP patch, still I couldn't resist myself from doing so and below
>> are the performance numbers. To get the performance data, I have dropped
>> the primary key constraint on pgbench_accounts and created a hash index on
>> aid column as below.
>>
>> alter table pgbench_accounts drop constraint pgbench_accounts_pkey;
>> create index pgbench_accounts_pkey on pgbench_accounts using hash(aid);
>
>
>
> To be rigorously fair, you should probably replace the btree primary key
> with a non-unique btree index and use that in the btree comparison case. I
> don't know how much difference that would make, probably none at all for a
> read-only case.
>
>>
>>
>>
>> Below data is for read-only pgbench test and is a median of 3 5-min runs.
>> The performance tests are executed on a power-8 m/c.
>
>
> With pgbench -S where everything fits in shared_buffers and the number of
> cores I have at my disposal, I am mostly benchmarking interprocess
> communication between pgbench and the backend. I am impressed that you can
> detect any difference at all.
>
> For this type of thing, I like to create a server side function for use in
> benchmarking:
>
> create or replace function pgbench_query(scale integer,size integer)
> RETURNS integer AS $$
> DECLARE sum integer default 0;
> amount integer;
> account_id integer;
> BEGIN FOR i IN 1..size LOOP
> account_id=1+floor(random()*scale);
> SELECT abalance into strict amount FROM pgbench_accounts
> WHERE aid = account_id;
> sum := sum + amount;
> END LOOP;
> return sum;
> END $$ LANGUAGE plpgsql;
>
> And then run using a command like this:
>
> pgbench -f <(echo 'select pgbench_query(40,1000)') -c$j -j$j -T 300
>
> Where the first argument ('40', here) must be manually set to the same value
> as the scale-factor.
>
> With 8 cores and 8 clients, the values I get are, for btree, hash-head,
> hash-concurrent, hash-concurrent-cache, respectively:
>
> 598.2
> 577.4
> 668.7
> 664.6
>
> (each transaction involves 1000 select statements)
>
> So I do see that the concurrency patch is quite an improvement. The cache
> patch does not produce a further improvement, which was somewhat surprising
> to me (I thought that that patch would really shine in a read-write
> workload, but I expected at least improvement in read only)
>

To see the benefit from cache meta page patch, you might want to test
with clients more than the number of cores, atleast that is what data
by Mithun [1] indicates or probably in somewhat larger m/c.

[1] - https://www.postgresql.org/message-id/CAD__OugX0aOa7qopz3d-nbBAoVmvSmdFJOX4mv5tFRpijqH47A%40mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Re: Hash Indexes at 2016-09-14 23:14:08 from Jeff Janes

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Ashutosh Bapat	2016-09-15 05:58:05	Re: Printing bitmap objects in the debugger
Previous Message	Amit Kapila	2016-09-15 05:41:41	Re: Hash Indexes