Re: rewrite HeapSatisfiesHOTAndKey

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>
Subject: Re: rewrite HeapSatisfiesHOTAndKey
Date: 2017-01-04 18:15:31
Message-ID: CABOikdMUQQs4BnJ4Ws-ObOEDh8vhNp13Y1caK_i8seSHKPjbhw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 3, 2017 at 9:33 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Jan 2, 2017 at 1:36 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > Okay, but I think if we know how much is the additional cost in
> > average and worst case, then we can take a better call.
>
> Yeah. We shouldn't just rip out optimizations that are inconvenient
> without doing some test of what the impact is on the cases where those
> optimizations are likely to matter. I don't think it needs to be
> anything incredibly laborious and if there's no discernable impact,
> great.

So I performed some tests to measure if this causes any noticeable
regression. I used the following simple schema:

DROP TABLE IF EXISTS testtab;
CREATE UNLOGGED TABLE testtab (
    col1 integer,
    col2 text,
    col3 float,
    col4 text,
    col5 text,
    col6 char(30),
    col7 text,
    col8 date,
    col9 text,
    col10 text
);
INSERT INTO testtab
    SELECT generate_series(1,100000),
           md5(random()::text),
           random(),
           md5(random()::text),
           md5(random()::text),
           md5(random()::text)::char(30),
           md5(random()::text),
           now(),
           md5(random()::text),
           md5(random()::text);
CREATE INDEX testindx ON testtab (col1, col2, col3, col4, col5, col6, col7,
                                  col8, col9);

I used a rather wide UNLOGGED table with an index on the first 9 columns, as
suggested by Amit. Also, the table has a reasonable number of rows, but not
more than what shared buffers (set to 512MB for these tests) can hold. This
should make the test mostly CPU bound.

A transaction then updates the second column in the table. So the
refactored patch will do heap_getattr() on more columns than master
while checking whether a HOT update is possible, before giving up. I believe
this setup tests something close to the worst case, though maybe I
could have tuned some other configuration parameters.

\set value random(1, 100000)
UPDATE testtab SET col2 = md5(random()::text) WHERE col1 = :value;
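
To picture why this update makes the refactored code fetch more columns, here
is a toy model in Python. It is not PostgreSQL source code: the function names
are hypothetical, and it assumes that the master check stops at the first
modified indexed column while the refactored check walks every indexed column
and collects the modified ones.

```python
# Toy model only; NOT PostgreSQL code. Function names are hypothetical.
# Assumption: "master" stops at the first modified indexed column, while
# "refactored" inspects every indexed column and returns the modified set.

INDEXED = [f"col{i}" for i in range(1, 10)]  # testindx covers col1..col9


def early_exit_check(old, new, indexed):
    """Return (hot_possible, columns_fetched), stopping at the first change."""
    fetched = 0
    for col in indexed:
        fetched += 1  # stands in for one heap_getattr() pair
        if old[col] != new[col]:
            return False, fetched
    return True, fetched


def full_scan_check(old, new, indexed):
    """Return (hot_possible, columns_fetched, modified_columns)."""
    fetched = 0
    modified = set()
    for col in indexed:
        fetched += 1
        if old[col] != new[col]:
            modified.add(col)
    return not modified, fetched, modified


old_row = {f"col{i}": f"v{i}" for i in range(1, 11)}
new_row = dict(old_row, col2="changed")  # the UPDATE touches only col2

early = early_exit_check(old_row, new_row, INDEXED)
full = full_scan_check(old_row, new_row, INDEXED)
print("early exit:", early)
print("full scan: ", full)
```

Under those assumptions the early-exit variant looks at two columns and the
full scan looks at all nine, which is the extra work the benchmark is meant to
expose.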

I tested with -c1 and -c8 -j4 and the results are:

1-client
        Master          Refactored
Run1    8774.089935     8979.068604
Run2    8509.2661       8943.613575
Run3    8879.484019     8950.994425

8-clients
        Master          Refactored
Run1    22520.422448    22672.798871
Run2    21967.812303    22022.969747
Run3    22305.073223    21909.945623

So, if anything, there is a slight improvement with the patch, though I don't
see any reason why it should improve performance. The results with
a larger number of clients look almost identical, probably because the
bottleneck shifts somewhere else. For all these tests, the table was dropped
and recreated in every iteration, so I don't think there was any error in
testing. It might be a good idea for someone else to repeat the tests to
confirm the improvement that I noticed.
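
For anyone repeating the runs, a small Python sketch like the following can
summarize the figures. It only averages the three runs reported above and
computes the relative change of the refactored build against master; the
variable and function names are my own, and three runs are too few to say
anything about statistical significance.

```python
from statistics import mean

# Figures copied from the results tables above.
results = {
    "1-client": {
        "master": [8774.089935, 8509.2661, 8879.484019],
        "refactored": [8979.068604, 8943.613575, 8950.994425],
    },
    "8-clients": {
        "master": [22520.422448, 21967.812303, 22305.073223],
        "refactored": [22672.798871, 22022.969747, 21909.945623],
    },
}


def relative_change_pct(master_runs, refactored_runs):
    """Percent change of the refactored mean relative to the master mean."""
    m = mean(master_runs)
    r = mean(refactored_runs)
    return (r - m) / m * 100.0


changes = {
    label: relative_change_pct(runs["master"], runs["refactored"])
    for label, runs in results.items()
}

for label, pct in changes.items():
    print(f"{label}: {pct:+.2f}% (refactored vs. master)")
```

Note that with one client the run-to-run spread on master alone (roughly 8509
to 8879) is of the same order as the gap between the two means, so the
difference should be read with caution.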

Apart from this, I also ran "make check" multiple times and couldn't find
any significant difference in the average time.

I will leave it to Alvaro's judgement to decide whether it's worth
committing the patch now, or later when he or another committer looks at
committing WARM/indirect indexes, because without either of those patches
this change probably does not bring much value, if we ignore the slight
improvement we see here.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
