Re: [PATCHES] Including Snapshot Info with Indexes

From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Including Snapshot Info with Indexes
Date: 2007-10-23 11:35:20
Message-ID: 9362e74e0710230435r4553454eg4258e0fea2a03198@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

On 10/23/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>
> Please keep the list cc'd.
>
> Gokulakannan Somasundaram wrote:
> > On 10/23/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> >> Gokulakannan Somasundaram wrote:
> >> I have also enabled the display of Logical Reads. In order to see that,
> >> set
> >>> log_statement_stats on.
> >> You should start benchmarking, to verify that you're really getting the
> >> kind of speed up you're looking for, before you spend any more effort
> on
> >> that. Reduction in logical reads alone isn't enough. Remember that for
> a
> >> big change like that, the gain has to be big as well.
> >
> > I have done the benchmark. I have done the benchmark with Logical reads,
> as
> > they turn out to be potential physical reads. Try turning on the
> > log_statement_stats in postgresql.conf. try firing some queries, which
> can
> > satisfied by the index. You would see the difference.
>
> I would see a decrease in the number of logical reads, that's all. You
> need to demonstrate a real increase in throughput and/or reduction in
> response times.
>
> Note that even though you reduce the number of logical reads, with a
> thick index a logical read is *more* likely to be a physical read,
> because the index is larger and therefore consumes more cache.

Say, with a normal index, you need to goto the table for checking the
snapshot. So you would be loading both the index pages + table pages, in
order to satisfy a certain operations. Whereas in thick index you occupy 16
bytes per tuple more in order to avoid going to the table. So memory
management is again better. But i can run the load test, if that's
required. Even when all the tuples are in memory, index only scans are
almost 40-60% faster than the index scans with thin indexes.

> As a first test, I'd like to see results from SELECTs on different sized
> >> tables. On tables that fit in cache, and on tables that don't. Tables
> >> large enough that the index doesn't fit in cache. And as a special
> case,
> >> on a table just the right size that a normal index fits in cache, but a
> >> thick one doesn't.
> >
> > I have not done a Load test. That's a good idea. Are you guys using
> Apache
> > JMeter?
>
> You can use whatever you want, as long as you can get the relevant
> numbers out of it. contrib/pgbench is a good place to start.
>
> DBT-2 is another test people often use for patches like this. It's quite
> tedious to set up and operate, but it'll give you nice very graphs.
>
> Make sure you control vacuums, checkpoints etc., so that you get
> repeatable results.

Sure i will do that. Thanks for the advice.

> Also i think you might have noted that the thick indexes are not affected
> by
> > updates, if the updated column is not in the index. I think that add on
> to
> > one more advantage of thick indexes against DSM.
>
> That cannot possibly work. Imagine that you have a table
>
> ctid | id | data
> -----+----+-----
> (0,1)| 1 | foo
> (0,2)| 1 | bar
>
> where (0,2) is an updated version of (0,1). If you don't update the
> index, there will be no index pointer to (0,2), so a regular index scan,
> not an index-only scan, will not find the updated tuple.
>
> Or did you mean that the index is not updated on HOT updates? That's an
> interesting observation. We could do index-only scans with the DSM as
> well, even if there's HOT updates, if we define the bit in the bitmap to
> mean "all tuples in this page are visible to everyone, or there's only
> HOT updates". That works, because an index-only-scan doesn't access any
> of the updated columns. It probably isn't worth it, though. Seems like a
> pretty narrow use case, and makes it more complicated.

I think i was not understood. An update transaction is not degraded by thick
index. Update = Delete + insert. If you don't update the columns in index,
then we would goto the same index page for both delete and insert. i have
done a small optimization there to cache the BTStack. you do not need to do
any more I/O. So effectively update performance in thick index = update
performance in thin index (if indexed columns are not updated).
Hope i am clear..

What do you thick about not maintaining pins in case of thick indexes?

Thanks,
Gokul,
CertoSQL Project,
Allied Solution Groups.
(www.alliedgroups.com)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rainer Bauer 2007-10-23 12:02:03 Re: 8.2.3: Server crashes on Windows using Eclipse/Junit
Previous Message Rafael Martinez 2007-10-23 11:28:48 Re: PostgreSQL performance issues

Browse pgsql-patches by date

  From Date Subject
Next Message Heikki Linnakangas 2007-10-23 12:04:34 Re: [PATCHES] Including Snapshot Info with Indexes
Previous Message Heikki Linnakangas 2007-10-23 11:11:49 Re: [PATCHES] Including Snapshot Info with Indexes