Re: Hash partitioning.

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Yuri Levinsky <yuril(at)celltick(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash partitioning.
Date: 2013-06-27 19:40:34
Message-ID: 51CC9532.2010807@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/27/2013 06:35 PM, Nicolas Barbier wrote:
> I am assuming that this (comparatively very small and super-hot) index
> is cached all the time, while for the other indexes (that are
> supposedly super-huge) only the top part stays cached.
>
> I am mostly just trying to find out where Yuri’s “partitioning is
> needed for super-huge tables” experience might come from, and noting
> that Heikki’s argument might not be 100% valid.

I think the OP made that clear by stating that his index has relatively
low selectivity. That seems to be a case that Postgres doesn't handle
very well.

> I think that the
> “PostgreSQL-answer” to this problem is to somehow cluster the data on
> the “hotness column” (so that all hot rows are close to one another,
> thereby improving the efficiency of the caching of relation blocks) +
> partial indexes for the hot rows (as first mentioned by Claudio; to
> improve the efficiency of the caching of index blocks).

Agreed, sounds like a sane strategy.

> My reasoning was: To determine which index block to update (typically
> one in both the partitioned and non-partitioned cases), one needs to
> walk the index first, which supposedly causes one additional (read)
> I/O in the non-partitioned case on average, because there is one extra
> level and the lower part of the index is not cached (because of the
> size of the index). I think that pokes a hole in Heikki’s argument of
> “it really doesn’t matter, partitioning == using one big table with
> big non-partial indexes.”

Heikki's argument holds for the general case, where you cannot assume a
well defined hot partition. In that case, the lowest levels of all the
b-trees of the partitions don't fit in the cache, either. A single index
performs better in that case, because it has lower overhead.

I take your point that in case you *can* define a hot partition and
apply partitioning, the hot(test) index(es) are more likely to be cached
and thus require less disk I/O.

Regards

Markus Wanner

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2013-06-27 19:51:15 Re: updated emacs configuration
Previous Message Noah Misch 2013-06-27 19:12:30 Re: MemoryContextAllocHuge(): selectively bypassing MaxAllocSize