Re: Hash partitioning.

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Yuri Levinsky <yuril(at)celltick(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash partitioning.
Date: 2013-06-27 13:05:58
Message-ID: 51CC38B6.5000908@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/27/2013 11:12 AM, Nicolas Barbier wrote:
> Imagine that there are a lot of indexes, e.g., 50. Although a lookup
> (walking one index) is equally fast, an insertion must update al 50
> indexes. When each index requires one extra I/O (because each index is
> one level taller), that is 50 extra I/Os. In the partitioned case,
> each index would require the normal smaller amount of I/Os. Choosing
> which partition to use must only be done once: The result “counts” for
> all indexes that are to be updated.

I think you're underestimating the cost of partitioning. After all, the
lookup of what index to update for a given partition is a a lookup in
pg_index via pg_index_indrelid_index - a btree index.

Additionally, the depth of an index doesn't directly translate to the
number of I/O writes per insert (or delete). I'd rather expect the avg.
number of I/O writes per insert into a b-tree to be reasonably close to
one - depending mostly on the number of keys per page, not depth.

> Additionally: Imagine that the data can be partitioned along some
> column that makes sense for performance reasons (e.g., some “date”
> where most accesses are concentrated on rows with more recent dates).
> The other indexes will probably not have such a performance
> distribution. Using those other indexes (both for look-ups and
> updates) in the non-partitioned case, will therefore pull a huge
> portion of each index into cache (because of the “random distribution”
> of the non-date data). In the partitioned case, more cache can be
> spent on the indexes that correspond to the “hot partitions.”

That's a valid point, yes. I'd call this index partitioning. And with
partial indices, Postgres already has something that gets pretty close,
I think. Though, I don't consider this to be related to how the tuples
of the relation are laid out on disk.

Regards

Markus Wanner

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-06-27 13:08:31 Re: extensible external toast tuple support & snappy prototype
Previous Message Robert Haas 2013-06-27 13:02:39 Re: [PATCH] add long options to pgbench (submission 1)