Re: WIP: Fast GiST index build

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Fast GiST index build
Date: 2011-08-16 12:04:37
Message-ID: 4E4A5CD5.2040601@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Looking at the calculation of levelStep:

> + /*
> + * Calculate levelStep by available amount of memory. We should be able to
> + * load into main memory one page of each underlying node buffer (which
> + * are in levelStep below). That give constraint over
> + * maintenance_work_mem. Also we should be able to have subtree of
> + * levelStep level in cache. That give constraint over
> + * effective_cache_size.
> + *
> + * i'th underlying level of sub-tree can consists of
> + * i^maxIndexTuplesPerPage pages at maximum. So, subtree of levelStep
> + * levels can't be greater then 2 * maxIndexTuplesPerPage ^ levelStep
> + * pages. We use some more reserve due to we probably can't take whole
> + * effective cache and use formula 4 * maxIndexTuplesPerPage ^ levelStep =
> + * effectiveCache. We use similar logic with maintenance_work_mem. We
> + * should be able to store at least last pages of all buffers where we are
> + * emptying current buffer to.
> + */
> + effectiveMemory = Min(maintenance_work_mem * 1024 / BLCKSZ,
> + effective_cache_size);
> + levelStep = (int) log((double) effectiveMemory / 4.0) /
> + log((double) maxIndexTuplesPerPage);
> +

I can see that that's equal to the formula given in the paper,
log_B(M/4B), but I couldn't see any explanation for that formula in the
paper. Your explanation makes sense, but where did it come from?

It seems a bit pessimistic: while it's true that the a subtree can't be
larger than 2 * maxIndexTuplesPerPage ^ levelStep, you can put a tighter
upper bound on it. The max size of a subtree of depth n can be
calculated as the geometric series:

r^0 + r^1 + r^2 + ... + r^n = (1 - r^(n + 1)) / (1 - r)

where r = maxIndexTuplesPerPage. For r=2 those formulas are equal, but
for a large r and small n (which is typical), the 2 *
maxIndexTuplesPerPage^levelStep formula gives a value that's almost
twice as large as the real max size of a subtree.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2011-08-16 12:25:29 non-ipv6 vs hostnames
Previous Message Andrew Dunstan 2011-08-16 11:23:48 Re: Re: Should we have an optional limit on the recursion depth of recursive CTEs?