On 8 August 2012 03:44, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Tue, Aug 7, 2012 at 1:52 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 7 August 2012 20:58, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>> Hi Heikki,
>>> Is the bulk index insert still an active area for you?
>>> If not, is there some kind of summary of design or analysis work
>>> already done, which would be a useful for someone else interested in
>>> picking it up?
>> The main cost comes from repeatedly re-seeking down the index tree to
>> find the insertion point, but repeated lock and pin operations on
>> index buffers are also expensive. That is optimisable if the index
>> inserts are grouped, as they will be with monotonic indexes (e.g.
>> SERIAL), or with partially monotonic ones (e.g. with a Parent/Child
>> relationship, many inserts on the Child table will be of the form
>> (x,1), (x,2), (x,3) and so on). All we need do is buffer the index
>> inserts before inserting them into the main index. As long as we
>> flush the buffer before end of transaction, we're good.
>> Incidentally, we can also optimise repeated inserts within a normal
>> transaction using this method, by implementing deferred unique
>> constraints. At present we say that unique constraints aren't
>> deferrable, but there's no reason they can't be, if we allow buffering
>> in the index. (Implementing a deferred constraint that won't fail if
>> we do UPDATE foo SET pk = pk+1 requires a buffer the size of foo,
>> which is probably a bad plan, plus you'd need to sort the inputs, so
>> that particular nut is another issue altogether, AFAICS).
>> Suggested implementation is to buffer index tuples at the generic
>> index API, then add a bulk-insert index API call that each AM can
>> implement separately. Suggested buffer size is work_mem. Note we must
>> extract index tuples from heap tuples - refetching heap blocks to get
>> rows is too costly.
> OK, thanks for the summary. It looks like your plans are to improve
> the case where the index maintenance is already CPU limited. I was
> more interested in cases where the regions of the index(es) undergoing
> active insertion do not fit into usable RAM, so the limit is the
> random IO needed to fetch the leaf pages in order to update them or to
> write them out once dirtied. Here too buffering is probably the
> answer, but the size of the buffer needed to turn that IO from
> effectively random to effectively sequential is probably much larger
> than the size of the buffer you are contemplating.
The buffer size can be variable, yes. I was imagining a mechanism that
worked for normal INSERTs as well as COPY. Perhaps we could say the
buffer is work_mem for INSERT and maintenance_work_mem for COPY.
Very large index appends are useful, but currently not very practical
because of the transactional nature of COPY. If we could reject rows
without raising an ERROR it would be more usable.
I'm not planning to work on this myself, so these comments are offered
for your assistance.
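To illustrate the buffering idea in miniature (a toy sketch only, nothing
like real backend code: ibuf_add()/ibuf_flush(), the sorted-array "index"
and the fixed-size buffer standing in for work_mem are all invented for
illustration):

```c
/* Toy sketch of buffered index insertion: collect keys per statement,
 * then bulk-apply them in sorted order, so the cost of descending to
 * the insertion point is paid once per run of keys, not once per key. */
#include <assert.h>
#include <stdlib.h>

#define BUF_CAP 4               /* stand-in for a work_mem-sized buffer */

typedef struct
{
    int         keys[BUF_CAP];  /* buffered, not-yet-applied keys */
    int         n;
    int        *index;          /* toy "index": a sorted array */
    int         index_n;
} IndexBuf;

static int
cmp_key(const void *a, const void *b)
{
    int         x = *(const int *) a,
                y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Flush: sort the buffered keys, then merge them into the sorted
 * index in a single backwards pass. */
static void
ibuf_flush(IndexBuf *b)
{
    int         i,
                j,
                k;

    if (b->n == 0)
        return;
    qsort(b->keys, b->n, sizeof(int), cmp_key);
    b->index = realloc(b->index, (b->index_n + b->n) * sizeof(int));
    i = b->index_n - 1;
    j = b->n - 1;
    k = b->index_n + b->n - 1;
    while (j >= 0)
        b->index[k--] = (i >= 0 && b->index[i] > b->keys[j])
            ? b->index[i--] : b->keys[j--];
    b->index_n += b->n;
    b->n = 0;
}

static void
ibuf_add(IndexBuf *b, int key)
{
    if (b->n == BUF_CAP)
        ibuf_flush(b);          /* buffer full: bulk-apply */
    b->keys[b->n++] = key;
}
```

In the real thing the flush would of course also have to happen before
end of statement/transaction, and before any scan that needs to see the
buffered entries.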
>> I think we need to implement buffering both to end of statement or end
>> of transaction, not just one or the other.
> With the planner deciding which would be better, or explicit user action?
Probably both: on/off/choose.
Deferring the unique check would change the point at which errors are
reported within a transaction, which might not be desirable for some
users. I think the SQL standard has something to say about this too, so
that needs care. But in general, if your tables use generated PK values
they should be able to benefit from this, so I would suggest a default
setting of choose.
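To show what deferring the check buys us (again a toy sketch, not
backend code; check_deferred() is an invented name): the check runs once
over the statement's final key set, so transient conflicts during the
statement never raise an error.

```c
/* Toy sketch of a deferred unique check: instead of erroring on each
 * insert, validate the statement's final key set in one sorted pass.
 * Returns 0 if all keys are distinct, -1 on a duplicate. */
#include <assert.h>
#include <stdlib.h>

static int
cmp_ukey(const void *a, const void *b)
{
    int         x = *(const int *) a,
                y = *(const int *) b;

    return (x > y) - (x < y);
}

static int
check_deferred(int *keys, int n)
{
    qsort(keys, n, sizeof(int), cmp_ukey);
    for (int i = 1; i < n; i++)
        if (keys[i] == keys[i - 1])
            return -1;          /* duplicate in the final key set */
    return 0;
}
```

With an immediate check, UPDATE foo SET pk = pk+1 on keys {1,2,3} can
fail when the new key 2 collides with the not-yet-updated old key 2;
checked this way, only the final set {2,3,4} is tested and it passes.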
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services