Re: BUG #4204: COPY to table with FK has memory leak

From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-29 06:44:34
Message-ID: 1212043474.7129.3.camel@huvostro
Lists: pgsql-bugs pgsql-hackers

On Wed, 2008-05-28 at 22:45 +0100, Simon Riggs wrote:
> On Wed, 2008-05-28 at 16:28 -0400, Tom Lane wrote:
> > Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > > "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> > >> This is expected to take lots of memory because each row-requiring-check
> > >> generates an entry in the pending trigger event list.
> >
> > > Hm, it occurs to me that we could still do a join against the pending event
> > > trigger list... I wonder how feasible it would be to store the pending trigger
> > > event list in a temporary table instead of in ram.
> >
> > We could make that list spill to disk, but the problem remains that
> > verifying the rows one at a time will take forever.
> >
> > The idea that's been kicked around occasionally is that once you get
> > past N pending events, throw them all away and instead queue a single
> > operation to do a bulk verify (just like initial establishment of the
> > FK constraint). I'm not sure how to do the queue management for this
> > though.
>
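The "bulk verify" Tom mentions is essentially the same set-based check
that runs when a FK constraint is first established: one join instead of
row-at-a-time probes. Roughly, with made-up table and column names:

    -- Find any referencing value that has no match in the referenced
    -- table; rows with a NULL key are exempt from the FK check.
    SELECT fk.ref_id
      FROM fktable fk
      LEFT JOIN pktable pk ON pk.id = fk.ref_id
     WHERE fk.ref_id IS NOT NULL
       AND pk.id IS NULL
     LIMIT 1;  -- any row returned means the constraint is violated
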
> Neither of those approaches is really suitable. Just spilling to disk is
> O(N) in the number of rows loaded, and the second one is O(N) at least
> in the number of rows (loaded + existing). The second one doesn't help
> either: if the table had been empty you'd have added the FK afterwards,
> so we must assume there are already rows in there, and in most cases the
> rows already present will exceed those being added by the bulk operation.
>
> AFAICS we must aggregate the trigger checks. We would need a special
> property of triggers that allowed them to be aggregated when two similar
> checks arrived. We can then use hash aggregation to accumulate them. We
> might conceivably need to spill to disk also, since the aggregation may
> not always be effective.

Can't we just run the checks for the FK keys accumulated so far at the
point where they no longer fit in memory, instead of spilling to disk?
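
A sketch of what such a flush could look like, assuming the pending keys
are kept deduplicated in memory (Simon's hash aggregation above) and can
be scanned as a relation pending_fk_keys -- all names here are made up:

    -- Verify every distinct pending key with one join against the
    -- referenced table; if nothing comes back, discard the queue.
    SELECT k.ref_id
      FROM (SELECT DISTINCT ref_id FROM pending_fk_keys) k
      LEFT JOIN pktable pk ON pk.id = k.ref_id
     WHERE pk.id IS NULL
     LIMIT 1;  -- any row returned indicates an FK violation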

> But in most cases the tables against which FK
> checks are made are significantly smaller than the tables being loaded.
> Once we have hash-aggregated the checks, that becomes the first phase of
> a hash join against the target table.
>
> We certainly need a TODO item for "improve RI checks during bulk
> operations".

Agreed.

----------------
Hannu
