Re: BUG #4204: COPY to table with FK has memory leak

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-28 21:45:37
Message-ID: 1212011137.4489.700.camel@ebony.site
Lists: pgsql-bugs pgsql-hackers


On Wed, 2008-05-28 at 16:28 -0400, Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >> This is expected to take lots of memory because each row-requiring-check
> >> generates an entry in the pending trigger event list.
>
> > Hm, it occurs to me that we could still do a join against the pending event
> > trigger list... I wonder how feasible it would be to store the pending trigger
> > event list in a temporary table instead of in ram.
>
> We could make that list spill to disk, but the problem remains that
> verifying the rows one at a time will take forever.
>
> The idea that's been kicked around occasionally is that once you get
> past N pending events, throw them all away and instead queue a single
> operation to do a bulk verify (just like initial establishment of the
> FK constraint). I'm not sure how to do the queue management for this
> though.
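
[For reference, the bulk verify mentioned above is what initial
establishment of an FK constraint already does internally: roughly one
set-oriented query of the following shape, rather than a per-row probe.
Table and column names here are placeholders:

    SELECT fk.fkcol
      FROM ONLY fktable fk
      LEFT OUTER JOIN ONLY pktable pk ON pk.pkcol = fk.fkcol
     WHERE pk.pkcol IS NULL
       AND fk.fkcol IS NOT NULL;

Any row it returns is a violation.]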

Neither of those approaches is really suitable. Just spilling to disk is
still O(N) in the number of rows loaded; the bulk verify is O(N) in the
number of rows loaded plus the rows already existing. The bulk verify
doesn't help either: if the table had been empty you would have added
the FK afterwards, so we must assume there are already rows in there,
and in most cases the rows already present will exceed those being
added by the bulk operation.

AFAICS we must aggregate the trigger checks. We would need a special
property of triggers that allows two similar checks to be aggregated
into one as they arrive. We can then use hash aggregation to accumulate
them. We might conceivably need to spill to disk as well, since the
aggregation may not always be effective, but in most cases the tables
against which FK checks are made are significantly smaller than the
tables being loaded. Once we have hash aggregated the checks, that
gives us the first part of a hash join against the target table.
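
As a rough sketch of what the aggregated check could amount to (purely
illustrative: loaded_rows stands for the rows inserted by the COPY, and
the table and column names are placeholders):

    SELECT DISTINCT lr.fkcol                        -- hash aggregation
      FROM loaded_rows lr
      LEFT JOIN pktable pk ON pk.pkcol = lr.fkcol   -- hash join to PK table
     WHERE lr.fkcol IS NOT NULL
       AND pk.pkcol IS NULL;

Each distinct key gets verified once, however many loaded rows share
it; any row returned is a violation.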

We certainly need a TODO item for "improve RI checks during bulk
operations".

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
