Re: Index seems "lost" after consecutive deletes

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Edson Richter <edsonrichter(at)hotmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Index seems "lost" after consecutive deletes
Date: 2016-06-14 13:32:15
Message-ID: CAKFQuwYevP=-=jy8pDU3FJ6pXFNSr5boGAjiKvDqfqy0fF+SOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Jun 14, 2016 at 12:51 AM, Edson Richter <edsonrichter(at)hotmail(dot)com>
wrote:

> Em 14/06/2016 01:33, David G. Johnston escreveu:
>
> On Monday, June 13, 2016, Edson Richter < <edsonrichter(at)hotmail(dot)com>
> edsonrichter(at)hotmail(dot)com> wrote:
>
>> Em 13/06/2016 23:36, Edson Richter escreveu:
>>
>> Em 13/06/2016 23:18, rob stone escreveu:
>>
>> On Mon, 2016-06-13 at 22:41 -0300, Edson Richter wrote:
>>
>> Em 13/06/2016 22:33, Edson Richter escreveu:
>>
>> I've a table "A" with 4,000,000 records.
>>
>> I've decided to delete records from oldest to newest but I can't
>> delete records that have references in tables "B", "C" or "D".
>>
>>
>> so, I've
>>
>>
>> with qry as (
>>
>> select A.id
>>
>> from A
>>
>> where not exists (select 1 from B where B.a_id = A.id)
>>
>> and not exists (select 1 from C where C.a_id = A.id)
>>
>> and not exists (select 1 from D where D.a_id = A.id)
>>
>> and A.creation_date < (now()::date - interval '12 month')
>>
>> order by A.id DESC
>>
>> limit 2000
>>
>> )
>>
>> delete from A where id in (select id from qry);
>>
>>
>> All three referenced tables have indexes (B.a_id; C.a_id; D.a_id)
>> in
>> order to make query faster.
>>
>> So for first 2 million rows it worked really well, taking about 1
>> minute to delete each group of 2000 records.
>>
>> Then, after a while I just started to get errors like:
>>
>>
>> ​​
>> ERROR: update or delete in "A" violates foreign key
>> ​​
>> "fk_C_A" in
>> "C".
>>
>> DETAIL: Key (id)=(3240124) is still referenced by table "C".
>>
>>
>> Seems to me that indexes got lost in the path - the query is
>> really
>> specific and no "C" referenced records can be in my deletion.
>>
>> Has anyone faced a behavior like this?
>>
>> Am I doing something wrong?
>>
>>
>>
>> Of course:
>> Version string PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu,
>> compiled
>> by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4), 64-bit
>> Oracle Linux 7 x64 with all updates. Running on EXT4 file system.
>> Computer is Dell R420 with mirrored disks, 80GB of RAM (database has
>> <
>> 40GB in total).
>>
>> Sorry for not putting the info in the first e-mail.
>>
>> Edson
>>
>>
>>
>> What does:-
>>
>> SELECT COUNT(*) FROM C WHERE C.a_id = 3240124;
>>
>> return?
>>
>> Is it a many-to-one or a one-to-one relationship?
>>
>>
>>
>>
>> SELECT COUNT(*) FROM C WHERE C.a_id = 3240124;
>>
>> count
>> -------
>> 1
>> (1 registro)
>>
>>
>> A.id is primary key of A table. Each table has its own primary key.
>>
>> Relationship to others table is 1-N, being N = {0,1}
>>
>>
>> A.id -> B.a_id (being B.a_id unique but not enforced by unique key)
>>
>> A.id -> C.a_id (being C.a_id unique but not enforced by unique key)
>>
>> A.id -> D.a_id (being D.a_id unique but not enforced by unique key)
>>
>>
>> Regards,
>>
>> Edson
>>
>>
>> Just in case, I've run:
>>
>> - vacuum full analyze verbose;
>> - reindex index ix_c_a_id;
>>
>> Result I get same error. So, I'm inclined to discard that this is a index
>> error.
>>
>>
>> Interesting:
>>
>> with qry as (select A.id
>> from A
>> where creatingdate < (now()::date - interval '12 month')
>> and not exists (select 1 from B where B.a_id = A.id)
>> and not exists (select 1 from C where C.a_id = A.id)
>> and not exists (select 1 from D where D.a_id = A.id)
>> order by A.id limit 2000)
>>
>> select * from qry where id = 3240124;
>>
>
> Why do you assume 3240124 is within the first 2000 qualified records that
> the CTE is limited to checking?
>
>
> Because this is the key causing the error on delete.
>

​Wasn't the point though I do suspect your expectations are acceptable in
this instance. If you truly want to see if qry contains 3240124 you should
lose the LIMIT 2000.​

>
>
>
>>
>>
>> Total query runtime: 2.2 secs
>> 0 rows retrieved.
>>
>>
>> Why delete causes error, but querying don't?
>>
>
> Given the error message this answer seems self-evident...
>
>
> Not self-evident to me: delete says I'm trying to delete the record with
> id = 3240124 and I can't. But the select says this record with id = 3240124
> is not there!!!
>
>
​This error:
​ERROR: update or delete in "A" violates foreign key "fk_C_A" in
​ ​
"C".
is impossible to encounter when executing a pure select...

>
>>
>> Would it be a bug when using delete ... where id in (subquery)?
>>
>>
>
> I'm unsure regarding the solution but I suspect the problem is that
> between selecting the A row and deleting it another concurrent process
> added a record to C that, if you were to re-run the select would cause the
> row from A to be skipped. But the single query doesn't have that option so
> it ends up failing.
>
> There is a FOR UPDATE clause you can add to the select but I don't think
> that works here since table C is the one being altered and at the time of
> the query there is nothing to lock.
>
> I'm doubting this is a bug, just poor concurrency understanding. Sorry I
> cannot enlighten further at the moment.
>
>
> There is not other process adding/updating records - and even there is,
> the ID would be so high (because it is a sequence) that cannot be the
> 3240124 (this record has been generated more than 2 years ago - that's why
> my query has the "creationdate" filter - I don't touch records created
> within last 12 months).
>
> Also, I've tried the same with "for update": same error!
>
>
​Have you confirmed that

"fk_C_A" is referencing the columns you think it is?

​What's the history of this machine? Did you pass through 9.3 (especially
early releases) on your way to 9.4?

David J.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alexander Farber 2016-06-14 13:33:01 How to pass jsonb and inet arguments to a stored function with JDBC?
Previous Message Kevin Grittner 2016-06-14 12:59:04 Re: Sequences, txids, and serial order of transactions