Re: Query hitting empty tables taking 48 minutes

From: Robert Creager <robert(at)logicalchaos(dot)org>
To: Robert <domain(at)logicalchaos(dot)org>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-general(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Query hitting empty tables taking 48 minutes
Date: 2018-06-18 18:20:42
Message-ID: D36E44EC-C3DC-4524-BC2C-8FAECD123F60@logicalchaos.org
Lists: pgsql-general

> On Jun 7, 2018, at 4:18 PM, Robert <domain(at)logicalchaos(dot)org> wrote:
>
>> You can capture the execution plan of the bad statement by using auto_explain,
>> that would certainly shed more light on the problem.
>

A different query has started showing up as the problem. The auto_explain output (with analyze enabled) shows an oddity: the logged duration is roughly 11,000 seconds, while the plan's own actual time is only about 3.5 seconds. I captured a ZFS snapshot as soon as the problem was noticed.
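
For reference, auto_explain can be enabled along these lines (shown session-level here; the values are illustrative rather than the exact settings on this server):

-- Session-level example; the module can instead be preloaded via
-- shared_preload_libraries in postgresql.conf for all sessions.
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5s';       -- log plans for statements slower than this
SET auto_explain.log_analyze = on;              -- include actual row counts and timings
SET auto_explain.log_triggers = on;             -- also report time spent in (FK) triggers
SET auto_explain.log_nested_statements = on;    -- consider statements inside functions too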

Jun 16 23:15:30 blackpearl postgres[9860]: [79-1] db=tapesystem,user=Administrator,app=[unknown],client=127.0.0.1 LOG: duration: 10856644.336 ms plan:
Jun 16 23:15:30 blackpearl postgres[9860]: [79-2] Query Text: DELETE FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = $1))
Jun 16 23:15:30 blackpearl postgres[9860]: [79-3] Delete on blob (cost=1308.79..1312.82 rows=1 width=12) (actual time=3465.919..3465.919 rows=0 loops=1)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-4] -> Nested Loop (cost=1308.79..1312.82 rows=1 width=12) (actual time=50.293..2435.271 rows=300000 loops=1)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-5] -> Bitmap Heap Scan on s3_object (cost=634.39..636.41 rows=1 width=22) (actual time=50.269..153.885 rows=300000 loops=1)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-6] Recheck Cond: (bucket_id = 'bc6e6b10-80ad-4329-9fb9-1a66d8c1505e'::uuid)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-7] Heap Blocks: exact=3704
Jun 16 23:15:30 blackpearl postgres[9860]: [79-8] -> Bitmap Index Scan on ds3_s3_object__bucket_id (cost=0.00..634.39 rows=1 width=0) (actual time=49.552..49.552 rows=300000 loops=1)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-9] Index Cond: (bucket_id = 'bc6e6b10-80ad-4329-9fb9-1a66d8c1505e'::uuid)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-10] -> Bitmap Heap Scan on blob (cost=674.39..676.41 rows=1 width=22) (actual time=0.005..0.006 rows=1 loops=300000)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-11] Recheck Cond: (object_id = s3_object.id)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-12] Heap Blocks: exact=300000
Jun 16 23:15:30 blackpearl postgres[9860]: [79-13] -> Bitmap Index Scan on ds3_blob__object_id (cost=0.00..674.39 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=300000)
Jun 16 23:15:30 blackpearl postgres[9860]: [79-14] Index Cond: (object_id = s3_object.id)

Doing a ZFS rollback and re-executing the query shows what is happening, although not to the extent seen above. If I read this correctly, it is the foreign-key constraint checks that are making the query take so long. I don't see any server configuration option that would allow those checks to be parallelized. Is that possible?

tapesystem=# PREPARE foo(uuid) AS DELETE FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.s3_object WHERE id = ds3.blob.object_id AND (bucket_id = $1)); EXPLAIN ANALYZE EXECUTE foo('bc6e6b10-80ad-4329-9fb9-1a66d8c1505e');
PREPARE
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Delete on blob  (cost=9555.07..21134.01 rows=299993 width=12) (actual time=1516.140..1516.140 rows=0 loops=1)
   ->  Hash Join  (cost=9555.07..21134.01 rows=299993 width=12) (actual time=237.816..621.306 rows=300000 loops=1)
         Hash Cond: (s3_object.id = blob.object_id)
         ->  Seq Scan on s3_object  (cost=0.00..7454.04 rows=299993 width=22) (actual time=0.027..148.503 rows=300000 loops=1)
               Filter: (bucket_id = 'bc6e6b10-80ad-4329-9fb9-1a66d8c1505e'::uuid)
               Rows Removed by Filter: 3
         ->  Hash  (cost=5805.03..5805.03 rows=300003 width=22) (actual time=235.219..235.219 rows=300003 loops=1)
               Buckets: 524288  Batches: 1  Memory Usage: 19917kB
               ->  Seq Scan on blob  (cost=0.00..5805.03 rows=300003 width=22) (actual time=0.038..114.107 rows=300003 loops=1)
Trigger for constraint blob_tape_blob_id_fkey: time=5389.627 calls=300000
Trigger for constraint multi_part_upload_placeholder_blob_id_fkey: time=4566.305 calls=300000
Trigger for constraint multi_part_upload_part_blob_id_fkey: time=3597.896 calls=300000
Trigger for constraint blob_pool_blob_id_fkey: time=4631.851 calls=300000
Trigger for constraint blob_target_blob_id_fkey: time=4688.731 calls=300000
Trigger for constraint suspect_blob_tape_blob_id_fkey: time=4473.787 calls=300000
Trigger for constraint suspect_blob_pool_blob_id_fkey: time=4475.286 calls=300000
Trigger for constraint suspect_blob_target_blob_id_fkey: time=4353.237 calls=300000
Trigger for constraint blob_s3_target_blob_id_fkey: time=4451.687 calls=300000
Trigger for constraint blob_azure_target_blob_id_fkey: time=4448.613 calls=300000
Trigger for constraint suspect_blob_azure_target_blob_id_fkey: time=4353.467 calls=300000
Execution time: 85175.581 ms
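
As an aside, the constraint names in those trigger lines can be mapped back to their referencing tables with a catalog query along these lines (the query is my addition; only ds3.blob is taken from the plan above):

SELECT conname,
       conrelid::regclass        AS referencing_table,
       pg_get_constraintdef(oid) AS definition
FROM   pg_constraint
WHERE  confrelid = 'ds3.blob'::regclass
  AND  contype = 'f'
ORDER  BY conname;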

I executed a VACUUM ANALYZE and then a VACUUM FULL ANALYZE; neither changed the timing. The per-constraint trigger times above add up to roughly 49 of the 85 seconds. Other than removing the constraints, is there any way to address this?
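
(For completeness, the maintenance commands were of this form; the specific table target here is my shorthand, the actual runs may have covered the whole database:)

VACUUM ANALYZE ds3.blob;
VACUUM FULL ANALYZE ds3.blob;   -- rewrites the table and takes an ACCESS EXCLUSIVE lock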

Best,
Robert
