Re: Query hitting empty tables taking 48 minutes

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Robert Creager <robert(at)logicalchaos(dot)org>, pgsql-general(at)postgresql(dot)org
Cc: Robert Creager <robertc(at)spectralogic(dot)com>
Subject: Re: Query hitting empty tables taking 48 minutes
Date: 2018-06-07 20:15:55
Message-ID: 1528402555.31386.27.camel@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Robert Creager wrote:
> I have a system running FreeBSD 11.1-STABLE, PostgreSQL 9.6.8,Java OpenJDK 1.8.0_131,
> jdbc 9.3-1104-jdbc41 which is exhibiting very bizarre behavior.
> A query is executing against a couple of tables that have 1 and 0 records in them.
> ds3.job_entry has 0 rows, ds3.blob has 1 row.
> If I execute the query manually via command line, it executes fine. There are no other active queries, no locks.
> The system is idle, in our Dev Test group, so I can leave it this way for a bit of time.
> The general software setup is in production and I’ve seen nothing like this before.
> Even a system with 300M ds3.blob entries executes this query fine.
>
> Jun 7 17:24:21 blackpearl postgres[10670]: [7737-1] db=tapesystem,user=Administrator,app=[unknown],client=127.0.0.1 LOG: duration: 2903612.206 ms execute fetch from S_2037436/C_2037437: SELECT * FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id = ds3.blob.id AND (job_id = $1))
>
> tapesystem=# explain analyze SELECT * FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id = ds3.blob.id AND (job_id = 'b51357cd-e07a-4c87-a50b-999c347a5c71'));
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=0.84..6.89 rows=1 width=77) (actual time=0.044..0.044 rows=0 loops=1)
> -> Index Scan using job_entry_job_id_idx on job_entry (cost=0.42..2.44 rows=1 width=16) (actual time=0.042..0.042 rows=0 loops=1)
> Index Cond: (job_id = 'b51357cd-e07a-4c87-a50b-999c347a5c71'::uuid)
> -> Index Scan using blob_pkey on blob (cost=0.42..4.44 rows=1 width=77) (never executed)
> Index Cond: (id = job_entry.blob_id)
> Planning time: 0.388 ms
> Execution time: 0.118 ms
>
>
> pid client_port runtime query_start datname state query usename
> 10670 11211 0 years 0 mons 0 days 0 hours 43 mins 28.852273 secs 2018-06-07 17:24:22.026384 tapesystem active SELECT * FROM ds3.blob WHERE EXISTS (SELECT * FROM ds3.job_entry WHERE blob_id = ds3.blob.id AND (job_id = $1)) Administrator
>
>
> From the system with 300M ds3.blob entries, which works fine (along with every other system in house):
>
> QUERY PLAN
> Nested Loop (cost=0.57..738052.90 rows=164379 width=75) (actual time=1.001..1947.029 rows=50000 loops=1)
> -> Seq Scan on job_entry (cost=0.00..10039.50 rows=164379 width=16) (actual time=0.871..56.442 rows=50000 loops=1)
> Filter: (job_id = 'ef27d2fa-2727-424e-8f44-da9e33a5ca05'::uuid)
> Rows Removed by Filter: 60001
> -> Index Scan using blob_pkey on blob (cost=0.57..4.42 rows=1 width=75) (actual time=0.037..0.037 rows=1 loops=50000)
> Index Cond: (id = job_entry.blob_id)
> Planning time: 6.330 ms
> Execution time: 1951.564 ms

The JDBC query is using a cursor since you are using setFetchSize(10000).

I can see two reasons why the plan might be different:

1. It is a prepared statement that has been executed often enough
for the generic plan to kick in, and the generic plan is bad.

2. PostgreSQL chooses a fast startup plan because a cursor is used,
and that plan performs much worse.

To see if 1. causes the problem, you could set
m_statement.setPrepareThreshold(0);
and see if that makes a difference.

To test if 2. has an influence, try setting cursor_tuple_fraction to 1.0
and see if it makes a difference.

You can capture the execution plan of the bad statement by using auto_explain,
that would certainly shed more light on the problem.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2018-06-07 20:57:36 Code of Conduct committee: call for volunteers
Previous Message Robert Creager 2018-06-07 19:26:05 Re: Query hitting empty tables taking 48 minutes