Re: updating statistics on slow running query

From: desmodemone <desmodemone(at)gmail(dot)com>
To: Eric Ramirez <eric(dot)ramirez(dot)sv(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: updating statistics on slow running query
Date: 2014-11-10 18:57:08
Message-ID: CAEs9oFmom3AYrME3RkvRjnA3TO2L5LsY3O2mwD13a=F2mdJuSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2014-11-10 18:43 GMT+01:00 Eric Ramirez <eric(dot)ramirez(dot)sv(at)gmail(dot)com>:

>
> Hi,
> I have created a sample database with test data to help benchmark our
> application. The database has ten million records, and is running on a
> dedicated server(postgres 9.3) with 8GB of RAM. Our queries are pretty
> slow with this amount of data and is my job to get them to run to at
> acceptable speed. First thing that I notice was that the planner's row
> estimates are off by a large number or records (millions) I have updated
> the statistics target but didn't seem to make a difference. The relevant
> output follows.
> Am I looking in the wrong place, something else I should be trying?
> Thanks in advance for your comments/suggestions,
> Eric.
>
>
> =# show work_mem;
> work_mem
> ----------
> 1GB
> (1 row)
> =# show effective_cache_size;
> effective_cache_size
> ----------------------
> 5GB
> (1 row)
>
> =#ALTER TABLE TAR_MVW_TARGETING_RECORD ALTER COLUMN
> household_member_first_name SET STATISTICS 5000;
> =# vacuum analyse TAR_MVW_TARGETING_RECORD;
>
> =# \d tar_mvw_targeting_record;
> Table "public.tar_mvw_targeting_record"
> Column | Type | Modifiers
> -----------------------------+-----------------------+-----------
> household_member_id | bigint |
> form_id | bigint |
> status | character varying(64) |
> gender | character varying(64) |
> household_member_first_name | character varying(64) |
> household_member_last_name | character varying(64) |
>
> Indexes:
> "tar_mvw_targeting_record_form_id_household_member_id_idx" UNIQUE,
> btree (form_id, household_member_id)
> "tar_mvw_targeting_record_lower_idx" gist
> (lower(household_member_first_name::text) extensions.gist_trgm_ops)
> WHERE status::text <> 'ANULLED'::text
> "tar_mvw_targeting_record_lower_idx1" gist
> (lower(household_member_last_name::text) extensions.gist_trgm_ops)
> WHERE status::text <> 'ANULLED'::text
>
>
> =# explain (analyse on,buffers on)select T.form_id from
> TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED') AND
> LOWER(T.household_member_last_name) LIKE LOWER('%tu%') AND
> T.gender='FEMALE' group by T.form_id;
>
> QUERY PLAN
>
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> -------------------------------
> HashAggregate (cost=450994.35..452834.96 rows=184061 width=8) (actual
> time=11932.959..12061.206 rows=442453 loops=1)
> Buffers: shared hit=307404 read=109743
> -> Bitmap Heap Scan on tar_mvw_targeting_record t
> (cost=110866.33..448495.37 rows=999592 width=8) (actual
> time=3577.301..11629.132 row
> s=500373 loops=1)
> Recheck Cond: ((lower((household_member_last_name)::text) ~~
> '%tu%'::text) AND ((status)::text <> 'ANULLED'::text))
> Rows Removed by Index Recheck: 9000079
> Filter: ((gender)::text = 'FEMALE'::text)
> Rows Removed by Filter: 499560
> Buffers: shared hit=307404 read=109743
> -> Bitmap Index Scan on tar_mvw_targeting_record_lower_idx1
> (cost=0.00..110616.43 rows=2000002 width=0) (actual time=3471.142..3
> 471.142 rows=10000012 loops=1)
> Index Cond: (lower((household_member_last_name)::text) ~~
> '%tu%'::text)
> Buffers: shared hit=36583 read=82935
> Total runtime: 12092.059 ms
> (12 rows)
>
> Time: 12093.107 ms
>
> p.s. this plan was ran three times, first time took 74 seconds.
>
>
>
Hello Eric,
did you try with gin index instead ? so you could
avoid, if possible, the recheck condition (almost the gin index is not
lossy ), further if you always use a predicate like "gender=" , you could
think to partition the indexes based on that predicate (where status NOT IN
('ANULLED') and gender='FEMALE', in the other case it wil be where status
NOT IN ('ANULLED') and gender='MALE' ) . Moreover you could avoid also the
"lower" operator and try use directly the ilike , instead of "like".

CREATE INDEX tar_mvw_targeting_record_idx02 ON
tar_mvw_targeting_record USING gin ( status gin_trgm_ops) where
status NOT IN ('ANULLED') and gender='FEMALE' ;
CREATE INDEX tar_mvw_targeting_record_idx03 ON
tar_mvw_targeting_record USING gin ( status gin_trgm_ops) where
status NOT IN ('ANULLED') and gender='MALE' ;

explain (analyse on,buffers on) select T.form_id from
TAR_MVW_targeting_record AS T where T.status NOT IN ('ANULLED') AND
T.household_member_last_name ilike LOWER('%tu%') AND T.gender='FEMALE'
group by T.form_id;

I hope it works

have a nice day

--
Matteo Durighetto

- - - - - - - - - - - - - - - - - - - - - - -

Italian PostgreSQL User Group <http://www.itpug.org/index.it.html>
Italian Community for Geographic Free/Open-Source Software
<http://www.gfoss.it>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2014-11-10 18:59:36 Re: Performance bug in prepared statement binding in 9.2?
Previous Message Josh Berkus 2014-11-10 18:56:42 Re: 9.3 performance issues, lots of bind and parse log entries