Re: Reindex taking forever, and 99% CPU

From: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: John R Pierce <pierce(at)hogranch(dot)com>, PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Reindex taking forever, and 99% CPU
Date: 2014-08-07 01:52:03
Message-ID: CAFWfU=t7t8rxNnvWN4ByhHq44O1LtCBnU_upYEW-jco1Fw_m5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you for the very specific idea of pg_stat_user.

This is what I see (the output is also included in email below, but
this is easier to read) --
https://gist.github.com/anonymous/53f748a8c6c454b804b3

The output here (might become a jumbled mess)--

=# SELECT * from pg_stat_user_tables where relname='bigtb';
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan |
idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd |
n_live_tup | n_dead_tup | last_vacuum |
last_autovacuum | last_analyze | last_autoanalyze
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+-------------------------------+-----------------+-------------------------------+------------------
105954 | public | bigtb | 0 | 0 | 220396 |
89781 | 63516 | 6 | 910 | 1 |
634879579 | 39 | 2014-08-06 20:12:47.163055-04 |
| 2014-08-06 20:19:40.317771-04 | (1 row)

Time: 50.844 ms

We spent some time to do some massive cleaning of the data from this
table. Brought it down to around 630 million rows. Overall size of the
table including indexes is about 120GB anyway.

More stats that we could manage are pretty-pasted here:
https://gist.github.com/anonymous/21aaeae10584013c3820

The biggest table ("bigtb" -- codename for pasting on public forum)
stores some URLs. The most important index is for this table is the
"alias" column, which is varchar(35) as you can see.

Table definition also pasted below:

Table "public.bigtb"
Column | Type | Modifiers
-----------------+-----------------------------+---------------------------------
alias | character varying(35) | not null
url | text | not null
user_registered | boolean |
private_key | character varying(6) | default NULL::character varying
modify_date | timestamp without time zone | default now()
ip | bigint |
url_md5 | text |

Indexes:
"idx_bigtb_pkey" PRIMARY KEY, btree (alias)
"idx_bigtb_ip_url" UNIQUE, btree (ip, url_md5)
"idx_bigtb_modify_date" btree (modify_date)
"idx_bigtb_urlmd5" btree (url_md5)
Check constraints:
"bigtb_alias_check" CHECK (alias::text ~ '[-.~a-z0-9_]'::text)
Referenced by:
TABLE "bigtb_registered" CONSTRAINT "fk_bigtb_registered" FOREIGN
KEY (alias) REFERENCES bigtb(alias) MATCH FULL ON UPDATE CASCADE ON
DELETE CASCADE
TABLE "interesting" CONSTRAINT "interesting_alias_fkey" FOREIGN
KEY (alias) REFERENCES bigtb(alias) MATCH FULL ON UPDATE CASCADE
Rules:
__track_bigtb_deleted AS
ON DELETE TO bigtb
WHERE NOT (EXISTS ( SELECT bigtb_deleted.alias
FROM bigtb_deleted
WHERE bigtb_deleted.alias::text = old.alias::text)) DO
INSERT INTO bigtb_deleted (alias, url, user_registered, modify_date)
VALUES (old.alias, old.url, old.user_registered, old.modify_date)

What else could I do here?

As you will see in the code shared above (GIST Github link) the stats
for this table are:

bigtb -

row count: 634,879,168
inserted: 65613
updated: 6
deleted: 1013

There are recent numbers. The DB has been going down often. But
deletions would be around 20,000 per week. Updates are lowest. INSERT
and SELECT are huge, with of course SELECT being the biggest activity
(high traffic website).

We did put PGBouncer for some pooling benefits, and "memcached" for
taking some load off the postgresql server. As of this writing, the
memcached thing is caching around 200,000 URLs which would otherwise
have been a query based on the index on the "alias" column --
"idx_bigtb_pkey".

What other info can I share?

Suppose we might have to explore partitioning, which would probably be
via first letter of the alias? This would lead to around 26 + 9 = 35
sub-tables. Is this too many?

My CONFIG settings:

max_connections = 180 # Was 250! -
http://www.php.net/manual/en/function.pg-pconnect.php#20309
superuser_reserved_connections = 5
shared_buffers = 512MB
effective_cache_size = 1200MB # Nov 11 2011, was 1500MB
temp_buffers = 32MB # min 800kB
maintenance_work_mem = 320MB # min 1MB, was 128MB
work_mem = 64MB
wal_buffers = 20MB # min 32kB
fsync = on # turns forced
synchronization on or off
checkpoint_segments = 128 # was 128
checkpoint_timeout = 1000 # was 1000
enable_indexscan = on
log_min_duration_statement = 1000

Much appreciate any further ideas!

On Sun, Aug 3, 2014 at 9:29 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
> On 08/02/2014 07:37 PM, Phoenix Kiula wrote:
>>>
>>> In your original post you said it was stopping on pg_class so now I am
>>> confused.
>>
>>
>>
>>
>> No need to be confused. The vacuum thing is a bit tricky for laymen
>> like myself. The "pg_class" seemed to be associated to this table.
>> Anyway, even before the upgrade, the vacuum was stopping at this table
>> and taking forever.
>>
>> The question is: what now. Where can I give you information from?
>> IOSTAT I've already shared.
>>
>> Will the work_mem settings affect the manual REINDEX that's still
>> running? What can I do to speed up the REINDEX? Should I change my
>> autovacuum settings for this table specifcally (it's the only mammoth
>> table in the DB, and our main one)?
>
>
> Adding to my previous post, some information from the statistic collector
> would be useful. See here for more information:
>
> http://www.postgresql.org/docs/9.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS-TABLE
>
> For now the output of:
>
> SELECT * from pg_stat_user_tables where relname='your_table_name';
>
> might prove helpful.
>
>
>>
>> Thanks.
>>
>>
>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David G Johnston 2014-08-07 01:52:48 Re: Need help in tuning
Previous Message Peter Geoghegan 2014-08-07 01:46:49 Re: The dangers of streaming across versions of glibc: A cautionary tale