Re: Best practices to manage custom statistics

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Moreno Andreo <moreno(dot)andreo(at)evolu-s(dot)it>, PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: Best practices to manage custom statistics
Date: 2016-11-08 20:28:00
Message-ID: 0cc5b1e2-ebd9-2bde-253a-701a9ba571be@aklaver.com
Lists: pgsql-general

On 11/08/2016 12:13 PM, Moreno Andreo wrote:
> Hi,
> I'm checking if there's a best way to obtain statistics based on my
> database tables.
>
> Here's the scenario.
> First of all, technical details:
> - PostgreSQL 9.1, Ubuntu 12 on a 4-core, 32 GB machine with a 600 GB disk,
> migrating to PostgreSQL 9.5.3, Debian 8 on an 8-core, 52 GB machine with
> a 2 TB disk.
> - 350 databases, 350 users; every user connects to his own database and
> his teammates' (max 10 in total), so each user can connect to at most 10
> databases at a time.
>
>
> My application needs to produce a certain number of statistics (how many
> records are in a certain state, how many are in another state) to send
> back to the user.
> At the moment this is obtained with a select count(*) from ..... (which
> involves 4 joins on 4 tables) that is run every 20 secs by each client
> connected to the cluster (ATM about 650 clients configured, about
> 200 concurrent) against each database it has rights to connect to.
>
> I noticed that in some cases, especially when working with not-so-small
> datasets (200k rows x 95 cols), and sometimes returning not-so-small
> result sets (10k rows), the query does not perform so well, but the worst
> thing is that it raises overall server load (I/O) and bandwidth usage.
> While bandwidth is not a problem (I see spikes at 20 Mbps while "normal"
> traffic is at about 3 Mbps, and I have 1 Gbps available), server load
> *is* a major problem: in high-access periods (summer/holidays) I see the
> server load go up to 22-25 on a 4-core machine, and users call
> complaining about timeouts and slowness.
>
> Even though I'm migrating to a better instance, I'm still trying to
> "normalize" this feature.
> I can start by looking at indexes (I'm not quite sure that all the fields
> in the WHERE clause are indexed), but I don't think that alone would
> boost its performance.
>
> I thought about having a table, say, 'tbl_counters', like this
>
> CREATE TABLE tbl_counters (
>     coduser  uuid,
>     counter1 int,
>     counter2 int,
>     ....
>     counterx int
> );
> updated by a trigger (when a row in a table is INSERTed/UPDATEd/DELETEd,
> it fires a function that increments/decrements the value of counter x for
> user y).
> Just in case the trigger ever skips an update, once a day (say at 2 am)
> a cron job runs the above "monster query" for every database and, if
> necessary, adjusts the counter values for each user to reflect the real
> values.
>
> In your experience, would this approach help me lower the server load?
> Are there any other approaches I can try?
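
For what it's worth, a trigger-maintained counter along the lines you
describe could look roughly like this. It is only a sketch: tbl_records,
its status column, and the "status = 1" test are stand-ins for your real
tables and states, and it assumes tbl_counters already holds a row per
user:

CREATE OR REPLACE FUNCTION fn_update_counters() RETURNS trigger AS $$
BEGIN
    -- old row left the counted state (UPDATE away from it, or DELETE)
    IF TG_OP <> 'INSERT' AND OLD.status = 1 THEN
        UPDATE tbl_counters SET counter1 = counter1 - 1
         WHERE coduser = OLD.coduser;   -- assumes the user's row exists
    END IF;
    -- new row entered (or stayed in) the counted state
    IF TG_OP <> 'DELETE' AND NEW.status = 1 THEN
        UPDATE tbl_counters SET counter1 = counter1 + 1
         WHERE coduser = NEW.coduser;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_update_counters
AFTER INSERT OR UPDATE OR DELETE ON tbl_records
FOR EACH ROW EXECUTE PROCEDURE fn_update_counters();

Keep in mind that every write to the base table then also updates the
single counter row for that user, so on a busy table that row can itself
become a point of contention.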

Instead of pushing, why not pull? In other words, do the users really
check/need the statistics every 20 secs? Given that you say exact counts
are not important over the course of a day, why not create a mechanism for
the user to poll the database only when they need the information?
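
One possible shape for such a pull mechanism, purely as a sketch with
hypothetical names (tbl_counters_cache, tbl_records, get_counter1), is a
function the client calls only when the user actually asks for the
numbers, reusing a cached value when it is fresh enough. ON CONFLICT
needs 9.5 or later, so it fits the instance you are migrating to rather
than the 9.1 one:

CREATE TABLE tbl_counters_cache (
    coduser      uuid PRIMARY KEY,
    counter1     int,
    refreshed_at timestamptz
);

CREATE OR REPLACE FUNCTION get_counter1(p_coduser uuid) RETURNS int AS $$
DECLARE
    v_count int;
BEGIN
    -- reuse a cached value if it is younger than 5 minutes
    SELECT c.counter1 INTO v_count
      FROM tbl_counters_cache c
     WHERE c.coduser = p_coduser
       AND c.refreshed_at > now() - interval '5 minutes';

    IF v_count IS NULL THEN
        -- stand-in for the real 4-table "monster query"
        SELECT count(*) INTO v_count
          FROM tbl_records r
         WHERE r.coduser = p_coduser AND r.status = 1;

        INSERT INTO tbl_counters_cache (coduser, counter1, refreshed_at)
        VALUES (p_coduser, v_count, now())
        ON CONFLICT (coduser) DO UPDATE
            SET counter1 = EXCLUDED.counter1,
                refreshed_at = EXCLUDED.refreshed_at;
    END IF;

    RETURN v_count;
END;
$$ LANGUAGE plpgsql;

The freshness window (5 minutes here) is arbitrary; pick whatever
staleness your users can tolerate.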

>
> If more details are needed, just ask.
>
> Thanks in advance and sorry for the long message (but I had to explain
> such a complex thing)
> Moreno.-
>
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com
