Re: Queries on very big table

From: junior <drrtuy(at)yandex(dot)ru>
To: Job <job(at)colliniconsulting(dot)it>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Queries on very big table
Date: 2017-01-05 14:15:01
Message-ID: 5822551483625701@web14o.yandex.ru
Lists: pgsql-general

Hi,

For BI queries on large datasets it is better to use data warehouse software with columnar storage (ClickHouse, Greenplum, etc.), but offloading the data could be a complicated task. Alternatively, you can try tuning Postgres and the environment it runs in so that queries such as yours are processed faster. You could increase the OS page cache by adding RAM, and, if you use Linux, tune the readahead buffer size of the block devices.
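For example (a minimal sketch; the 48GB figure is only an assumption for a machine with roughly 64GB of RAM, adjust it to your hardware), after adding RAM you can tell the planner how much OS page cache it can count on:

  -- Hypothetical value for a ~64GB machine; adjust to your hardware.
  -- effective_cache_size is a planner hint about how much data the OS
  -- page cache can hold; it does not allocate any memory itself.
  ALTER SYSTEM SET effective_cache_size = '48GB';
  -- Apply the change without restarting the server.
  SELECT pg_reload_conf();

The readahead buffer itself is set at the OS level rather than in SQL (on Linux, blockdev can query and set it per block device).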

02.01.2017, 14:29, "Job" <job(at)colliniconsulting(dot)it>:
> Hello guys, and a very good new year to everybody!
>
> We are now running some queries and statistics on a very big table (about 180 million records).
> The table is partitioned by day (roughly 3 GB of data per partition/day).
> We use PostgreSQL 9.6.1.
>
> I am experiencing quite significant slowdowns on queries.
> I manually ran a "vacuum full" and a "reindex" on every partition in order to reclaim free space and reorder records.
>
> I have a BRIN index on the timestamp column and a btree index on another field.
>
> Starting with a simple query: explain analyze select count(domain) from webtraffic_archive:
>
>                                                                                    QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Finalize Aggregate (cost=3220451.94..3220451.95 rows=1 width=8) (actual time=36912.624..36912.624 rows=1 loops=1)
>    -> Gather (cost=3220451.52..3220451.93 rows=4 width=8) (actual time=36911.600..36912.614 rows=5 loops=1)
>          Workers Planned: 4
>          Workers Launched: 4
>          -> Partial Aggregate (cost=3219451.52..3219451.53 rows=1 width=8) (actual time=36906.804..36906.804 rows=1 loops=5)
>                -> Append (cost=0.00..3094635.41 rows=49926443 width=0) (actual time=4.716..31331.229 rows=39853988 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive (cost=0.00..0.00 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_01_01 (cost=0.00..10.47 rows=47 width=0) (actual time=0.000..0.000 rows=0 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_16 (cost=0.00..213728.26 rows=3498026 width=0) (actual time=4.713..2703.458 rows=2798421 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_17 (cost=0.00..201379.39 rows=3247739 width=0) (actual time=6.334..2364.726 rows=2598191 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_18 (cost=0.00..176248.86 rows=2824986 width=0) (actual time=7.437..2014.812 rows=2259989 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_19 (cost=0.00..177493.33 rows=2866433 width=0) (actual time=9.951..2145.958 rows=2293146 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_20 (cost=0.00..120271.83 rows=1960883 width=0) (actual time=0.011..372.092 rows=1568706 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_21 (cost=0.00..276391.94 rows=4485294 width=0) (actual time=5.386..3111.589 rows=3588235 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_22 (cost=0.00..287611.68 rows=4630668 width=0) (actual time=6.598..3335.834 rows=3704535 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_23 (cost=0.00..249047.61 rows=4014361 width=0) (actual time=7.206..2628.884 rows=3211489 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_24 (cost=0.00..192008.70 rows=3097370 width=0) (actual time=9.870..1882.826 rows=2477896 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_25 (cost=0.00..87385.16 rows=1405616 width=0) (actual time=0.018..427.248 rows=1124493 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_26 (cost=0.00..88262.80 rows=1436080 width=0) (actual time=0.014..277.327 rows=1148864 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_27 (cost=0.00..222607.43 rows=3557243 width=0) (actual time=8.497..1232.210 rows=2845795 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_28 (cost=0.00..210414.76 rows=3365676 width=0) (actual time=0.033..548.878 rows=2692541 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_29 (cost=0.00..185065.72 rows=2955872 width=0) (actual time=0.031..498.079 rows=2364697 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_30 (cost=0.00..149139.55 rows=2382656 width=0) (actual time=0.011..501.351 rows=1906124 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2016_12_31 (cost=0.00..166991.89 rows=2664288 width=0) (actual time=0.041..437.631 rows=2131431 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2017_01_01 (cost=0.00..79197.29 rows=1260930 width=0) (actual time=0.018..254.124 rows=1008744 loops=5)
>                      -> Parallel Seq Scan on webtraffic_archive_day_2017_01_02 (cost=0.00..11378.74 rows=272274 width=0) (actual time=0.017..34.352 rows=130691 loops=5)
>  Planning time: 313.907 ms
>  Execution time: 36941.700 ms
>
> Other, more complex queries are even slower.
>
> How can I improve this?
> The number of records can grow to as much as 1,000 million (one billion).
> Do I need a third-party tool for big data?
>
> THANK YOU!
> /F
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
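
One more note on the plan above: a count(domain) over the whole table cannot benefit from the BRIN index, since there is no restriction on the indexed column. BRIN only pays off once the query is constrained on the timestamp, so that block ranges (and, if the partitions carry CHECK constraints, whole partitions) can be skipped. A sketch, assuming the indexed column is simply named "timestamp":

  -- Hypothetical query; the column name "timestamp" is assumed.
  EXPLAIN ANALYZE
  SELECT count(domain)
  FROM webtraffic_archive
  WHERE "timestamp" >= '2016-12-31'
    AND "timestamp" <  '2017-01-01';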
