Re: Which partition scheme makes sense for my time based IoT-datagrams?

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Thorsten Schöning <tschoening(at)am-soft(dot)de>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Which partition scheme makes sense for my time based IoT-datagrams?
Date: 2021-02-10 16:45:59
Message-ID: 37cf6c0529f818c9b94f8d71277793abcef15ff2.camel@cybertec.at
Lists: pgsql-general

On Wed, 2021-02-10 at 16:09 +0100, Thorsten Schöning wrote:
> I have a table storing datagrams from some IoT-devices, with one
> datagram per device per day most likely for around 75'000 devices
> currently. I want to test query performance with a partitioned table
> and am interested in the following queries mostly:
>
> * querying arbitrary datagrams by their ID
> * querying datagrams being X old based on some timestamp
> * 15 minutes
> * 15 days
> * 15 months

You want to query ALL individual datagrams from the past 15 months??

I guess not; rather, you want some kind of aggregation. For that,
materialized views might be interesting.
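
A daily aggregate could, for example, look like this (table and column
names are made up, since I don't know your schema):

    CREATE MATERIALIZED VIEW datagram_daily AS
    SELECT device_id,
           date_trunc('day', received_at) AS day,
           count(*) AS datagram_count
    FROM datagram
    GROUP BY device_id, date_trunc('day', received_at);

    -- refresh periodically, e.g. from cron
    REFRESH MATERIALIZED VIEW datagram_daily;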

> Therefore I thought of simply partitioning by RANGE of the timestamp
> when the datagram has been received and create individual partitions
> per half a year. That means that in most cases only the most current
> 1 partition needs to be queried, with the last 3 in many other cases.
>
> My approach would result in 24 tables already, with 2 more per year.
> Does that by itself sound bad already? Is that a limit the planner
> can most likely deal with, or do I not even need to care about
> hundreds of partitions?

24 partitions is harmless. I think that a partition per half year might
not be enough, but it depends on the queries.

You should use PostgreSQL v12 or better.
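
With v12 or later, declarative range partitioning on the receive
timestamp could look like this (again, names invented, since I don't
know your schema):

    CREATE TABLE datagram (
        id          bigint NOT NULL,
        device_id   bigint NOT NULL,
        received_at timestamptz NOT NULL,
        payload     bytea
    ) PARTITION BY RANGE (received_at);

    CREATE TABLE datagram_2021_h1 PARTITION OF datagram
        FOR VALUES FROM ('2021-01-01') TO ('2021-07-01');
    CREATE TABLE datagram_2021_h2 PARTITION OF datagram
        FOR VALUES FROM ('2021-07-01') TO ('2022-01-01');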

One point you didn't touch on, but should consider, is getting rid of
old data. You will want to do that at some point, or at least keep only
aggregates of the data for historical statistics.

That would be very easy if you partition correctly, and it would keep the
number of partitions in a manageable range.
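
Dropping a whole half year then becomes a cheap metadata operation
instead of an expensive DELETE:

    ALTER TABLE datagram DETACH PARTITION datagram_2021_h1;
    DROP TABLE datagram_2021_h1;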

> If partitioned by timestamp, how are lookups by ID performed? Is that
> a sequential scan on all partitions, e.g. with using available indexes
> per partition?

They will scan all partitions. You have to be prepared that most queries
will become at least slightly slower with partitioning. That is
expected.
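
Make sure there is an index on the ID. If you create it on the
partitioned table, PostgreSQL creates a matching index on every
partition, so the lookup in each partition can at least use an
index scan:

    CREATE INDEX ON datagram (id);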

> Is there some way to RANGE by timestamp and ID, by keeping the
> half-year tables only? So that the planner knows easier which
> partition to look at in case of IDs only? Or is that simply like
> querying an ID-index of each partition?

Sure, you can partition by ID (ranges; or perhaps hash partitioning,
if you only ever query for a single ID) and subpartition by timestamp.

This will speed up the queries by ID, since only a single partition
will be searched. You could then be almost as fast as with a single
monolithic table. Don't forget that the speed of an index scan does
not (or hardly) depend on the size of the table.
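
A sketch with hash partitioning by ID and range subpartitioning by
timestamp (again with invented names):

    CREATE TABLE datagram (
        id          bigint NOT NULL,
        received_at timestamptz NOT NULL,
        payload     bytea
    ) PARTITION BY HASH (id);

    CREATE TABLE datagram_p0 PARTITION OF datagram
        FOR VALUES WITH (MODULUS 4, REMAINDER 0)
        PARTITION BY RANGE (received_at);

    CREATE TABLE datagram_p0_2021_h1 PARTITION OF datagram_p0
        FOR VALUES FROM ('2021-01-01') TO ('2021-07-01');

    -- ... and so on for the other remainders and time ranges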

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
