Re: Hot to model data in DB (PostgreSQL) for SNMP-like multiple configurations

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Alexandru Lazarev <alexandru(dot)lazarev(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hot to model data in DB (PostgreSQL) for SNMP-like multiple configurations
Date: 2019-03-08 15:15:35
Message-ID: CAF-3MvM=ZHs5XLOyF=YXf0Pv1qADyRj=u2Y_yN6JNRU6HhuHNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Is there a reason not to use a relational model instead of json(b) here? I
think that is in fact considered best practice.

On Fri, 8 Mar 2019 at 15:40, Alexandru Lazarev <alexandru(dot)lazarev(at)gmail(dot)com>
wrote:

> I am working on product managing and monitoring Network (NMS-like
> products).
>
> Product manages configuration of network devices, for now each device has
> stored its configuration in simple table - this was the original design.
>
> CREATE TABLE public.configuration(
> id integer NOT NULL,
> config json NOT NULL,
> CONSTRAINT configuration_pkey PRIMARY KEY (id),)
>
> A config looks like:
>
> {
> "_id": 20818132,
> "type": "Modem",
> "data": [{
> "oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40",
> "instance": "24",
> "value": "null"
> },
> {
> "oid": "1.3.6.1.4.1.9999.3.5.10.1.86",
> "instance": "0",
> "value": "562"
> },
> {
> "oid": "1.3.6.1.4.1.9999.3.5.10.3.92.4.1",
> "instance": "0",
> "value": "0"
> },
> {
> "oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43",
> "instance": "24",
> "value": "vlan24"
> },
> {
> "oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43",
> "instance": "25",
> "value": "vlan25"
> }
> ]}
>
> And there are many plv8 (java script procedural language extension for
> PostgreSQL) stored procedures working on bulks of such config, reading some
> OIDs, changing them conditionally, removing some of them and adding others,
> especially in use cases like: There are some upper-level META-configuration
> of different level, which during change have to update all their updated
> parameters to all affected leaves configs. An simple test-example (but
> without touching 'data' node)
>
> CREATE OR REPLACE FUNCTION public.process_jsonb_plv8()
> RETURNS void AS$BODY$
> var CFG_TABLE_NAME = "configurations";
> var selPlan = plv8.prepare( "select c.config from " + CFG_TABLE_NAME + " c where c.id = $1", ['int'] );
> var updPlan = plv8.prepare( 'update ' + CFG_TABLE_NAME + ' set config = $1 where id = $2', ['jsonb','int'] );
>
> try {
>
> var ids = plv8.execute('select id from devices');
>
> for (var i = 0; i < ids.length; i++) {
> var db_cfg = selPlan.execute(ids[i].id); //Get current json config from DB
> var cfg = db_cfg[0].config;
> cfg["key0"] = 'plv8_json'; //-add some dummy key
> updPlan.execute(cfg, ids[i].id); //put uopdated JSON config in DB
> plv8.elog(NOTICE, "UPDATED = " + ids[i].id);
>
>
> }} finally {
> selPlan.free();
> updPlan.free();}
> return;$BODY$
> LANGUAGE plv8 VOLATILE
> COST 100;
>
> For real use-cases plv8 SPs are more complicated, doing FOR-LOOP through
> ALL OIDs object of 'data' array, checking if it is looking for and update
> value an/or remove it and/or add newer if necessary.
>
> Since number of devices in DB increased from several hundreds to 40K or
> even 70K, and number of OID+Instance combinations also increased from
> several hundred to ~1K and sometimes up to 10K within a config, we start
> facing slowness in bulk (especially global -> update to ALL Devices)
> updates/searches.
>
> In order to get rid off FOR LOOP step for each configuration I've
> converted data-node from array to object (key-value model), something like
> :
>
> {
> "_id": 20818132,
> "type": "Modem",
> "data": {
> "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40": {
> "24": "null"
> },
> "1.3.6.1.4.1.9999.3.5.10.1.86": {
> "0": "562"
> },
> "1.3.6.1.4.1.9999.3.5.10.3.92.4.1": {
> "0": "0"
> },
> "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43": {
> "24": "vlan24",
> "25": "vlan25"
> }
> }}
>
> Now in order to get a concrete OID (e.g.
> "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43") and/or its instance I do 1-2 *O(1)*
> operations instead *O(n)*. And it become a bit faster. After I've changed
> column type from json to jsonb - I've got a lot of memory issues with
> plv8 stored procedures, so now ideas is:
>
> *What are the best practices to store such data and use cases in DB?*
> taking in considerations following: - Bulk and global updates are often
> enough (user-done operation) - several times per week and it takes long
> time - several minutes, annoying user experience. - Consulting some OIDs
> only from concrete config is medium frequency use case - Consulting ALL
> devices have some specific OID (SNMP Parameter) settled to a specific value
> - medium frequency cases. - Consult (read) a configuration for a specific
> device as a whole document - often use case (it is send to device as json
> or as converted CSV, it is send in modified json format to other utilities,
> etc)
>
> One of suggestion from other oppinions is to move ALL configurations to
> simple plain relational table
>
> CREATE TABLE public.configuration_plain(
> device_id integer,
> oid text,
> instance text,
> value text)
>
> Looking like
>
> *id*
>
> *oid*
>
> *instance*
>
> *value*
>
> 20818132
>
> 1.3.6.1.4.1.9999.2.13
>
> 0
>
> VSAT
>
> 20818132
>
> 1.3.6.1.4.1.9999.3.10.2.2.10.15
>
> 0
>
> 0
>
> 20818132
>
> 1.3.6.1.4.1.9999.3.10.2.2.10.17
>
> 0
>
> 0
>
> 20818132
>
> 1.3.6.1.4.1.9999.3.10.2.2.10.18
>
> 0
>
> 1
>
> 20818132
>
> 1.3.6.1.4.1.9999.3.10.2.2.10.19
>
> 0
>
> 2
>
> 20818132
>
> 1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1
>
> 24
>
> 24
>
> 20818132
>
> 1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1
>
> 25
>
> 25
>
> 20818132
>
> 1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2
>
> 24
>
> vlan24
>
> 20818132
>
> 1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2
>
> 25
>
> VLAN_25
>
> And now I end with a table of ~33 M rows for 40K devices * (700-900
> OID+Instance combinations). Some simple selects and updates (especially if
> I add simple indexes on id, oid columns) works faster than JSON (less than
> 1 sec updating one OID for ALL devices), but on some stored procedures
> where I need to do some checks and business logic before manipulating
> concrete parameters in configuration - performance decrease again from 10
> to 25 seconds in below example with each nee added operation:
>
> CREATE OR REPLACE FUNCTION public.test_update_bulk_configuration_plain_plpg(
> sql_condition text, -- something like 'select id from devices'
> new_elements text, --collection of OIDs to be Added or Update, could be JSON Array or comma separated list, containing 1 or more (100) OIDs
> oids_to_delete text --collection of OIDs to Delete
> )
> RETURNS void AS$BODY$DECLARE
> r integer;
> cnt integer;
> ids int[];
> lid int;BEGIN
> RAISE NOTICE 'start';
> EXECUTE 'SELECT ARRAY(' || sql_condition || ')' into ids;
> FOREACH lid IN ARRAY ids
> LOOP
> -- DELETE
> -- Some business logic
> -- FOR .. IF .. BEGIN
> delete from configuration_plain c where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '10' and c.device_id = lid;
> delete from configuration_plain c where c.oid = 'Other OID' and instance = 'Index' and c.device_id = lid;
> -- other eventual deletes
> --END
>
> -- UPDATE
> -- Some business logic
> -- FOR .. IF .. BEGIN
> update configuration_plain c set value = '2' where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.87' and c.device_id = lid;
> update configuration_plain c set value = '2' where c.oid = '1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '1' and c.device_id = lid;
> -- other eventual updates
> -- END
>
> --INSERT
> insert into configuration_plain (id, oid, instance, value) values (lid,'1.3.6.1.4.1.9999.3.5.10.3.201.1.1', '11', '11');
> -- OTHER eventually....
> insert into configuration_plain (id, oid, instance, value) values (lid,'OTHER_OID', 'Idx', 'Value of OID');
> END LOOP;
> RAISE NOTICE 'end';
> RETURN;END$BODY$
> LANGUAGE plpgsql VOLATILE
> COST 100;
>
> So any best practices and advice on such data and use cases modeling in DB?
>
> Regards,
>
> AlexL
>

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Amit Langote 2019-03-08 16:13:17 Re: Update does not move row across foreign partitions in v11
Previous Message Alvaro Herrera 2019-03-08 15:03:18 Re: Update does not move row across foreign partitions in v11

Browse pgsql-performance by date

  From Date Subject
Next Message Alexandru Lazarev 2019-03-08 16:40:27 Re: Hot to model data in DB (PostgreSQL) for SNMP-like multiple configurations
Previous Message Tom Lane 2019-03-08 14:53:23 Re: IS NOT DISTINCT FROM statement