Re: proposal: contrib module - generic command scheduler

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: hubert depesz lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: contrib module - generic command scheduler
Date: 2015-05-13 04:32:47
Message-ID: CAFj8pRDumbF8RyR=Rb4Px4PeML-fvYprb43sXGi=hqkixOVqqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2015-05-13 4:08 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

>
>
> On 13 May 2015 at 00:31, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>>
>>
>> 2015-05-12 11:27 GMT+02:00 hubert depesz lubaczewski <depesz(at)depesz(dot)com>:
>>
>>> On Tue, May 12, 2015 at 09:25:50AM +0200, Pavel Stehule wrote:
>>> > create type scheduled_time as (second int[], minute int[], hour int[],
>>> dow
>>> > int[], month int[]);
>>> > (,"{1,10,20,30,40,50}",,,) .. run every 10 minutes.
>>> > (,"{5}",,,) .. run once per hour
>>> > Comments, notices?
>>>
>>> First, please note that I'm definitely not a hacker, just a user.
>>>
>>> One comment that I'd like to make, is that since we're at planning
>>> phase, I think it would be great to add capability to limit number of
>>> executions of given command.
>>> This would allow running things like "at" in unix - run once, at given
>>> time, and that's it.
>>>
>>
>> I would not to store state on this level - so "at" should be implemented
>> on higher level. There is very high number of possible strategies, what can
>> be done with failed tasks - and I would not to open this topic. I believe
>> with proposed scheduler, anybody can simply implement what need in PLpgSQL
>> with dynamic SQL. But on second hand "run once" can be implemented with
>> proposed API too.
>>
>
> That seems reasonable in a v1, so long as there's room to easily extend it
> without pain to add "at"-like one-shot commands, at-startup commands, etc.
>
> I'd prefer to see a scheduling interface that's a close match for cron's
> or that leaves room for it - so things like "*/5" for every five minutes,
> ranges like "Mon-Fri", etc. If there's a way to express similar
> capabilities more cleanly using PostgreSQL's types and conventions that
> makes sense, but I'm not sure a composite type of arrays fits that.
>

I though about it too - but the parser for this cron time will be longer
than all other code probably. I see a possibility to write constructors
that simplify creating a value of this type. Some like

make_scheduled_time(secs => '*/5', dows => 'Mon-Fri') or
make_scheduled_time(at =>'2015-014-05 10:00:0'::timestamp);

There are two possible ways - composite with arrays or custom composite.
I'll decide later.

There are basic points:

1. don't hold a states, results of commands
2. It execute task immediately in related time window once (from start to
next start), when necessary worker is available
3. When command fails, it writes info to log only
4. When command runs too long (over specified timeout), it is killed.
5. When command waits to free worker, write to log
6. When command was not be executed due missing workers (and max_workers >
0), write to log

> How do you plan to manage the bgworkers?
>

I am thinking about one static supervisor, that will hold a calendar in
shared memory, that will start dynamic bgworkers for commands per database.
The scheduler is enabled in all databases, where the proposed extension is
installed.

For working with prototype I am planning to use SPI, but maybe it is not
necessary - so commands like VACUUM, CREATE DATABASE, DROP DATABASE can be
supported too. But I didn't tested it and I don't know if it is possible or
not. It can define new hooks too. So some other extensions can be based on
it.

>
>
> In BDR, where we have a similar need to have workers across multiple
> databases, and where each database contains a list of workers to launch, we
> have:
>
> * A single static "supervisor" bgworker. In 9.5 this will connect with
> InvalidOid as the target database so it can only access shared catalogs. In
> 9.4 this isn't possible in the bgworker API so we have to connect to a
> dummy database.
>
> * A dynamic background worker for each database in which BDR is enabled,
> which is launched from the supervisor. We check which DBs are BDR-enabled
> by (ab)using database security labels and checking pg_shseclabel from the
> supervisor worker so we only launch bgworkers on BDR-enabled DBs.
>
> * A dynamic background worker for each peer node, launched by the
> per-database worker based on the contents of that database's
> bdr.bdr_connections table.
>
>
> What I suspect you're going to want is:
>
> * A static worker launched by your extension when it starts, which
> launches per-db workers for each DB in which the scheduler is enabled. You
> could use a GUC listing scheduler-enabled DBs in postgresql.conf and have
> an on-reload hook to update it, you don't need to do the security label
> hack.
>
> * A DB scheduler worker, which looks up the scheduled tasks list, finds
> the next scheduled event, and sleeps on a long latch timeout until then,
> resetting it when interrupted. When it reaches the scheduled event it would
> launch a one-shot BGW_NO_RESTART worker to run the desired PL/PgSQL
> procedure over the SPI.
>
> * A task runner worker, which gets launched by the db scheduler to
> actually run a task using the SPI.
>
>
> Does that match your current thinking?
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2015-05-13 05:50:40 Re: proposal: contrib module - generic command scheduler
Previous Message Alvaro Herrera 2015-05-13 04:10:36 Re: Sequence Access Method WIP