ll.sisyphus

Writing cron jobs with Python

ll.sisyphus simplifies running Python stuff as cron jobs.

There will be no more than one sisyphus job of a certain name running at any given time. A job has a maximum allowed runtime. If this maximum is exceeded, the job will kill itself. In addition to that, job execution can be logged and in case of job failure an email can be sent.

To use this module, you must derive your own class from Job and implement the execute method.

Logs will (by default) be created in the ~/ll.sisyphus directory. This can be changed by deriving a new subclass and overwriting the appropriate class attribute.

To execute a job, use the module level function execute (or executewithargs when you want to support command line arguments).

Example

The following example illustrates the use of this module:

#!/usr/bin/env python

import os
import urllib.request
from ll import sisyphus

class Fetch(sisyphus.Job):
   projectname = "ACME.FooBar"
   jobname = "Fetch"
   argdescription = "fetch http://www.python.org/ and save it to a local file"
   maxtime = 3 * 60

   def __init__(self):
      self.url = "http://www.python.org/"
      self.tmpname = "Fetch_Tmp_{}.html".format(os.getpid())
      self.officialname = "Python.html"

   def execute(self):
      self.log("fetching data from {!r}".format(self.url))
      data = urllib.request.urlopen(self.url).read()
      datasize = len(data)
      self.log("writing file {!r} ({:,} bytes)".format(self.tmpname, datasize))
      with open(self.tmpname, "wb") as f:
         f.write(data)
      self.log("renaming file {!r} to {!r}".format(self.tmpname, self.officialname))
      os.rename(self.tmpname, self.officialname)
      return "cached {!r} as {!r} ({:,} bytes)".format(self.url, self.officialname, datasize)

if __name__ == "__main__":
   sisyphus.executewithargs(Fetch())

You will find the log files for this job in ~/ll.sisyphus/ACME.FooBar/Fetch/.

Logging and tags

Logging itself is done by calling self.log:

self.log("can't parse XML file {}".format(filename))

This logs the argument without tagging the line.

It is possible to add tags to the logging call. This is done by accessing attributes of the log pseudo method. I.e. to add the tags xml and warning to a log call you can do the following:

self.log.xml.warning("can't parse XML file {}".format(filename))

It's also possible to do this via __getitem__ calls, i.e. the above can be written like this:

self.log['xml']['warning']("can't parse XML file {}".format(filename))

ll.sisyphus itself uses the following tags:

sisyphus

This tag will be added to all log lines produced by ll.sisyphus itself.

init

This tag is used for the log lines output at the start of the job.

report

This tag will be added for all log messages related to sending the failure report email.

result

This tag is used for the final line written to the log files that shows a summary of what the job did (or why it failed).

fail

This tag is used in the result line if the job failed with an exception.

errors

This tag is used in the result line if the job ran to completion, but some exceptions were logged.

ok

This tag is used in the result line if the job ran to completion without any exceptions.

kill

This tag is used in the result line if the job was killed because it exceeded the maximum allowed runtime.

Exceptions

When an exception object is passed to self.log the tag exc will be added to the log call automatically.
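
For example, a minimal sketch (reusing the url attribute from the Fetch example above):

def execute(self):
   try:
      data = urllib.request.urlopen(self.url).read()
   except Exception as exc:
      # passing the exception object itself; the exc tag is added automatically
      self.log(exc)
      raise
   return "fetched {:,} bytes".format(len(data))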

Email

It is possible to send an email when a job fails. For this, the options --fromemail, --toemail and --smtphost have to be set. The email will be sent if the job terminates because of an exception, if it exceeds its maximum runtime (and the option --noisykills is set), or if any of the calls to self.log include the tag email. This email includes all logging calls and the final exception (if there is any) in plain text and HTML format, as well as a JSON attachment.
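
Since the job can also be configured via class attributes (see the Job class below), a minimal sketch of enabling failure report emails without command line options might look like this (the addresses and host are placeholders):

class Fetch(sisyphus.Job):
   projectname = "ACME.FooBar"
   jobname = "Fetch"

   # placeholder values; the report is only sent if all three are set
   fromemail = "cron@example.com"
   toemail = "admin@example.com"
   smtphost = "mail.example.com"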

def _formattraceback(exc):

def _formatlines(obj):

class Job(object):

A Job object executes a task once.

To use this class, derive your own class from it and overwrite the execute method.

The job can be configured in three ways: By class attributes in the Job subclass, by attributes of the Job instance (e.g. set in __init__) and by command line arguments (if executewithargs is used). The following attributes/arguments are supported:

projectname (-p or --projectname)

The name of the project this job belongs to. This might be a dot-separated hierarchical project name (e.g. including customer names or similar stuff).

jobname (-j or --jobname)

The name of the job itself (defaulting to the name of the class if none is given).

identifier (--identifier)

An additional identifier that will be added to the failure report email.

argdescription (No command line equivalent)

Description for help message of the command line argument parser.

fromemail (--fromemail)

The sender email address for the failure report email.

This email will only be sent if the options --fromemail, --toemail and --smtphost are set (and any error or output to the email log occurred).

toemail (--toemail)

An email address where an email will be sent in case of a failure.

smtphost (--smtphost)

The SMTP server to be used for sending the failure report email.

smtpport (--smtpport)

The port number used for the connection to the SMTP server.

smtpuser (--smtpuser)

The user name used to log into the SMTP server. (Login will only be done if both --smtpuser and --smtppassword are given)

smtppassword (--smtppassword)

The password used to log into the SMTP server.

maxtime (-m or --maxtime)

Maximum allowed runtime for the job (in seconds). If the job runs longer than that, it will kill itself.

fork (--fork)

Forks the process and does the work in the child process. The parent process is responsible for monitoring the maximum runtime (this is the default). In non-forking mode the single process does both the work and the runtime monitoring.

noisykills (--noisykills)

Should a message be printed/a failure email be sent when the maximum runtime is exceeded?

notify (-n or --notify)

Should a notification be issued to the OS X Notification Center? (This is done via terminal-notifier.)

logfilename (--logfilename)

Path/name of the logfile for this job as an UL4 template. Variables available in the template include user_name, projectname, jobname and starttime.

loglinkname (--loglinkname)

The filename of a link that points to the currently active logfile (as an UL4 template). If this is None no link will be created.

log2file (-f or --log2file)

Should a logfile be written at all?

formatlogline (--formatlogline)

An UL4 template for formatting each line in the logfile. Available variables are time (current time), starttime (start time of the job), tags (list of tags for the line) and line (the log line itself).

keepfilelogs (--keepfilelogs)

The number of days the logfiles are kept. Old logfiles (i.e. all files in the same directory as the current logfile that are more than keepfilelogs days old) will be removed at the end of the job.

compressfilelogs (--compressfilelogs)

The number of days after which log files are compressed (if they aren't deleted via keepfilelogs).

compressmode (--compressmode)

How to compress the logfiles. Possible values are: "gzip", "bzip2" and "lzma". The default is "bzip2".

encoding (--encoding)

The encoding to be used for the logfile. The default is "utf-8".

errors (--errors)

Encoding error handler name (goes with encoding). The default is "strict".

maxemailerrors (--maxemailerrors)

This option limits the number of exception and error messages that will get attached to the failure email. The default is 10.

proctitle (--proctitle)

When this option is specified, the process title will be modified during execution of the job, so that the ps command shows what the processes are doing. (This requires setproctitle.)

Command line arguments take precedence over instance attributes (if executewithargs is used) and those take precedence over class attributes.
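
A minimal sketch of this precedence, using maxtime as the example attribute (the values are purely illustrative):

class Fetch(sisyphus.Job):
   maxtime = 5 * 60  # class attribute: the default

   def __init__(self):
      self.maxtime = 10 * 60  # instance attribute: overrides the class attribute

# A command line option overrides both (only when executewithargs is used), e.g.:
#    python fetch.py --maxtime 900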

def execute(self):

Execute the job once. The return value is a one line summary of what the job did. Overwrite in subclasses.

def failed(self):

Called when running the job generated an exception. Overwrite in subclasses, e.g. to roll back your database transactions.

def argparser(self):

Return an argparse parser for parsing the command line arguments. This can be overwritten in subclasses to add more arguments.

def parseargs(self, args=None):

Use the parser returned by argparser to parse the argument sequence args, modify self accordingly and return the result of the parser's parse_args call.

def getmaxtime(self):

def getmaxtime_seconds(self):

def _alarm_fork(self, signum, frame):

def _alarm_nofork(self, signum, frame):

def _handleexecution(self):

Handle executing the job including handling of duplicate or hanging jobs.

def notifystart(self):

def notifyfinish(self, result):

def task(self, type=None, name=None, index=None, count=None):

task is a context manager and can be used to specify subtasks.

Arguments have the following meaning:

type (string or None)

The type of the task.

name (string or None)

The name of the task.

index (integer or None)

If this task is one in a sequence of similar tasks, index should be the index of this task, i.e. the first task of this type has index==0, the second one index==1 etc.

count (integer or None)

If this task is one in a sequence of similar tasks and the total number of tasks is known, count should be the total number of tasks.
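
A minimal sketch of using task inside execute (the type and name values below are made up for illustration, and urllib.request is assumed to be imported as in the example at the top):

def execute(self):
   # wrap a subtask; its type and name show up in the log line prefix
   with self.task("download", "python.org"):
      data = urllib.request.urlopen("http://www.python.org/").read()
      self.log("got {:,} bytes".format(len(data)))
   return "done"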

def tasks(self, iterable, type=None, name=None):

tasks iterates through iterable and calls task for each item. index and count will be passed to task automatically. type and name will be used for the type and name of the task. They can either be constants (in which case they will be passed as is) or callables (in which case they will be called with the item to get the type/name).

Example:

import sys, operator

items = sys.modules.items()
for (name, module) in self.tasks(items, "module", operator.itemgetter(0)):
   self.log("module is {}".format(module))

The log output will look something like the following:

[2014-11-14 11:17:01.319291]=[t+0:00:00.342013] :: {sisyphus}{init} >> /Users/walter/test.py (max time 0:05:00; pid 33482)
[2014-11-14 11:17:01.321860]=[t+0:00:00.344582] :: {sisyphus}{init} >> forked worker child (child pid 33485)
[2014-11-14 11:17:01.324067]=[t+0:00:00.346789] :: module tokenize (1/212) :: {email} >> module is <module 'tokenize' from '/Users/walter/.local/lib/python3.4/tokenize.py'>
[2014-11-14 11:17:01.327711]=[t+0:00:03.350433] :: module heapq (2/212) :: {email} >> module is <module 'heapq' from '/Users/walter/.local/lib/python3.4/heapq.py'>
[2014-11-14 11:17:01.333471]=[t+0:00:09.356193] :: module marshal (3/212) :: {email} >> module is <module 'marshal' (built-in)>
[2014-11-14 11:17:01.340733]=[t+0:00:15.363455] :: module math (4/212) :: {email} >> module is <module 'math' from '/Users/walter/.local/lib/python3.4/lib-dynload/math.so'>
[2014-11-14 11:17:01.354177]=[t+0:00:18.366899] :: module urllib.parse (5/212) :: {email} >> module is <module 'urllib.parse' from '/Users/walter/.local/lib/python3.4/urllib/parse.py'>
[2014-11-14 11:17:01.368187]=[t+0:00:21.370909] :: module _posixsubprocess (6/212) :: {email} >> module is <module '_posixsubprocess' from '/Users/walter/.local/lib/python3.4/lib-dynload/_posixsubprocess.so'>
[2014-11-14 11:17:01.372633]=[t+0:00:33.385355] :: module pickle (7/212) :: {email} >> module is <module 'pickle' from '/Users/walter/.local/lib/python3.4/pickle.py'>
[...]
[2014-11-14 11:17:03.768065]=[t+0:00:39.790787] :: {sisyphus}{info} >> Compressing logfiles older than 7 days, 0:00:00 via bzip2
[2014-11-14 11:17:03.768588]=[t+0:00:39.791310] :: {sisyphus}{info} >> Compressing logfile /Users/walter/ll.sisyphus/ACME.FooBar/Test/2014-11-06-16-44-22-416878.sisyphuslog
[2014-11-14 11:17:03.772348]=[t+0:00:39.795070] :: {sisyphus}{info} >> Compressing logfile /Users/walter/ll.sisyphus/ACME.FooBar/Test/2014-11-06-16-44-37-839632.sisyphuslog
[2014-11-14 11:17:03.774178]=[t+0:00:39.796900] :: {sisyphus}{info} >> Cleanup done

def makeproctitle(self, process, detail=None):

def setproctitle(self, process, detail=None):

def _log(self, tags, obj):

Log obj to the log file using tags as the list of tags.

def _getscriptsource(self):

Reads the source code of the script into self.source.

def _getcrontab(self):

Reads the current crontab into self.crontab.

def _createlog(self):

Create the logfile and the link to the logfile (if requested).

class Task(object):

A subtask of a Job.

def __init__(self, job, type=None, name=None, index=None, count=None):

Create a Task object. For the meaning of the parameters see Job.task.

def __enter__(self):

def __exit__(self, type, value, traceback):

def __str__(self):

def asjson(self):

def __repr__(self):

class Tag(object):

A Tag object can be used to call a function with an additional list of tags. Tags can be added via __getattr__ or __getitem__ calls.

def __init__(self, func, *tags):

def __getattr__(self, tag):

def __getitem__(self, tag):

def __call__(self, *args, **kwargs):

class Logger(object):

def log(self, timestamp, tags, tasks, text):

def taskstart(self, tasks):

def taskend(self, tasks):

def close(self):

class StreamLogger(Logger):

def __init__(self, job, stream, linetemplate):

def log(self, timestamp, tags, tasks, text):

def __repr__(self):

class URLResourceLogger(StreamLogger):

def __init__(self, job, resource, skipurls, linetemplate):

def close(self):

def remove(self, fileurl):

def compress(self, fileurl, bufsize=65536):

class EmailLogger(Logger):

def __init__(self, job):

def log(self, timestamp, tags, tasks, text):

def close(self):

def execute(job):

Execute the job job once.

def executewithargs(job, args=None):

Execute the job job once with command line arguments.

args are the command line arguments (None results in sys.argv being used).
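
For example (a sketch; the option value is purely illustrative):

if __name__ == "__main__":
   # pass an explicit argument list instead of falling back to sys.argv
   sisyphus.executewithargs(Fetch(), ["--maxtime", "600"])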
