``ll.sisyphus`` simplifies running Python code as cron jobs. There will be no more than one sisyphus job of a certain name running at any given time. A job has a maximum allowed runtime. If this maximum is exceeded, the job will kill itself. In addition to that, job execution can be logged and in case of job failure an email can be sent.

To use this module, you must derive your own class from ``Job`` and implement the ``execute`` method.

Logs will (by default) be created in the ``~/ll.sisyphus`` directory. This can be changed by deriving a new subclass and overriding the appropriate class attribute.

To execute a job, use the module level function ``execute`` (or ``executewithargs`` when you want to support command line arguments).

=======
Example
=======

The following example illustrates the use of this module:

    #!/usr/bin/env python

    import os
    import urllib.request
    from ll import sisyphus

    class Fetch(sisyphus.Job):
        projectname = "ACME.FooBar"
        jobname = "Fetch"
        argdescription = "fetch http://www.python.org/ and save it to a local file"
        maxtime = 3 * 60

        def __init__(self):
            self.url = "http://www.python.org/"
            self.tmpname = "Fetch_Tmp_{}.html".format(os.getpid())
            self.officialname = "Python.html"

        def execute(self):
            self.log("fetching data from {!r}".format(self.url))
            data = urllib.request.urlopen(self.url).read()
            datasize = len(data)
            self.log("writing file {!r} ({:,} bytes)".format(self.tmpname, datasize))
            # Close the temporary file before renaming it
            with open(self.tmpname, "wb") as f:
                f.write(data)
            self.log("renaming file {!r} to {!r}".format(self.tmpname, self.officialname))
            os.rename(self.tmpname, self.officialname)
            return "cached {!r} as {!r} ({:,} bytes)".format(self.url, self.officialname, datasize)

    if __name__ == "__main__":
        sisyphus.executewithargs(Fetch())

You will find the log files for this job in ``~/ll.sisyphus/ACME.FooBar/Fetch/``.
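The maximum-runtime rule from the introduction can be illustrated with a small standard-library sketch. It is not the ``ll.sisyphus`` implementation: it assumes a POSIX system and the main thread, and it swaps in a simpler technique, raising an exception from a ``SIGALRM`` handler instead of killing the process. The names ``MaxTimeExceeded``, ``run_with_maxtime`` and ``slow_job`` are hypothetical.

```python
import signal
import time


class MaxTimeExceeded(Exception):
    """Hypothetical exception: raised when the time limit is reached."""


def run_with_maxtime(func, maxtime):
    """Run func() and abort it after maxtime seconds (POSIX, main thread only)."""
    def on_alarm(signum, frame):
        # Raising here interrupts whatever func() is currently doing
        raise MaxTimeExceeded("maximum runtime of {} seconds exceeded".format(maxtime))

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, maxtime)
    try:
        return func()
    finally:
        # Cancel a pending alarm and restore the previous handler
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)


def slow_job():
    """Hypothetical job body that takes far too long."""
    time.sleep(2)
    return "finished"
```

With these definitions, ``run_with_maxtime(slow_job, 0.05)`` raises ``MaxTimeExceeded``, while a function that returns within the limit passes its return value through unchanged.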
================
Logging and tags
================

Logging itself is done by calling ``self.log``:

    self.log("can't parse XML file {}".format(filename))

This logs the argument without tagging the line.

It is possible to add tags to the logging call. This is done by accessing attributes of the ``log`` pseudo method. For example, to add the tags ``xml`` and ``warning`` to a log call you can do the following:

    self.log.xml.warning("can't parse XML file {}".format(filename))

It's also possible to do this via ``__getitem__`` calls, i.e. the above can be written like this:

    self.log['xml']['warning']("can't parse XML file {}".format(filename))

``ll.sisyphus`` itself uses the following tags:

``sisyphus``
    This tag will be added to all log lines produced by ``ll.sisyphus`` itself.

``init``
    This tag is used for the log lines output at the start of the job.

``report``
    This tag will be added for all log messages related to sending the failure report email.

``result``
    This tag is used for the final line written to the log files that shows a summary of what the job did (or why it failed).

``fail``
    This tag is used in the result line if the job failed with an exception.

``errors``
    This tag is used in the result line if the job ran to completion, but some exceptions were logged.

``ok``
    This tag is used in the result line if the job ran to completion without any exceptions.

``kill``
    This tag is used in the result line if the job was killed because it exceeded the maximum allowed runtime.

==========
Exceptions
==========

When an exception object is passed to ``self.log`` the tag ``exc`` will be added to the log call automatically.

=====
Email
=====

It is possible to send an email when a job fails. For this the options ``--fromemail``, ``--toemail`` and ``--smtphost`` have to be set. If the job terminates because of an exception, or exceeds its maximum runtime (and the option ``--noisykills`` is set) or any of the calls to ``self.log`` include the tag ``email``, the email will be sent.
This email includes all logging calls and the final exception (if there is any) in plain text and HTML format as well as a JSON attachment.

===================================
def ``_formattraceback``​(``exc``):
===================================

===============================
def ``_formatlines``​(``obj``):
===============================

===========================
class ``Job``​(``object``):
===========================

A Job object executes a task once.

To use this class, derive your own class from it and override the ``execute`` method.

The job can be configured in three ways: By class attributes in the ``Job`` subclass, by attributes of the ``Job`` instance (e.g. set in ``__init__``) and by command line arguments (if ``executewithargs`` is used). The following attributes/arguments are supported:

``projectname`` (``-p`` or ``--projectname``)
    The name of the project this job belongs to. This might be a dot-separated hierarchical project name (e.g. including customer names or similar stuff).

``jobname`` (``-j`` or ``--jobname``)
    The name of the job itself (defaulting to the name of the class if none is given).

``identifier`` (``--identifier``)
    An additional identifier that will be added to the failure report email.

``argdescription`` (No command line equivalent)
    Description for help message of the command line argument parser.

``fromemail`` (``--fromemail``)
    The sender email address for the failure report email.

    This email will only be sent if the options ``--fromemail``, ``--toemail`` and ``--smtphost`` are set (and any error or output to the email log occurred).

``toemail`` (``--toemail``)
    An email address where an email will be sent in case of a failure.

``smtphost`` (``--smtphost``)
    The SMTP server to be used for sending the failure report email.

``smtpport`` (``--smtpport``)
    The port number used for the connection to the SMTP server.

``smtpuser`` (``--smtpuser``)
    The user name used to log into the SMTP server.
    (Login will only be done if both ``--smtpuser`` and ``--smtppassword`` are given.)

``smtppassword`` (``--smtppassword``)
    The password used to log into the SMTP server.

``maxtime`` (``-m`` or ``--maxtime``)
    Maximum allowed runtime for the job (as the number of seconds). If the job runs longer than that it will kill itself.

``fork`` (``--fork``)
    Forks the process and does the work in the child process. The parent process is responsible for monitoring the maximum runtime (this is the default). In non-forking mode the single process does both the work and the runtime monitoring.

``noisykills`` (``--noisykills``)
    Should a message be printed/a failure email be sent when the maximum runtime is exceeded?

``notify`` (``-n`` or ``--notify``)
    Should a notification be issued to the OS X Notification center? (Done via terminal-notifier.)

``logfilename`` (``--logfilename``)
    Path/name of the logfile for this job as a UL4 template. Variables available in the template include ``user_name``, ``projectname``, ``jobname`` and ``starttime``.

``loglinkname`` (``--loglinkname``)
    The filename of a link that points to the currently active logfile (as a UL4 template). If this is ``None`` no link will be created.

``log2file`` (``-f`` or ``--log2file``)
    Should a logfile be written at all?

``formatlogline`` (``--formatlogline``)
    A UL4 template for formatting each line in the logfile. Available variables are ``time`` (current time), ``starttime`` (start time of the job), ``tags`` (list of tags for the line) and ``line`` (the log line itself).

``keepfilelogs`` (``--keepfilelogs``)
    The number of days the logfiles are kept. Old logfiles (i.e. all files in the same directory as the current logfile that are more than ``keepfilelogs`` days old) will be removed at the end of the job.

``compressfilelogs`` (``--compressfilelogs``)
    The number of days after which log files are compressed (if they aren't deleted via ``keepfilelogs``).

``compressmode`` (``--compressmode``)
    How to compress the logfiles.
    Possible values are: ``"gzip"``, ``"bzip2"`` and ``"lzma"``. The default is ``"bzip2"``.

``encoding`` (``--encoding``)
    The encoding to be used for the logfile. The default is ``"utf-8"``.

``errors`` (``--errors``)
    Encoding error handler name (goes with ``encoding``). The default is ``"strict"``.

``maxemailerrors`` (``--maxemailerrors``)
    This option limits the number of exceptions and error messages that will get attached to the failure email. The default is 10.

``proctitle`` (``--proctitle``)
    When this option is specified, the process title will be modified during execution of the job, so that the ``ps`` command shows what the processes are doing. (This requires ``setproctitle``.)

Command line arguments take precedence over instance attributes (if ``executewithargs`` is used) and those take precedence over class attributes.

def ``execute``​(``self``):
---------------------------

Execute the job once. The return value is a one line summary of what the job did. Override in subclasses.

def ``failed``​(``self``):
--------------------------

Called when running the job generated an exception. Override in subclasses, to e.g. rollback your database transactions.

def ``argparser``​(``self``):
-----------------------------

Return an ``argparse`` parser for parsing the command line arguments. This can be overridden in subclasses to add more arguments.

def ``parseargs``​(``self``, ``args``=``None``):
------------------------------------------------

Use the parser returned by ``argparser`` to parse the argument sequence ``args``, modify ``self`` accordingly and return the result of the parser's ``parse_args`` call.
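The precedence rule (command line arguments over instance attributes over class attributes) and the ``argparser``/``parseargs`` split can be sketched with plain ``argparse``. This is an illustration under stated assumptions, not the code in ``ll.sisyphus``: the class ``MiniJob`` is hypothetical and only handles the two options ``--jobname`` and ``--maxtime``.

```python
import argparse


class MiniJob:
    # Class attributes: lowest precedence
    jobname = "Mini"
    maxtime = 5 * 60

    def __init__(self):
        # Instance attribute: overrides the class attribute of the same name
        self.maxtime = 60

    def argparser(self):
        p = argparse.ArgumentParser(description="MiniJob sketch")
        # Defaults are taken from the current attribute values, so an option
        # that is not given on the command line leaves the attribute unchanged
        p.add_argument("-j", "--jobname", default=self.jobname)
        p.add_argument("-m", "--maxtime", type=int, default=self.maxtime)
        return p

    def parseargs(self, args=None):
        # args=None makes argparse fall back to sys.argv[1:]
        namespace = self.argparser().parse_args(args)
        # Command line values: highest precedence
        self.jobname = namespace.jobname
        self.maxtime = namespace.maxtime
        return namespace
```

Calling ``MiniJob().parseargs([])`` keeps ``maxtime`` at the instance value of 60, while ``parseargs(["--maxtime", "10"])`` sets it to 10. The class attribute ``MiniJob.maxtime`` stays at 300 in both cases.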
def ``getmaxtime``​(``self``):
------------------------------

def ``getmaxtime_seconds``​(``self``):
--------------------------------------

def ``_alarm_fork``​(``self``, ``signum``, ``frame``):
------------------------------------------------------

def ``_alarm_nofork``​(``self``, ``signum``, ``frame``):
--------------------------------------------------------

def ``_handleexecution``​(``self``):
------------------------------------

Handle executing the job including handling of duplicate or hanging jobs.

def ``notifystart``​(``self``):
-------------------------------

def ``notifyfinish``​(``self``, ``result``):
--------------------------------------------

def ``task``​(``self``, ``type``=``None``, ``name``=``None``, ``index``=``None``, ``count``=``None``):
------------------------------------------------------------------------------------------------------

``task`` is a context manager and can be used to specify subtasks.

Arguments have the following meaning:

``type`` (string or ``None``)
    The type of the task.

``name`` (string or ``None``)
    The name of the task.

``index`` (integer or ``None``)
    If this task is one in a sequence of similar tasks, ``index`` should be the index of this task, i.e. the first task of this type has ``index==0``, the second one ``index==1`` etc.

``count`` (integer or ``None``)
    If this task is one in a sequence of similar tasks and the total number of tasks is known, ``count`` should be the total number of tasks.

def ``tasks``​(``self``, ``iterable``, ``type``=``None``, ``name``=``None``):
-----------------------------------------------------------------------------

``tasks`` iterates through ``iterable`` and calls ``task`` for each item. ``index`` and ``count`` will be passed to ``task`` automatically. ``type`` and ``name`` will be used for the type and name of the task. They can either be constants (in which case they will be passed as is) or callables (in which case they will be called with the item to get the type/name).
Example:

    import sys, operator

    items = sys.modules.items()
    for (name, module) in self.tasks(items, "module", operator.itemgetter(0)):
        self.log("module is {}".format(module))

The log output will look something like the following:

    [2014-11-14 11:17:01.319291]=[t+0:00:00.342013] :: {sisyphus}{init} >> /Users/walter/test.py (max time 0:05:00; pid 33482)
    [2014-11-14 11:17:01.321860]=[t+0:00:00.344582] :: {sisyphus}{init} >> forked worker child (child pid 33485)
    [2014-11-14 11:17:01.324067]=[t+0:00:00.346789] :: module tokenize (1/212) :: {email} >> module is
    [2014-11-14 11:17:01.327711]=[t+0:00:03.350433] :: module heapq (2/212) :: {email} >> module is
    [2014-11-14 11:17:01.333471]=[t+0:00:09.356193] :: module marshal (3/212) :: {email} >> module is
    [2014-11-14 11:17:01.340733]=[t+0:00:15.363455] :: module math (4/212) :: {email} >> module is
    [2014-11-14 11:17:01.354177]=[t+0:00:18.366899] :: module urllib.parse (5/212) :: {email} >> module is
    [2014-11-14 11:17:01.368187]=[t+0:00:21.370909] :: module _posixsubprocess (6/212) :: {email} >> module is
    [2014-11-14 11:17:01.372633]=[t+0:00:33.385355] :: module pickle (7/212) :: {email} >> module is
    [...]
    [2014-11-14 11:17:03.768065]=[t+0:00:39.790787] :: {sisyphus}{info} >> Compressing logfiles older than 7 days, 0:00:00 via bzip2
    [2014-11-14 11:17:03.768588]=[t+0:00:39.791310] :: {sisyphus}{info} >> Compressing logfile /Users/walter/ll.sisyphus/ACME.FooBar/Test/2014-11-06-16-44-22-416878.sisyphuslog
    [2014-11-14 11:17:03.772348]=[t+0:00:39.795070] :: {sisyphus}{info} >> Compressing logfile /Users/walter/ll.sisyphus/ACME.FooBar/Test/2014-11-06-16-44-37-839632.sisyphuslog
    [2014-11-14 11:17:03.774178]=[t+0:00:39.796900] :: {sisyphus}{info} >> Cleanup done

def ``makeproctitle``​(``self``, ``process``, ``detail``=``None``):
-------------------------------------------------------------------

def ``setproctitle``​(``self``, ``process``, ``detail``=``None``):
------------------------------------------------------------------

def ``_log``​(``self``, ``tags``, ``obj``):
-------------------------------------------

Log ``obj`` to the log file using ``tags`` as the list of tags.

def ``_getscriptsource``​(``self``):
------------------------------------

Reads the source code of the script into ``self.source``.

def ``_getcrontab``​(``self``):
-------------------------------

Reads the current crontab into ``self.crontab``.

def ``_createlog``​(``self``):
------------------------------

Create the logfile and the link to the logfile (if requested).

============================
class ``Task``​(``object``):
============================

A subtask of a ``Job``.

def ``__init__``​(``self``, ``job``, ``type``=``None``, ``name``=``None``, ``index``=``None``, ``count``=``None``):
-------------------------------------------------------------------------------------------------------------------

Create a ``Task`` object. For the meaning of the parameters see ``Job.task``.
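How a task context manager and the ``tasks`` iteration fit together can be sketched in a few lines. This is a simplified stand-in and not the library's ``Task`` class: ``MiniTask`` and the module-level ``tasks`` function are hypothetical, a plain list plays the role of the job's stack of active tasks, and the ``type name (index/count)`` text is modelled on the log output shown for ``Job.tasks``.

```python
class MiniTask:
    """Hypothetical stand-in for a subtask; pushes itself onto a stack while active."""

    def __init__(self, stack, type=None, name=None, index=None, count=None):
        self.stack = stack
        self.type = type
        self.name = name
        self.index = index
        self.count = count

    def __enter__(self):
        self.stack.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stack.pop()

    def __str__(self):
        parts = [str(p) for p in (self.type, self.name) if p is not None]
        text = " ".join(parts) or "?"
        if self.index is not None:
            # index is zero-based, the displayed position is one-based
            if self.count is not None:
                text += " ({}/{})".format(self.index + 1, self.count)
            else:
                text += " ({})".format(self.index + 1)
        return text


def tasks(stack, iterable, type=None, name=None):
    """Yield each item while a MiniTask describing it is active on the stack."""
    try:
        count = len(iterable)
    except TypeError:
        count = None  # e.g. a generator: total number of tasks unknown
    for (index, item) in enumerate(iterable):
        # Constants are passed as is, callables are called with the item
        t = type(item) if callable(type) else type
        n = name(item) if callable(name) else name
        with MiniTask(stack, t, n, index, count):
            yield item
```

Iterating over ``tasks(stack, [("a", 1), ("b", 2)], "module", operator.itemgetter(0))`` leaves a task on top of ``stack`` whose string form is ``module a (1/2)`` during the first iteration and ``module b (2/2)`` during the second. The stack is empty again once the loop has finished.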
def ``__enter__``​(``self``):
-----------------------------

def ``__exit__``​(``self``, ``type``, ``value``, ``traceback``):
----------------------------------------------------------------

def ``__str__``​(``self``):
---------------------------

def ``asjson``​(``self``):
--------------------------

def ``__repr__``​(``self``):
----------------------------

===========================
class ``Tag``​(``object``):
===========================

A ``Tag`` object can be used to call a function with an additional list of tags. Tags can be added via ``__getattr__`` or ``__getitem__`` calls.

def ``__init__``​(``self``, ``func``, *``tags``):
-------------------------------------------------

def ``__getattr__``​(``self``, ``tag``):
----------------------------------------

def ``__getitem__``​(``self``, ``tag``):
----------------------------------------

def ``__call__``​(``self``, *``args``, **``kwargs``):
-----------------------------------------------------

==============================
class ``Logger``​(``object``):
==============================

def ``log``​(``self``, ``timestamp``, ``tags``, ``tasks``, ``text``):
---------------------------------------------------------------------

def ``taskstart``​(``self``, ``tasks``):
----------------------------------------

def ``taskend``​(``self``, ``tasks``):
--------------------------------------

def ``close``​(``self``):
-------------------------

====================================
class ``StreamLogger``​(``Logger``):
====================================

def ``__init__``​(``self``, ``job``, ``stream``, ``linetemplate``):
-------------------------------------------------------------------

def ``log``​(``self``, ``timestamp``, ``tags``, ``tasks``, ``text``):
---------------------------------------------------------------------

def ``__repr__``​(``self``):
----------------------------

===============================================
class ``URLResourceLogger``​(``StreamLogger``):
===============================================

def ``__init__``​(``self``, ``job``, ``resource``, ``skipurls``, ``linetemplate``):
-----------------------------------------------------------------------------------

def ``close``​(``self``):
-------------------------

def ``remove``​(``self``, ``fileurl``):
---------------------------------------

def ``compress``​(``self``, ``fileurl``, ``bufsize``=``65536``):
----------------------------------------------------------------

===================================
class ``EmailLogger``​(``Logger``):
===================================

def ``__init__``​(``self``, ``job``):
-------------------------------------

def ``log``​(``self``, ``timestamp``, ``tags``, ``tasks``, ``text``):
---------------------------------------------------------------------

def ``close``​(``self``):
-------------------------

==========================
def ``execute``​(``job``):
==========================

Execute the job ``job`` once.

=====================================================
def ``executewithargs``​(``job``, ``args``=``None``):
=====================================================

Execute the job ``job`` once with command line arguments.

``args`` are the command line arguments (``None`` results in ``sys.argv`` being used).
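The tagging mechanism documented for ``Tag`` and used by ``self.log`` (tags collected through attribute access or ``__getitem__``, then handed to the wrapped function on call) can be sketched as follows. This is a minimal illustration, not the library's source: ``MiniTag`` and ``record`` are hypothetical names, and passing the tags as the first positional argument is an assumption modelled on the signature of ``_log(tags, obj)``.

```python
class MiniTag:
    """Hypothetical sketch: wraps a function and accumulates tags for the call."""

    def __init__(self, func, *tags):
        self.func = func
        self.tags = tags

    def __getattr__(self, tag):
        # Only called for names that are not real attributes,
        # so self.func and self.tags are never treated as tags
        if tag.startswith("__"):
            raise AttributeError(tag)
        return MiniTag(self.func, *(self.tags + (tag,)))

    def __getitem__(self, tag):
        return MiniTag(self.func, *(self.tags + (tag,)))

    def __call__(self, *args, **kwargs):
        # Assumption: the collected tags are passed as the first argument
        return self.func(self.tags, *args, **kwargs)


lines = []


def record(tags, obj):
    """Hypothetical log function: stores the tags together with the logged object."""
    lines.append((tags, obj))


log = MiniTag(record)
```

With this setup, ``log("plain")`` records an empty tuple of tags, and both ``log.xml.warning("a")`` and ``log["xml"]["warning"]("b")`` record the tags ``("xml", "warning")``. Each attribute or item access returns a new object, so ``log`` itself never accumulates tags between calls.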