
When Might I Use An IDL Task? IDL As a Key to Data Analysis in a Heterogeneous Computing Environment

Jim Pendleton

In IDL 8.6 we've exposed a new feature that standardizes how IDL can be called from any environment that can communicate with another process on the same operating system through standard input, standard output, and standard error.

In our Harris Geospatial Custom Solutions Group, we look forward to deploying this new feature extensively to help our clients expose even more analysis capabilities into large, heterogeneous processing environments.

In earlier IDL releases, a programmer could call an IDL routine from an operating system-level script in an ad hoc way, using a combination of the IDL Runtime executable, the IDL function COMMAND_LINE_ARGS, and other techniques. The new IDL Task architecture adds a level of commonality and standardization to that approach.

In the past, you might have written individual IDL Runtime applications to execute atomic processes on data in this type of environment. Your architecture would package up the arguments and call idlrt.exe with them on the command line, via system(), fork(), or another language's equivalent of IDL's SPAWN, along with a path to the IDL SAVE file containing the "task" to execute.
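For comparison, the ad hoc approach might look like the following Python sketch of a framework-side launcher. It assumes that the runtime accepts the SAVE file path, then "-args", then the values that COMMAND_LINE_ARGS returns inside IDL. Check the command-line documentation for your IDL version before relying on that. The function names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def build_legacy_command(idlrt_path: str, save_file: str, args: Sequence[object]) -> List[str]:
    """Assemble the argv for launching an IDL SAVE file with the runtime.

    Assumption: the runtime accepts the SAVE file path, then "-args",
    then the values that COMMAND_LINE_ARGS() will return inside IDL.
    """
    # Every value crosses the process boundary as a plain string.
    return [idlrt_path, save_file, "-args", *(str(a) for a in args)]


def run_legacy(idlrt_path: str, save_file: str, args: Sequence[object]) -> int:
    """Launch the runtime and return its exit code (Python's analogue of SPAWN)."""
    completed = subprocess.run(build_legacy_command(idlrt_path, save_file, args))
    return completed.returncode
```

Each such SAVE file still had to parse and validate its own string arguments inside IDL. That per-application convention is what the Task architecture standardizes.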

With the IDL Task architecture, you write procedural wrappers to your functionality using standard IDL, in combination with a simple JSON file which defines the arguments for your task, their data types, transfer directions, etc.

Placing the compiled IDL task code along with the JSON in your IDL session's search path exposes the tasks to the IDL Task Engine. The IDL Task Engine is essentially a stateless application that wraps an IDL Runtime interpreter. It performs the essential work of validating the input and output arguments and packaging them up before calling your IDL routine.

Your job distribution system, such as Harris' Geospatial Framework, will call the IDL Task Engine with JSON that represents the name of the task to be executed along with the task's arguments, written to the task script's standard input.

The task engine starts an independent IDL interpreter process for each task, allowing multiple tasks to be executed in parallel, up to the number of available processing licenses.
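A framework that dispatches several requests at once could cap its own concurrency at the license count, as in this Python sketch. Here run_one is a hypothetical callable that launches one task engine process for one request and returns its decoded output. The license_count value is whatever your deployment allows.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_tasks_in_parallel(
    requests: List[Dict[str, Any]],
    run_one: Callable[[Dict[str, Any]], Dict[str, Any]],
    license_count: int,
) -> List[Dict[str, Any]]:
    """Run task requests concurrently, with at most license_count in flight.

    run_one is a hypothetical callable that starts one IDL Task Engine
    process for a single request and returns its decoded output.
    """
    if license_count < 1:
        raise ValueError("license_count must be at least 1")
    # Each worker thread waits on its own engine process, so the pool size
    # caps how many IDL interpreters run at the same time.
    with ThreadPoolExecutor(max_workers=license_count) as pool:
        # map() returns results in the same order as the input requests.
        return list(pool.map(run_one, requests))
```

Threads suit this case because each worker spends its time waiting on an external process rather than running Python code.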

The arguments to and from an IDL Task must use data types that can be represented in JSON. That restriction rules out arguments that cannot cross process boundaries, such as object references or pointers, whether they are defined in IDL or in another language.
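A framework can catch this kind of mistake before it launches the engine by attempting a strict JSON serialization of each argument. In this Python sketch, which illustrates the idea rather than any part of IDL, an arbitrary Python object stands in for an object reference or pointer. NaN is also rejected because strict JSON has no representation for it. The function name is hypothetical.

```python
import json
from typing import Any, Dict, List


def non_json_parameters(params: Dict[str, Any]) -> List[str]:
    """Return the names of parameters whose values cannot be sent as strict JSON."""
    rejected = []
    for name, value in params.items():
        try:
            # allow_nan=False makes NaN and Infinity fail, as strict JSON requires.
            json.dumps(value, allow_nan=False)
        except (TypeError, ValueError):
            rejected.append(name)
    return rejected
```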

An Example - Generating a Summary Report From Multiple Images

Let's say through some mechanism outside IDL and ENVI you have generated a directory of image files. Perhaps you own a fleet of UAVs with sensors, or a satellite or two. These image files contain data that, when distilled through various processing algorithms, can produce a single intelligence report.

You want to fold this workflow into a larger processing sequence of services that consists of multiple steps, only one of which involves the generation of the reports.

For the IDL portion, let's say you already have an object class in IDL that takes as input the path to a directory of images, performs classifications and time series analysis and outputs a PDF report with a summary of the results. Because it's just that simple in IDL. 

Let's call this class IntelReportGenerator. We will look at this class first, outside the context of the IDL Task Engine. For simplicity, the class I will describe has only two methods: an ::Init method and a ::GenerateReport method.

This class is super-efficient and only has a handful of member variables.

Pro IntelReportGenerator__Define
!null = {IntelReportGenerator,  $
    ImageDirectory : '', $ ; path to the images to be read, input
    OutputReport   : '', $ ; path to the report file to be written, input
    Author         : '', $ ; a name to be applied to the report, input
    Debug          : !false $ ; a flag to toggle more detailed debugging information
}
End

PSA: I highly recommend adding a debug flag to each class. Debugging might not be enabled in an operational environment, but it's always nice to know it can be turned on without a modification and redeployment of the code.

The ::Init method of the class is primarily used to populate the member variables with the keyword parameters.

Function IntelReportGenerator::Init, $
    Image_Directory = Image_Directory, $
    Author = Author, $
    Output_Report = Output_Report, $
    Debug = Debug, $
    Status = Status, $
    Error = Error
On_Error, 2
Status = !false ; Assume failure
Error = !null ; Clear any error string on input
self.Debug = Keyword_Set(Debug) ; Set first so the error handler below can use it
Catch, ErrorNumber ; Handle any unexpected error conditions
If (ErrorNumber ne 0) then Begin
    Catch, /Cancel
    If (self.Debug) then Begin
        ; Return a complete traceback if debugging is enabled
        Help, /Last_Message, Output = Error
    EndIf Else Begin
        ; Return a summary error instead of a traceback
        Error = !error_state.msg
    EndElse
    Return, Status ; Status is still 0 (false), so object creation fails
EndIf
self.Author = N_Elements(Author) gt 0 ? Author : 'UNKNOWN'
If (~File_Test(Image_Directory, /Dir)) then Message, 'Image directory does not exist.', /Traceback
self.ImageDirectory = Image_Directory
; ... More here.  you get the idea.
Status = !true
Return, 1
End

Next, let's consider the ::GenerateReport method. It's a simple matter of programming. We loop over the files in the input image directory, magic occurs, and an output file is generated. I relish the elegance of a simple design, don't you?

Pro IntelReportGenerator::GenerateReport, $
    Status = Status, $
    Error = Error
On_Error, 2
Status = !false
Error = !null
Catch, ErrorNumber
If (ErrorNumber ne 0) then Begin
    Catch, /Cancel
    If (self.Debug) then Begin
        Help, /Last_Message, Output = Error
    EndIf Else Begin
        Error = !error_state.msg
    EndElse
    Return
EndIf
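; Search for the files inside the directory, not the directory itself
Files = File_Search(FilePath('*', Root_Dir = self.ImageDirectory), Count = nFiles)
If (nFiles eq 0) then Message, 'No files found in the image directory.'
ForEach File, Files Do Begin
  ;... Magic analysis here.  Batteries not included.
EndForEach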
; Magic report-writing here.  Nope, still no batteries.
Status = !true
End

All this should look familiar to you thus far if you have written any IDL code, especially the magic bits.

In order to put this functionality into an IDL Task workflow, we will need to write a procedural wrapper for our class that will instantiate an object with the appropriate keywords, then execute the method to generate the report. We will name this new routine IntelReportTask.

Pro IntelReportTask, $
    Image_Directory = Image_Directory, $
    Author = Author, $
    Output_Report = Output_Report, $
    Debug = Debug, $
    Status = Status, $ ; an output from this procedure, 0 = failure, 1 = success
    Error = Error ; An error string if Status is 0, or null on return otherwise
On_Error, 2
Error = !null ; Clear any error string
Status = !false ; assume failure
; ALWAYS include a CATCH handler to manage unexpected
; exception conditions.
Catch, ErrorNumber
If (ErrorNumber ne 0) then Begin
    Catch, /Cancel
    ; A procedure has no self, so test the Debug keyword directly
    If (Keyword_Set(Debug)) then Begin
        ; Return a complete traceback if debugging is enabled
        Help, /Last_Message, Output = Error
    EndIf Else Begin
        ; Return only a summary error without traceback if debugging is off
        Error = !error_state.msg
    EndElse
    Return
EndIf
; Attempt to create the report-generation object, passing through the keywords.
; If ::Init fails, it sets Status and Error and no valid object is returned.
o = IntelReportGenerator( $
    Image_Directory = Image_Directory, $
    Author = Author, $
    Output_Report = Output_Report, $
    Status = Status, $
    Error = Error, $
    Debug = Debug)
If (Obj_Valid(o)) then Begin
    ; Call the method to generate the report
    o.GenerateReport, Status = Status, Error = Error
    ; Release the object now that the report has been written
    Obj_Destroy, o
EndIf
End

An IDL Task procedure must receive all of its arguments through keywords. Other than that restriction, it is a standard IDL procedure. There is no magic required.

The new piece of functionality is the requirement of a JSON task definition file. Within this file we define the name of the task (which corresponds to the IDL procedure name) and the type definitions associated with each of the keywords.

The argument type definitions allow the IDL Task Engine itself to perform parameter type checking and validation before your procedure is even called. This relieves you of the burden of writing code to ensure, for example, that a directory path that should be a string is not populated with a floating-point number instead. For some pedants of certain schools of computer science thought, IDL's weak data type validation at compile time is a turn-off rather than a strength. Wrapping pure IDL in a task with stricter argument types enforced by the Task Engine is one way to assuage such opinions, perhaps as a stepping stone to more illuminated paths to consciousness.

Of course, this also makes your IDL Tasks less generic than the routines they wrap are within IDL itself. A single IDL routine that can operate on any data type, from byte values to double-precision numbers, may require two or more IDL Task wrappers if you want to expose more than one of those types. Another option is to write your task with multiple keywords that accept different data types, then pass the input to a common processing algorithm.

The general JSON syntax of a Custom IDL Task is described here.

The JSON associated with the IDL task follows.

{
  "name": "IntelReportTask",
  "description": "Generates a report from a directory of images.",
  "base_class": "IDLTaskFromProcedure",
  "routine": "intelreporttask",
  "schema": "idltask_1.0",
  "parameters": [
    {
      "name": "IMAGE_DIRECTORY",
      "description": "URI to the directory containing image files",
      "type": "STRING",
      "direction": "input",
      "required": true
    },
    {
      "name": "AUTHOR",
      "description": "Label to apply as the author to the output report",
      "type": "STRING",
      "direction": "input",
      "required": false,
      "default": "UNKNOWN"
    },
    {
      "name": "OUTPUT_REPORT",
      "description": "URI to the output report file",
      "type": "STRING",
      "direction": "input",
      "required": true
    },
    {
      "name": "DEBUG",
      "description": "Flag to enable verbose debugging information during errors or processing",
      "type": "BOOLEAN",
      "direction": "input",
      "required": false,
      "default": false
    },
    {
      "name": "STATUS",
      "description": "Status of the report generation request at completion, or error.",
      "type": "BOOLEAN",
      "direction": "output",
      "required": false
    },
    {
      "name": "ERROR",
      "description": "Any error text generated during processing",
      "type": "STRINGARRAY",
      "dimensions": "[*]",
      "direction": "output",
      "required": false
    }
  ]
}

Here, we have identified optional and required keywords, their input/output directions, and data types, among other things.
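To make the engine-side validation described earlier more concrete, here is a simplified Python sketch of the kinds of checks that a definition like this one enables: unknown names, missing required inputs, type mismatches, and defaults. It is not the IDL Task Engine's actual implementation. It covers only the STRING, BOOLEAN, and STRINGARRAY types used above, and validate_inputs is a hypothetical name.

```python
from typing import Any, Callable, Dict

# Only the types used by the IntelReportTask definition are covered here.
_TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "STRING": lambda v: isinstance(v, str),
    "BOOLEAN": lambda v: isinstance(v, bool),
    "STRINGARRAY": lambda v: isinstance(v, list) and all(isinstance(s, str) for s in v),
}


def validate_inputs(task_def: Dict[str, Any], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Check request inputs against a task definition and fill in defaults.

    Raises ValueError on the first problem found; returns the completed inputs.
    """
    # Index the input-direction parameters by upper-case name, since IDL
    # keyword names are case-insensitive.
    params = {p["name"].upper(): p for p in task_def["parameters"] if p["direction"] == "input"}
    resolved: Dict[str, Any] = {}
    for name, value in inputs.items():
        key = name.upper()
        if key not in params:
            raise ValueError(f"{key} is not an input parameter of {task_def['name']}")
        type_name = params[key]["type"].upper()
        check = _TYPE_CHECKS.get(type_name)
        if check is None:
            raise ValueError(f"no check defined for type {type_name}")
        if not check(value):
            raise ValueError(f"{key} must be of type {type_name}")
        resolved[key] = value
    # Anything not supplied must be optional; apply its default if it has one.
    for key, param in params.items():
        if key in resolved:
            continue
        if param.get("required", False):
            raise ValueError(f"required parameter {key} is missing")
        if "default" in param:
            resolved[key] = param["default"]
    return resolved
```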

In the IDL documentation, we show some examples of calling a procedure through an IDLTask object within IDL itself. In truth, this has limited utility outside of debugging. If you're even a semi-competent IDL wizard (which I assume you are if you have read this far), you will recognize that within IDL, the IDLTask class and the task wrapper you have written simply add some overhead to a call you could make directly to your intended "worker" routine.

The real value of an IDL Task is shown when you insert your functionality into a heterogeneous workflow, outside of IDL itself.

In this environment, your framework will launch a command-line script to execute your task.

On Windows, the default location for the script is in the installation directory, "C:\Program Files\Harris\idl86\bin\bin.x86_64\idltaskengine.bat".

On Linux, the default path is /usr/local/harris/idl/bin/idltaskengine.

The input to the idltaskengine script is JSON-format text that represents the name of the task along with the parameters.  The JSON may be passed to the script's standard input either through redirection from a file (<) or a pipe (|), for example,

<installpath>\idltaskengine.bat < <filepath>\my_intel_report_request.json

or

echo '{"taskName":"IntelReportTask","inputParameters":{"IMAGE_DIRECTORY":"<imagespath>"}, etc.}' | <installpath>/idltaskengine

It is the responsibility of your framework to construct the appropriate JSON object to be passed to the task engine script.

For our current example, the JSON might be constructed like this:

{
	"taskName": "IntelReportTask",
	"inputParameters": {
		"IMAGE_DIRECTORY": "/path-to-data/",
		"AUTHOR": "MELENDY",
		"OUTPUT_REPORT": "/path-to-report/myreport.pdf"
	}
}
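On the framework side, producing this request and piping it to the engine takes only a few lines. The Python sketch below assumes that the caller supplies the engine path, for example the Linux default mentioned above. The names build_request and submit are hypothetical helpers.

```python
import json
import subprocess
from typing import Any, Dict, Sequence


def build_request(task_name: str, inputs: Dict[str, Any]) -> str:
    """Serialize a request in the taskName/inputParameters layout shown above."""
    return json.dumps({"taskName": task_name, "inputParameters": inputs}, allow_nan=False)


def submit(engine_cmd: Sequence[str], request_json: str) -> subprocess.CompletedProcess:
    """Write the request to the engine's standard input, like the | form above.

    Standard output and standard error are captured as text so the caller
    can inspect both after the process exits.
    """
    return subprocess.run(list(engine_cmd), input=request_json, capture_output=True, text=True)
```

For example, submit(["/usr/local/harris/idl/bin/idltaskengine"], build_request("IntelReportTask", inputs)) would send the request above to the default Linux engine, where inputs holds the three parameters shown.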

Any parameters defined as having an output direction will be written to standard output in JSON format. In our example, the output might be returned in this general format if a handled error was encountered:

{
    "outputParameters": [{
        "STATUS": false
    }, {
        "ERROR": [
            "% SWAP_ENDIAN: Unable to swap object reference data type",
            "% Execution halted at: SWAP_ENDIAN        99 C:\\Program Files\\Harris\\IDL86\\lib\\swap_endian.pro",
            "%                      $MAIN$"
        ]
    }]
}
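Because the sample output arrives as a list of single-key objects, a framework will usually flatten it before reading any values. This Python sketch assumes the layout shown above. It also accepts a plain object, in case a given engine version returns one. The name parse_output is hypothetical.

```python
import json
from typing import Any, Dict


def parse_output(stdout_text: str) -> Dict[str, Any]:
    """Merge the engine's outputParameters into a single name -> value dict."""
    document = json.loads(stdout_text)
    params = document.get("outputParameters", {})
    if isinstance(params, dict):
        # Tolerate a plain object in case an engine version returns one.
        return dict(params)
    merged: Dict[str, Any] = {}
    for entry in params:
        # Each list entry is a single-key object such as {"STATUS": false}.
        merged.update(entry)
    return merged
```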

In the event of a truly wretched error, one that prevented the JSON from being populated at all, the exit status and the standard error output of the IDL Task Engine script should be checked as well. See the "Exit Status" section at the bottom of the online help topic.

Your surrounding framework should be designed to check the exit status and the standard error output of the IDL Task Engine script first, then check for and parse any JSON returned on standard output.
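That order of checks might look like this in Python. The sketch operates on the subprocess.CompletedProcess returned by subprocess.run with captured output. It assumes that a non-zero exit status or empty standard output means the engine failed, and it returns the decoded JSON otherwise. The specific exit codes are documented in the online help and are not assumed here. TaskEngineError and interpret_result are hypothetical names.

```python
import json
import subprocess
from typing import Any, Dict


class TaskEngineError(RuntimeError):
    """Raised when the engine fails before producing usable JSON output."""


def interpret_result(result: subprocess.CompletedProcess) -> Dict[str, Any]:
    """Check exit status and standard error first, then decode standard output."""
    stderr_text = (result.stderr or "").strip()
    stdout_text = (result.stdout or "").strip()
    if result.returncode != 0:
        # Assumed: a non-zero exit means the task did not complete normally,
        # so stderr is the only reliable source of detail.
        raise TaskEngineError(f"exit status {result.returncode}: {stderr_text}")
    if not stdout_text:
        raise TaskEngineError(f"no JSON on standard output; stderr: {stderr_text}")
    try:
        return json.loads(stdout_text)
    except json.JSONDecodeError as exc:
        raise TaskEngineError(f"standard output is not valid JSON: {exc}") from exc
```

A task-level failure, such as the STATUS false example above, still arrives as valid JSON. The framework should therefore inspect the task's own output parameters after this function returns.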

More Examples

Additional IDL Task examples can be found here.

Geospatial Framework (GSF)

The Harris Geospatial Framework product (GSF) is just one example of a distributed processing architecture into which IDL Tasks might be "snapped". Despite its marketing name, it is not limited to processing geospatial data.


