Geospatial Services Framework

GSF Tutorial: Amazon S3 Workspace

About


The Amazon S3 Workspace Manager allows output files from jobs to be stored as S3 Objects.

Those output files are available to all the other nodes as input by referencing the job ID and relative file path.

Amazon S3 does not allow for random file access; therefore, this workspace manager copies files from the local file system to S3 storage as needed. These local files are often temporary and automatically cleaned up with the default configuration.

While jobs are executing, the engine writes output files into a folder specified by this workspace manager. When a job has finished and all output files have been written, this workspace manager copies all output files to the configured S3 Bucket. Each file in the job folder gets an S3 key with a relative path from the job folder to the file. This way all files have a unique key (jobId/path/file) and maintain their folder structure.
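The key scheme described above can be sketched in a few lines of Python. This is only an illustration of the jobId/path/file naming, not the workspace manager's actual code; the function name `job_folder_keys` and the sample file names are hypothetical.

```python
import os
import posixpath
import tempfile


def job_folder_keys(job_dir, job_id):
    """Map each file under job_dir to a key of the form jobId/path/file."""
    keys = {}
    for root, _dirs, files in os.walk(job_dir):
        for name in files:
            local_path = os.path.join(root, name)
            # Path relative to the job folder, normalized to forward slashes
            # so the folder structure is preserved inside the key.
            rel = os.path.relpath(local_path, job_dir).replace(os.sep, "/")
            keys[local_path] = posixpath.join(job_id, rel)
    return keys


if __name__ == "__main__":
    # Build a throwaway job folder with one nested file (hypothetical names).
    with tempfile.TemporaryDirectory() as job_dir:
        os.makedirs(os.path.join(job_dir, "vectors"))
        for rel in ("report.txt", os.path.join("vectors", "roads.shp")):
            with open(os.path.join(job_dir, rel), "w") as fh:
                fh.write("placeholder")
        for key in sorted(job_folder_keys(job_dir, "job-42").values()):
            print(key)
```

Because the job ID leads every key, two jobs that write a file with the same relative path still end up with distinct keys.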

If the data for a previous job needs to be used as input for the next job, the server asks the workspace manager for a file based on its job ID and relative path. This workspace manager then downloads the entire job folder and all of its files to a local location and returns an absolute path to the requested file. This ensures any ancillary files are also available. Many file formats such as shapefiles store their data across several files, and it can be difficult to support each file format individually. By downloading the whole folder, the system avoids most of those issues.
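The whole-folder download can be sketched as follows. To keep the example self-contained, a plain dictionary of key to bytes stands in for the S3 bucket; the function name `fetch_job_file` and its parameters are hypothetical, and the real module may behave differently in detail.

```python
import os
import tempfile


def fetch_job_file(bucket, job_id, rel_path, temp_root):
    """Copy every object of a job under temp_root, then return the
    absolute local path of the one file that was requested."""
    prefix = job_id + "/"
    for key, data in bucket.items():
        if not key.startswith(prefix):
            continue  # object belongs to a different job
        local_path = os.path.join(temp_root, *key.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        with open(local_path, "wb") as fh:
            fh.write(data)
    requested = os.path.abspath(
        os.path.join(temp_root, job_id, *rel_path.split("/"))
    )
    if not os.path.isfile(requested):
        raise FileNotFoundError(f"{job_id}/{rel_path} is not in the workspace")
    return requested


if __name__ == "__main__":
    # A dict stands in for the bucket (assumption for illustration only).
    fake_bucket = {
        "job-1/out/roads.shp": b"geometry",
        "job-1/out/roads.dbf": b"attributes",
        "job-2/other.txt": b"unrelated",
    }
    with tempfile.TemporaryDirectory() as temp_root:
        shp = fetch_job_file(fake_bucket, "job-1", "out/roads.shp", temp_root)
        print(sorted(os.listdir(os.path.dirname(shp))))
```

Requesting only roads.shp still places roads.dbf next to it, which is the point of downloading the folder rather than the single file.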

Configuring the Server


For the simple configuration, just specify this module by name in the config.json. Restart the server any time this file is changed so it reflects the new configuration.

To configure the GSF Amazon S3 Workspace Manager from a command line, start a command prompt in the GSFxx directory and execute the following commands:

node updateConfig.js config.json --set workspaceManager={}
node updateConfig.js config.json --set workspaceManager.type=gsf-amazon-s3-workspace-manager

This clears out any existing workspace manager settings and sets the workspace manager type to the Amazon S3 Workspace Manager. The updateConfig.js script automatically backs up the original config.json file for you. If you choose to edit the config.json file manually, back up the original file first.

Example config.json excerpt with changes to workspace manager:

"workspaceManager": {
  "type": "gsf-amazon-s3-workspace-manager"
},
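If you prefer to script a manual edit, a minimal Python sketch might look like the following. It is not how updateConfig.js works internally; it only reproduces the same end result (back up, clear the section, set the type). The function name, the `.bak` backup suffix, and the assumption that config.json is strict JSON without comments are all illustrative.

```python
import json
import shutil


def set_workspace_manager_type(config_path, manager_type):
    """Back up the config file, then replace its workspaceManager section."""
    backup_path = config_path + ".bak"  # hypothetical backup name
    shutil.copy2(config_path, backup_path)
    with open(config_path, "r", encoding="utf-8") as fh:
        config = json.load(fh)
    # Same effect as the two commands: clear the section, then set the type.
    config["workspaceManager"] = {"type": manager_type}
    with open(config_path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    return backup_path
```

A call such as `set_workspace_manager_type("config.json", "gsf-amazon-s3-workspace-manager")` leaves all other top-level settings untouched. As with any change to this file, restart the server afterwards.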

The default configuration for the Amazon S3 workspace assumes that the EC2 node where it is running already has access to S3 and therefore does not need a secret key to connect. In this example, data is stored in a bucket called "HarrisESE". For clarity and organization, all keys are prefixed with the value "ESEWorkspace", and local data is deleted after it is copied to S3.

These values can also be changed in the top-level config.json file.

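Example config.json excerpt after modifications (the // comments are annotations for this tutorial; standard JSON does not allow comments, so leave them out unless you know your configuration parser accepts them):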

"workspaceManager": {
  "type"           : "gsf-amazon-s3-workspace-manager",
  "tempRoot"       : "S3workspace", // the local folder to write job data before it is copied.
  "S3Root"         : "ESEWorkspace", // the top level folder in the bucket
  "S3Bucket"       : "HarrisESE", // the name of the bucket
  "clearOnStart"   : true, // this will delete the tempRoot folder when the server starts
  "clearTempOnDone": true // delete all job data after it has been copied to S3
}
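To make the roles of these settings concrete, here is a small Python sketch that reads a workspaceManager section and builds a prefixed object key. The class and function names are hypothetical, the default values are simply copied from the example excerpt rather than taken from the module, and joining S3Root to the job key with "/" is an assumption.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class S3WorkspaceSettings:
    # Values copied from the example excerpt; illustrative only.
    tempRoot: str = "S3workspace"
    S3Root: str = "ESEWorkspace"
    S3Bucket: str = "HarrisESE"
    clearOnStart: bool = True
    clearTempOnDone: bool = True

    def object_key(self, job_id, rel_path):
        # Assumption: the S3Root prefix is joined to jobId/path/file with "/".
        return "/".join([self.S3Root, job_id, rel_path])


def load_settings(config_text):
    """Read the workspaceManager section from comment-free JSON text."""
    section = json.loads(config_text)["workspaceManager"]
    known = {f.name for f in fields(S3WorkspaceSettings)}
    # Ignore keys this sketch does not model, such as "type".
    return S3WorkspaceSettings(**{k: v for k, v in section.items() if k in known})


if __name__ == "__main__":
    text = '{"workspaceManager": {"type": "gsf-amazon-s3-workspace-manager", "S3Bucket": "HarrisESE"}}'
    settings = load_settings(text)
    print(settings.S3Bucket, settings.object_key("job-42", "vectors/roads.shp"))
```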

To set these additional values (after having set the workspace manager type above) using a command line, enter the following commands from within the GSFxx directory:

node updateConfig.js config.json --set workspaceManager.tempRoot=S3workspace
node updateConfig.js config.json --set workspaceManager.S3Root=ESEWorkspace
node updateConfig.js config.json --set workspaceManager.S3Bucket=HarrisESE
node updateConfig.js config.json --set workspaceManager.clearOnStart=true
node updateConfig.js config.json --set workspaceManager.clearTempOnDone=true

Using S3 with explicit credentials


If you need to operate the S3 workspace from a machine without automatic permissions (such as a local developer's machine), you can tell the workspace manager to use a specific access key ID and secret access key.

To configure the config.json file with your credentials from a command line, start a command prompt in the GSFxx directory and enter the following commands (replace YourAccessKeyID and YourSecretKey with your personal keys):

node updateConfig.js config.json --set workspaceManager.accessKeyId=YourAccessKeyID
node updateConfig.js config.json --set workspaceManager.secretAccessKey=YourSecretKey

The updateConfig.js script will automatically back up the original config.json file for you.

You may also manually configure the access key by editing the config.json as shown below. It is recommended that you back up your config file before making any changes manually.

"workspaceManager": {
  "type"           : "gsf-amazon-s3-workspace-manager",
  "accessKeyId"    : "Your Access Key ID",
  "secretAccessKey": "Your Secret Key"
}

Restart the server any time this file is changed so it reflects the new configuration.

These values are account-specific and can be downloaded when the account is generated. If the secret key is lost, a new one can be generated in the Amazon console.
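Since the two credential settings only make sense as a pair, a quick sanity check before restarting the server can catch a half-finished edit. The following Python sketch is an assumption about what a sensible check looks like; the function name and the mode labels are hypothetical, and how the module itself reacts to a partial pair is not documented here.

```python
def credential_mode(workspace_manager):
    """Classify a workspaceManager section by how it would authenticate."""
    has_id = bool(workspace_manager.get("accessKeyId"))
    has_secret = bool(workspace_manager.get("secretAccessKey"))
    if has_id and has_secret:
        return "explicit"  # both keys supplied in config.json
    if not has_id and not has_secret:
        return "automatic"  # rely on the permissions of the machine itself
    raise ValueError("accessKeyId and secretAccessKey must be set together")


if __name__ == "__main__":
    section = {
        "type": "gsf-amazon-s3-workspace-manager",
        "accessKeyId": "Your Access Key ID",
        "secretAccessKey": "Your Secret Key",
    }
    print(credential_mode(section))
```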



© 2017 Exelis Visual Information Solutions, Inc.