AWS Data Pipeline provides several mechanisms for handling errors to ensure that your data workflows are reliable and resilient. Here are the key aspects of error handling in AWS Data Pipeline:
1. Retry Logic
AWS Data Pipeline automatically retries failed activities based on the configuration you specify. This is useful for transient errors that may resolve on subsequent attempts.
- Retry Attempts: You can specify the number of retries for an activity with the maximumRetries field; attemptTimeout caps how long each individual attempt may run.
- Retry Interval: The retryDelay field sets how long AWS Data Pipeline waits between retry attempts.
Example:
{
  "id": "MyActivity",
  "type": "ShellCommandActivity",
  "schedule": { "ref": "Schedule" },
  "attemptTimeout": "1 hour",
  "maximumRetries": "3",
  "command": "bash process-data.sh",
  "runsOn": { "ref": "MyEmrCluster" }
}
In this example, MyActivity will be retried up to 3 times after a failure, and each attempt is allowed to run for up to 1 hour before it times out.
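The retry interval mentioned above is controlled by the retryDelay field. A minimal sketch (the field values shown are illustrative, not defaults):

```json
{
  "id": "MyActivity",
  "type": "ShellCommandActivity",
  "schedule": { "ref": "Schedule" },
  "command": "bash process-data.sh",
  "runsOn": { "ref": "MyEmrCluster" },
  "maximumRetries": "3",
  "retryDelay": "10 minutes"
}
```

With this configuration, the service waits roughly 10 minutes between successive attempts instead of retrying immediately.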
2. Failure and Rerun Mode
The failureAndRerunMode parameter, set on the pipeline's Default object, controls whether a failure propagates to dependent objects. It accepts two values:
- none (the default): A failure does not cascade. Objects that depend on the failed object simply wait on their dependencies and do not run.
- cascade: When an object fails or is canceled, the objects that depend on it are marked CASCADE_FAILED instead of waiting indefinitely, so the failure surfaces quickly across the pipeline.
Example:
{
"id": "Default",
"name": "Default",
"scheduleType": "cron",
"failureAndRerunMode": "CASCADE_AND_RERUN"
}
3. Error Notifications
AWS Data Pipeline can publish a notification to an Amazon SNS (Simple Notification Service) topic when an activity fails. You define an SnsAlarm action and attach it to an activity through the activity's onFail field (onSuccess and onLateAction are also available).
Example:
{
  "id": "ErrorNotification",
  "type": "SnsAlarm",
  "role": "DataPipelineDefaultRole",
  "subject": "Activity Failed",
  "message": "An activity has failed in the pipeline.",
  "topicArn": "arn:aws:sns:us-west-2:123456789012:MySNSTopic"
}
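An SnsAlarm fires only if an activity references it. A sketch of wiring the alarm above to an activity via its onFail field (the activity shown is illustrative):

```json
{
  "id": "MyActivity",
  "type": "ShellCommandActivity",
  "schedule": { "ref": "Schedule" },
  "command": "bash process-data.sh",
  "runsOn": { "ref": "MyEmrCluster" },
  "onFail": { "ref": "ErrorNotification" }
}
```

With this reference in place, every failed attempt of MyActivity triggers a publication to the configured SNS topic.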
4. Logging and Monitoring
- Activity Logs: Task Runner uploads the stdout and stderr of each attempt to the Amazon S3 location set by the pipelineLogUri field, providing visibility into what went wrong.
- Console and CloudWatch: The AWS Data Pipeline console shows per-object execution status, attempt history, and error messages; Amazon CloudWatch can additionally be used to alarm on the underlying resources (for example, the EMR cluster).
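Detailed execution logs can be written to Amazon S3 by setting the pipelineLogUri field on the Default object. A minimal sketch (the bucket name is illustrative):

```json
{
  "id": "Default",
  "name": "Default",
  "scheduleType": "cron",
  "pipelineLogUri": "s3://my-bucket/logs/"
}
```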
5. Custom Error Handling
For more advanced error handling, you can use custom logic within your activities. For example, a shell script or a custom application can include error handling code to manage retries, send notifications, or take other corrective actions.
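As a concrete sketch of this idea, a hypothetical process-data.sh could wrap its critical commands in a small retry helper (the attempt count and delay are illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical retry helper for use inside an activity script such as
# process-data.sh. It retries a command a fixed number of times before
# giving up, so transient errors are handled inside the activity itself.

retry() {
  local max_attempts=3   # total attempts, illustrative
  local delay=1          # seconds between attempts, illustrative
  local attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Command failed after $attempt attempts: $*" >&2
      return 1           # non-zero exit marks the activity attempt as failed
    fi
    echo "Attempt $attempt of $max_attempts failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Example usage inside the activity script (S3 paths illustrative):
# retry aws s3 cp s3://my-bucket/input/data.csv /tmp/data.csv
```

Because AWS Data Pipeline also retries at the activity level (maximumRetries), in-script retries like this are best reserved for fast, idempotent operations.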
Example JSON Pipeline with Error Handling
Here’s a more complete example that includes various error handling mechanisms:
{
  "objects": [
    {
      "id": "Default",
      "name": "Default",
      "scheduleType": "cron",
      "failureAndRerunMode": "cascade",
      "pipelineLogUri": "s3://my-bucket/logs/"
    },
    {
      "id": "Schedule",
      "type": "Schedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME",
      "period": "1 day"
    },
    {
      "id": "MyS3Input",
      "type": "S3DataNode",
      "schedule": { "ref": "Schedule" },
      "directoryPath": "s3://my-bucket/input/"
    },
    {
      "id": "MyS3Output",
      "type": "S3DataNode",
      "schedule": { "ref": "Schedule" },
      "directoryPath": "s3://my-bucket/output/"
    },
    {
      "id": "MyActivity",
      "type": "ShellCommandActivity",
      "schedule": { "ref": "Schedule" },
      "input": { "ref": "MyS3Input" },
      "output": { "ref": "MyS3Output" },
      "runsOn": { "ref": "MyEmrCluster" },
      "command": "bash process-data.sh",
      "attemptTimeout": "1 hour",
      "maximumRetries": "3",
      "onFail": { "ref": "ErrorNotification" }
    },
    {
      "id": "MyEmrCluster",
      "type": "EmrCluster",
      "schedule": { "ref": "Schedule" },
      "masterInstanceType": "m4.large",
      "coreInstanceType": "m4.large",
      "coreInstanceCount": "2"
    },
    {
      "id": "ErrorNotification",
      "type": "SnsAlarm",
      "role": "DataPipelineDefaultRole",
      "subject": "Activity Failed",
      "message": "An activity has failed in the pipeline.",
      "topicArn": "arn:aws:sns:us-west-2:123456789012:MySNSTopic"
    }
  ]
}