How do you submit a job to an EMR cluster?


1 Answer


Submitting a job to an Amazon EMR cluster can be done in several ways, depending on the type of job and the tools you're using. Here are the common methods:

1. Submitting a Job via the AWS Management Console

  1. Create or Access Your EMR Cluster:

    • Navigate to the EMR section of the AWS Management Console.
    • Create a new cluster or select an existing cluster.
  2. Add Steps to Your Cluster:

    • Go to the Steps tab within your cluster details.
    • Click Add step.
    • Choose the type of step you want to add (e.g., Spark application, Hive program, or custom JAR).
    • Provide the required parameters for the step, such as the script location, input and output paths, and any arguments.
    • Click Add to submit the step.
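
Once the step is added, it appears in the Steps tab with a status such as Pending, Running, or Completed. If you'd rather verify it programmatically, here is a minimal sketch using boto3's list_steps (the cluster ID is a placeholder):

import boto3

emr_client = boto3.client('emr')

# List the steps on the cluster and print each one's ID, name, and state
steps = emr_client.list_steps(ClusterId='j-2AXXXXXXGAPLF')
for step in steps['Steps']:
    print(step['Id'], step['Name'], step['Status']['State'])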

2. Submitting a Job via the AWS CLI

You can use the aws emr add-steps command to submit a job. Here’s an example for submitting a Spark job:

aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
--steps Type=Spark,Name="Spark Program",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--class,org.apache.spark.examples.SparkPi,s3://my-bucket/spark-job.jar,10] 

Explanation of the parameters:

  • --cluster-id: The ID of your EMR cluster.
  • --steps: The steps to be added to the cluster.
  • Type: The type of step (e.g., Spark, Hive, or CUSTOM_JAR).
  • Name: A name for the step.
  • ActionOnFailure: What to do if the step fails (CONTINUE, CANCEL_AND_WAIT, or TERMINATE_CLUSTER).
  • Args: Arguments for the step, such as the deploy mode, main class, and any other necessary parameters.
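
The add-steps command prints the ID of the newly created step. You can check its progress with aws emr describe-step, or with the boto3 equivalent shown in this minimal sketch (both IDs are placeholders):

import boto3

emr_client = boto3.client('emr')

# Look up the step by the ID that add-steps printed on submission
response = emr_client.describe_step(
    ClusterId='j-2AXXXXXXGAPLF',
    StepId='s-1XXXXXXXXXXXXX'
)
print(response['Step']['Status']['State'])  # e.g. PENDING, RUNNING, COMPLETED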

3. Submitting a Job via the EMR API

You can use the AWS SDKs to interact with the EMR API and submit jobs programmatically. Here’s an example using the Python SDK (boto3):

import boto3

# Create an EMR client (uses your default AWS credentials and region)
emr_client = boto3.client('emr')

# Add a step to a running cluster; command-runner.jar executes
# spark-submit on the master node with the arguments below
response = emr_client.add_job_flow_steps(
    JobFlowId='j-2AXXXXXXGAPLF',
    Steps=[
        {
            'Name': 'Spark Program',
            'ActionOnFailure': 'CONTINUE',
            'HadoopJarStep': {
                'Jar': 'command-runner.jar',
                'Args': [
                    'spark-submit',
                    '--deploy-mode', 'cluster',
                    '--class', 'org.apache.spark.examples.SparkPi',
                    's3://my-bucket/spark-job.jar',
                    '10'
                ]
            }
        }
    ]
)

# The response contains the IDs of the newly added steps
print(response)
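
If you want your script to block until the step finishes, boto3 exposes a step_complete waiter. A minimal sketch continuing from the code above (same placeholder cluster ID):

# StepIds holds the IDs of the steps just submitted
step_id = response['StepIds'][0]

# Poll until the step completes; raises WaiterError if it fails or times out
waiter = emr_client.get_waiter('step_complete')
waiter.wait(ClusterId='j-2AXXXXXXGAPLF', StepId=step_id)
print(f'Step {step_id} finished')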

4. Submitting a Job via SSH

If you need more control or have to run more complex workflows, you can SSH into the master node of your EMR cluster and run commands manually. Here's how to do it:

  1. SSH into the Master Node:

    ssh -i /path/to/your-key.pem hadoop@<master-node-dns> 
  2. Run Your Job: For example, to submit a Spark job:

    spark-submit --deploy-mode cluster --class org.apache.spark.examples.SparkPi s3://my-bucket/spark-job.jar 10 
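
If you want to script this instead of running it interactively, one option (an assumption on my part, not something EMR provides out of the box) is to drive the same command over SSH with a library such as paramiko:

import paramiko

# Placeholders: substitute your master node DNS and key path
MASTER_DNS = 'ec2-xx-xx-xx-xx.compute-1.amazonaws.com'
KEY_PATH = '/path/to/your-key.pem'

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname=MASTER_DNS, username='hadoop', key_filename=KEY_PATH)

# Run the same spark-submit command shown above on the master node
stdin, stdout, stderr = client.exec_command(
    'spark-submit --deploy-mode cluster '
    '--class org.apache.spark.examples.SparkPi '
    's3://my-bucket/spark-job.jar 10'
)
print(stdout.read().decode())
client.close()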

Summary

Submitting jobs to an EMR cluster can be done via the AWS Management Console, AWS CLI, AWS SDKs, or by SSH-ing into the cluster's master node. Each method provides different levels of control and convenience, allowing you to choose the most suitable approach based on your requirements.
