Building a scalable outbound call engine using Amazon Connect and Amazon Lex

This is a guest post by AWS Machine Learning Hero Cyrus Wong.

Staying connected with family, friends, and colleagues is easy for most people who live with or close to others. For educators who need to communicate lessons and schedules with their students, or businesses who communicate with new and existing customers, staying connected can be hard, especially in times of crisis and isolation.

Specifically, I wanted to make remote communication between educators and students easier. Communicating time-sensitive information and confirming that students received messages can be hard; scaling communication from tens to thousands of students can make the problem more complex, impacting educator and student productivity, time, and overall experience.

To meet this challenge, I developed Callouts, a simple, consistent, and scalable solution for educators to communicate with students using Amazon Connect and Amazon Lex. In times of crisis, such as a quarantine, this solution gives educators an automated bot that calls students to communicate important messages, such as schedule changes, general announcements, and attendance confirmation.

Even when the resulting calls are similar, building scalable contact flows and chatbots takes time. By generalizing a survey-like call job to contact multiple recipients in parallel, Callouts makes it easy for developers to create sophisticated conversational experiences. Non-technical users who may find this intimidating can simply upload an Excel file into an Amazon Simple Storage Service (Amazon S3) bucket to trigger an automatic process that ultimately results in the AI agent calling multiple recipients at the same time.

Architecture and design

Callouts uses AWS Serverless Application Model (AWS SAM), an open-source framework for building serverless applications. It offers a syntax designed specifically for expressing serverless resources.

The following diagram illustrates the architecture.

The goal of the architecture is to allow non-technical users to define a “call job” and request execution without having to write code. The user creates an Excel file that contains their call tasks (for example, “You now have until Friday to submit your homework.”) and uploads it to Amazon S3.

The upload triggers CreateExcelCallJobFunction, which converts the Excel file into a JSON message and sends it to an Amazon Simple Queue Service (Amazon SQS) FIFO queue (CallSqsQueue). An AWS Lambda function connected to the SQS queue processes the incoming messages, creates individual call task data, uploads the data to an S3 bucket, and starts the CalloutStateMachine AWS Step Functions execution. The individual job data is saved to and loaded from Amazon S3 to prevent sending an oversized payload to the start execution API. The ReservedConcurrentExecutions value of StartCallOutFlowFunction is set to 1 to make sure only one job goes to the state machine at a time.

This design allows other systems to create a call job by sending the defined data message to the SQS queue directly.
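
As a minimal boto3 sketch of that integration path, the following sends a call job message to the queue. The message fields and the queue name are illustrative assumptions; the real schema is defined by the Callouts project.

import json
import boto3

sqs = boto3.client('sqs')

# Hypothetical call job message; treat these field names as placeholders
# for the project's actual JSON schema.
call_job = {
    'task_id': 'demo-call-job',
    'greeting': 'Hi {{ username }}, this is a simple survey.',
    'ending': 'Goodbye {{ username }} and have a nice day!',
    'questions': [{'question_template': 'Are you using Amazon Connect?',
                   'question_type': 'Yes/No'}],
    'receivers': [{'id': '1', 'phone_number': '12345678', 'username': 'Cyrus'}],
}

queue_url = sqs.get_queue_url(QueueName='CallSqsQueue.fifo')['QueueUrl']
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody=json.dumps(call_job),
    MessageGroupId='call-jobs',                  # FIFO queues require a message group
    MessageDeduplicationId=call_job['task_id'],  # or enable content-based deduplication
)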

CalloutStateMachine

The following diagram shows the CalloutStateMachine workflow.

  1. One callout job includes a set of callout tasks, each of which calls one receiver.
  2. The callout task proceeds with dynamic parallelism. For more information, see New – Step Functions Support for Dynamic Parallelism.
  3. The “Get Callout task” step is a Lambda function that gets the call task JSON from ExcelCallTaskBucket.
  4. The step “Callout with Amazon Connect” sends the message to AsynCalloutQueue.
  5. This step waits for the callback with task token or “Call Timeout.” For more information, see Call Amazon SQS with Step Functions.
  6. “Get Call Result” combines call results and generates an Excel call report.
  7. A completion message goes to the SNS topic.

This pattern lets the Amazon Connect contact flow use the SendTaskSuccess API to return a call result for each outbound call along with the task token. If the “Callout with Amazon Connect” step doesn’t receive a callback within 5 minutes (the default), the task moves to the “Call Timeout” state. For longer communications that may need more time to complete, you can change the AWS CloudFormation TimeoutSeconds parameter.
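
As a sketch, the callback side reduces to a single SendTaskSuccess call. The output fields below are assumptions for illustration, not the project's exact schema.

import json
import boto3

sfn = boto3.client('stepfunctions')

def report_call_result(task_token, receiver_id, status):
    # Resumes the waiting "Callout with Amazon Connect" state; status is a
    # value such as CallCompleted or NotCallable.
    sfn.send_task_success(
        taskToken=task_token,
        output=json.dumps({'receiver_id': receiver_id, 'status': status}),
    )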

“Save call result” saves the call result to CallResultDynamoDBTable. “Get Call Result” retrieves all call results from CallResultDynamoDBTable, generates the call report, and uploads the report to the CallReportBucket.

Finally, the job publishes a message to the CallJobCompletion SNS topic. The message contains the task ID, bucket name, Excel report key, JSON report key, and a presigned URL for the Excel report.

You can create an email subscription to the SNS topic to get a notification message upon completion of the call job. See the following screenshot for an example.

Callouts with Amazon Connect

The following diagram shows the architecture of Callouts with Amazon Connect.

  1. The “Callout with Amazon Connect” step of the CalloutStateMachine workflow sends an individual call task JSON to the SQS FIFO queue AsynCalloutQueue.
  2. The new SQS message triggers CalloutFunction.
  3. CalloutFunction sends the call task to a “Calling out” contact flow (see the sketch after this list).
  4. If a phone number isn’t reachable, the function calls back to CalloutStateMachine through the SendTaskSuccess API with the status NotCallable. (If the function called the SendTaskFailure API instead, a single failed call would fail the whole workflow; the workflow has to continue even if some calls fail.)
  5. The “Calling out” contact flow interacts with three Lambda functions. SendTaskSuccessFunction calls back to CalloutStateMachine with the status CallCompleted when the “Calling out” contact flow completes successfully.
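
A minimal sketch of the outbound-call step follows. The instance ID, contact flow ID, phone numbers, and attribute names are placeholders for the values you record during setup.

import boto3

connect = boto3.client('connect')

response = connect.start_outbound_voice_contact(
    DestinationPhoneNumber='+85212345678',                  # receiver's number
    ContactFlowId='11111111-2222-3333-4444-555555555555',   # "Calling out" flow
    InstanceId='aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee',
    SourcePhoneNumber='+18005550100',                       # claimed Connect number
    # Contact attributes are visible to the contact flow; passing the task
    # token lets the flow call back to Step Functions later.
    Attributes={'taskToken': '<task-token>',
                'greeting': 'Hi Cyrus, this is a simple survey.'},
)
print(response['ContactId'])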

“Calling out” contact flow

The following diagram illustrates the “Calling out” contact flow.

Although the contact flow may seem complicated, the logic is straightforward.

“Calling out” architecture

The following diagram illustrates the “Calling out” architecture.

The following are a few highlights:

  • Enabling logging is very useful for call debugging, and you can use CloudWatch Logs Insights to trace a call from the log stream. For more information, see Analyzing Log Data with CloudWatch Logs Insights. For example, see the following code:
    fields @timestamp, @message
    | filter @message like "Specific Contact Flow Id"
    | sort @timestamp asc

  • You can invoke Lambda functions to do anything, such as saving data into Amazon DynamoDB and checking information from databases. For more information, see Invoke AWS Lambda Functions.
  • The “Get Customer Input” block redirects the call to Amazon Lex and returns the intent name and intent slot value for number, date, and time question types. For more information, see Create a Contact Flow and Add Your Amazon Lex Bot.
  • The contact flow doesn’t branch on individual errors: any error in Lambda or Amazon Lex ends the call, and all error handlers of the contact flow blocks connect to the Disconnect/Hang Up block.
  • SendTaskSuccessFunction calls the SendTaskSuccess API with the status CallCompleted to finish the “Callout with Amazon Connect” step.

Amazon Lex CalloutBot

This chatbot contains a set of intents that capture simple answers such as yes or no, letters, numbers, dates, and times without needing custom slots; the ExcelLexBot engine creates the slots for each answer. For more sample flows and sample utterances, see Build an Amazon Lex Chatbot with Microsoft Excel and Building Better Bots Using Amazon Lex (Part 1).

This project contains four chatbots built on Amazon Lex to handle different scenarios: CalloutBot, CalloutBotDate, CalloutBotNumber, and CalloutBotTime.

The following conversation shows the question model:

Contact Flow: Play a question based on question_template and receiver data.
Chatbot Agent: Wait for answer in question_type.
User: Answer in question_type.

CalloutBot contains OkIntent, YesIntent, NoIntent, AIntent, BIntent, CIntent, DIntent, and EIntent. All the intents have a set of sample utterances, and Amazon Connect uses the intent name to capture the user’s answer. The following screenshots show the details in Excel for OkIntent and AIntent.

BIntent, CIntent, DIntent, and EIntent are all similar to AIntent. The bot uses those eight intents to handle OK, Yes/No, and multiple choice question types.

CalloutBotDate contains a DateIntent to solicit date information from the caller, receiver, or user; for example, for an appointment. See the following screenshot.

CalloutBotNumber contains a NumberIntent to solicit number information from the caller, receiver, or user; for example, for the number of attempts. See the following screenshot.

CalloutBotTime contains a TimeIntent to solicit time information from the caller, receiver, or user; for example, appointment information. See the following screenshot.

All chatbots contain AMAZONFallbackIntent and AMAZONRepeatIntent, which map to the built-in intents FallbackIntent and RepeatIntent. The repeat intent captures requests to hear the question again, and the fallback intent captures messages the chatbot can’t understand. The contact flow repeats the question whenever either intent is captured. For more information, see Managing conversation flow with a fallback intent on Amazon Lex.

ExcelLexBot creates a DynamoDB table per intent to save each user answer and all intent history. For example, the following screenshot shows that after the receiver answered OK, you can find the record in the CalloutBotOkIntent table.

ContactId can help you to trace each call.
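
For example, the following hedged sketch looks up a call’s answers by ContactId. ExcelLexBot defines the table’s key schema, so this scan is purely for illustration; a query on the real key would be cheaper.

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource('dynamodb').Table('CalloutBotOkIntent')

# Filter the intent history for one call; '<contact-id>' is a placeholder.
items = table.scan(FilterExpression=Attr('ContactId').eq('<contact-id>'))['Items']
for item in items:
    print(item)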

Call job Excel format

Non-technical users can also use Excel to create a call job. There are three Excel sheets: Configures, Questions, and Receivers. They are straightforward and don’t require any programming knowledge, and users need to fill in all three sheets for a call job. Developers can integrate Callouts into other systems by sending a JSON object into CallSqsQueue programmatically.

Receivers sheet

The list of receivers contains the following information:

  • id – A unique identifier for each user.
  • phone_number – The user’s phone number. If you specify the phone_prefix in the Configures sheet, you don’t need to add the country code here.
  • Additional columns – Optional columns for your message. For example, the following table includes the additional column username.
id | phone_number | username
1  | 12345678     | Cyrus
2  | 89654201     | Cyrus Wong

Configures sheet

The Configures sheet contains the following information for the common settings of this call job:

  • greeting – Greeting message for the call
  • ending – Closing message for the call
  • phone_prefix – International subscriber dialing (ISD) code for each phone call (the + is optional)

The following table shows example values in a Configures sheet.

Key          | Value
greeting     | Hi {{ username }}, This is a simple survey.
ending       | Good Bye {{ username }} and have a nice day!
phone_prefix | +852

Questions sheet

Each row in the Questions sheet represents one question. There are two columns:

  • question_template – A Jinja2 template that generates the question text for each receiver row
  • question_type – One of the following question types:
    • OK – Captures the answer “OK,” which is for the reminder use case
    • Yes/No – Captures the answer “Yes” or “No”
    • Multiple Choice – Captures the answer “A,” “B,” “C,” “D,” or “E”
    • Date – Captures DATE
    • Time – Captures TIME
    • Number – Captures NUMBER

The following table shows example values of question_template and question_type.

question_template | question_type
Are you using Amazon Connect? | Yes/No
How did you first hear about Amazon Connect? A. Newsletter, B. Social Media, C. AWS Event, D. AWS Website, or E. From Friend. | Multiple Choice
How many applications do you use with Amazon Connect? | Number
When should we call you back? | Date
Preferred call back time? | Time
In this demo, I want you to say OK. | OK

If you just want to make a call, remove all rows and keep the header.

The question_template, greeting, and ending values are Jinja templates and are rendered once per receiver row. All messages use Amazon Connect SSML and are automatically wrapped in <speak>message</speak>, so you must not add the speak tag yourself. For more information, see SSML in Amazon Connect Contact Flows.
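
To make the templating concrete, here is a small sketch of how a question row could be rendered for each receiver and wrapped in SSML. The file, sheet, and column names follow the format described above; the real rendering inside Callouts may differ in detail.

import pandas as pd
from jinja2 import Template

# Read all three sheets of a call job workbook.
sheets = pd.read_excel('call_job.xlsx',
                       sheet_name=['Configures', 'Questions', 'Receivers'])
questions = sheets['Questions'].to_dict('records')
receivers = sheets['Receivers'].to_dict('records')

for receiver in receivers:
    for question in questions:
        text = Template(question['question_template']).render(**receiver)
        # Callouts adds the <speak> wrapper for you; shown here only to
        # illustrate the final SSML sent to Amazon Connect.
        print(receiver['phone_number'], f'<speak>{text}</speak>')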

Call report in Excel

The following screenshot is an example of an Excel call report.

The report contains the following columns:

  • task_id – The Excel file name. If you upload the same file again, you overwrite the result.
  • receiver_id – The receiver ID from the Receivers sheet.
  • call_at – The call start time.
  • status – The status of the call. There are two values:
    • CallCompleted – The receiver picked up the call and answered all the questions.
    • DropCall – The receiver either didn’t pick up the call or didn’t complete all the questions.
  • error – null if no error occurred; otherwise, the exception message.
  • phone_number – The receiver’s phone number.
  • username – The additional field from the call job from the Receivers sheet. All additional columns are copied to the result report.
  • Question_x – One column per question, where x is the question number; the value is the receiver’s answer.

Deploying Callouts

This section provides a walkthrough of deploying Callouts.

Creating an Amazon Connect instance and setting up contact flow

To create an Amazon Connect instance and set up the contact flow, complete the following steps:

  1. Create a virtual contact center instance in us-east-1. For instructions, see Create an Amazon Connect Instance.
    1. In Step 3, select I want to make outbound calls with Amazon Connect.
  2. Download the following contact flow from the GitHub repo.
  3. Import the “Calling out” contact flow. For instructions, see Export and Import a Contact Flow.
  4. Choose Show additional flow information.
  5. Locate the contact flow ARN (see the following screenshot).
  6. Record the InstanceId and ContactFlowId.

The Amazon Connect ARN format is arn:${Partition}:connect:${Region}:${Account}:instance/${InstanceId}/contact-flow/${ContactFlowId}. For more information, see Resource Types Defined by Amazon Connect.

  7. Set a phone number for the “Calling out” contact flow. For instructions, see Claim a Phone Number and Associate a Phone Number with a Contact Flow.
  8. Record the phone number.

Deploying the Amazon Lex Chatbot and Callouts serverless application

To deploy the chatbot, complete the following steps:

  1. Sign in to your AWS account.
  2. Choose the US East (N. Virginia) Region.
  3. Open AWS Serverless Application Repository for ExcelLexBot.
  4. Select I acknowledge that this app creates custom IAM roles and resource policies.
  5. Choose Deploy.
  6. Wait for the completion message.
  7. Choose View CloudFormation Stack.
  8. Choose Outputs.
  9. Download the zipped chatbot Excel files from the GitHub repo and unzip them.
  10. Upload the four Excel files into the S3 bucket LexExcelBucket; for example, serverlessrepo-excellexbot-bucket-1bxqjwlbfqjy9.
  11. On the AWS CloudFormation console, wait for the status of all four stacks to show as CREATE_COMPLETE.
  12. Open the AWS Serverless Application Repository for Callouts.
  13. Select I acknowledge that this app creates custom IAM roles and resource policies.
  14. Choose Deploy, using the parameters you recorded in Creating an Amazon Connect instance and setting up contact flow (the application name is the S3 bucket name prefix; I suggest you use the default value).
  15. Wait for the completion message.

Giving permission to the Amazon Connect instance

To give permission to your Amazon Connect instance, add the Amazon Lex bots CalloutBot_ExcelLexBot, CalloutBotDate_ExcelLexBot, CalloutBotNumber_ExcelLexBot, and CalloutBotTime_ExcelLexBot to the Amazon Connect instance.

You don’t need to set up permissions for the Lambda functions IteratorFunction, ResponseHanlderFunction, and SendTaskSuccessFunction in the Amazon Connect instance because AWS CloudFormation already grants the invoke permission.

When the deployment is complete, you can upload your call job to the S3 bucket.

Un-deploying Callouts

To un-deploy Callouts, complete the following steps:

  1. Go to LexExcelBucket and delete the four chatbot Excel files.

This action triggers the deletion of the four chatbot stacks, which takes a few minutes.

  2. Delete all files in excelcalljobbucket and callreportbucket.
  3. After the chatbot stacks are deleted, delete serverlessrepo-ExcelLexBot and serverlessrepo-awscallouts.

Creating the outbound call job

To create the outbound call job, complete the following steps:

  1. Download the Excel example from the GitHub repo.
  2. Change the content in the file—at least the phone number and phone_prefix.
  3. Upload the file to the S3 bucket that contains excelcalljobbucket in the name.
  4. Wait for up to 5 minutes and download the call report from the S3 bucket that contains callreportbucket in the name.

Demos

For examples of using Callouts, see the demo videos on YouTube.

Conclusion

This post demonstrates how to build Callouts, a solution for educators to contact students in a simple, consistent, and scalable way using Amazon Connect and Amazon Lex. Based on the user experience at the Hong Kong Institute of Vocational Education, we believe that this solution not only benefits educators and students, but can also aid caregivers, businesses, and individuals.

Project collaborators include Mike Ng, Technical Program Intern at AWS, Brian Cheung, Sam Lam, and Pearly Law from the IT114115 Higher Diploma in Cloud and Data Centre Administration. Special thanks to the AWS team, including Dickson Yue, Jerry Yuen, Niranjan Hira, Randall Hunt, and Cameron Peron, for educating and supporting our team.


About the Author

Cyrus Wong is a Data Scientist at the Hong Kong Institute of Vocational Education (Lee Wai Lee) Cloud Innovation Centre, has achieved all 13 AWS Certifications, and enjoys sharing his AWS knowledge with others through open-source projects, blog posts, and events.

Source: https://aws.amazon.com/blogs/machine-learning/building-a-scalable-outbound-call-engine-using-amazon-connect-and-amazon-lex/


Optimizing costs for machine learning with Amazon SageMaker

Applications based on machine learning (ML) can provide tremendous business value. Using ML, we can solve some of the most complex engineering problems that previously were infeasible. One of the advantages of running ML on the AWS Cloud is that you can continually optimize your workloads and reduce your costs. In this post, we discuss how to apply such optimization to ML workloads. We consider available options such as elasticity, different pricing models in the cloud, automation, advantages of scale, and more.

Developing, training, maintaining, and performance tuning ML models is an iterative process that requires continuous improvement. Finding the model’s optimum state while iterating over the permutations and combinations of model parameters and data dependencies is just one leg of the journey. There is more to optimizing the cost of ML than algorithm performance and model tuning; there is also the effort required to integrate developed models into applications and realize their benefits. Throughout this process, you can keep the cost down in numerous ways. Amazon SageMaker has made most of this journey smooth so developers and data scientists can spend most of their time focusing on what matters the most—delivering business value.

Amazon SageMaker notebook instances

An Amazon SageMaker notebook instance is an ML compute instance running the Jupyter Notebook app. This notebook instance comes with sample notebooks, several optimized algorithms, and complete code walkthroughs. Amazon SageMaker manages the creation of this instance and related resources. Consider using Amazon SageMaker Studio notebooks for collaborative workloads and when you don’t need to set up compute instances and file storage beforehand.

You can follow these best practices to help reduce the cost of notebook instances.

GPU or CPU?

CPUs are best at handling single, more complex calculations sequentially, whereas GPUs are better at handling multiple but simpler calculations in parallel. For many use cases, a standard current-generation instance type from a family such as ml.m* provides enough compute power, memory, and network performance for Jupyter notebooks to perform well. GPUs provide a great price/performance ratio if you take advantage of them effectively. However, GPUs also cost more, and you should choose GPU-based notebooks only when you really need them.

Ask yourself: Is my neural network relatively small scale? Is my network performing tons of calculations involving hundreds of thousands of parameters? Can my model take advantage of hardware parallelism such as P3 and P3dn instance families?

Depending on the model, the GPU communication overhead might even degrade performance. So, take a step back: start with what you think is the minimum requirement in terms of ML instance specification and work your way up to the best instance type and family for your model.

If you’re using your notebook instance to train multiple jobs, decide when you need a GPU-enabled instance and when you don’t. If you need accelerated computing in your notebook environment, you can stop your m* family notebook instance, switch to a GPU-enabled P* family instance, and start it again. Don’t forget to switch it back when you no longer need that extra boost in your development environment.
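
If you script this instance switch, a sketch with the boto3 SageMaker API could look like the following; the notebook name and instance types are placeholders.

import boto3

sm = boto3.client('sagemaker')
name = 'my-notebook'  # placeholder notebook instance name

sm.stop_notebook_instance(NotebookInstanceName=name)
sm.get_waiter('notebook_instance_stopped').wait(NotebookInstanceName=name)

# Switch to a GPU-enabled instance for the heavy lifting; switch back to an
# ml.m* type the same way when you're done.
sm.update_notebook_instance(NotebookInstanceName=name,
                            InstanceType='ml.p3.2xlarge')
sm.start_notebook_instance(NotebookInstanceName=name)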

If you’re using massive datasets for training and don’t want to wait for days or weeks to finish your training job, you can speed up the process by distributing training on multiple machines or processes in a cluster.

It’s recommended to use a small subset of your data for development in your notebook instance. You can use the full dataset for a training job that is distributed across optimized instances such as P2 or P3 GPU instances or an instance with a powerful CPU, such as the C5 family.

Maximize instance utilization

You can optimize your Amazon SageMaker notebook utilization many different ways. One simple way is to stop your notebook instance when you’re not using it and start when you need it. Consider auto-detecting idle notebook instances and managing their lifecycle using a lifecycle configuration script. For detailed implementation, see Right-sizing resources and avoiding unnecessary costs in Amazon SageMaker. Remember that the instance is only useful when you’re using the Jupyter notebook. If you’re not working on a notebook overnight or over the weekend, it’s a good idea to schedule a stop and start. Another way to save instance cost is by scheduling an AWS Lambda function. For example, you can stop all instances at 7:00 PM and start them at 7:00 AM.

You can also use Amazon CloudWatch Events to start and stop the instance based on an event. If you’re feeling geeky, connect it to your Amazon Rekognition based system to start a data scientist’s notebook instance when they step into the office or have Amazon Alexa do it as you grab a coffee.
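
Here is a sketch of such a scheduled stop function, assuming a CloudWatch Events cron rule triggers it in the evening; pagination is omitted for brevity.

import boto3

sm = boto3.client('sagemaker')

def lambda_handler(event, context):
    # Stop every running notebook instance; schedule this handler with a
    # CloudWatch Events rule such as cron(0 19 ? * MON-FRI *).
    running = sm.list_notebook_instances(StatusEquals='InService')
    for instance in running['NotebookInstances']:
        sm.stop_notebook_instance(
            NotebookInstanceName=instance['NotebookInstanceName'])
    return {'stopped': len(running['NotebookInstances'])}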

Training jobs

The following are some best practices for saving costs on training jobs.

Use pre-trained models or even APIs

Pre-trained models eliminate the time spent gathering data and training models with that data. Consider using higher-level APIs such as those provided by Amazon Rekognition or Amazon Comprehend to help you avoid spending on tasks that are already done for you. As an example, Amazon Comprehend simplifies topic modeling on a large corpus of documents. You can also use the Neural Topic Model (NTM) algorithm in Amazon SageMaker to get similar results with more effort. Although you have more control over hyperparameters when training your own model, your use case may not need it. A lot of engineering work and experience goes into creating ready-to-consume and highly optimized models; therefore, an upfront ROI analysis is highly recommended if you’re embarking on a journey to develop similar models.

Use Pipe mode (where applicable) to reduce training time

Certain algorithms in Amazon SageMaker, such as BlazingText, work on a large corpus of data. When these jobs are launched, significant time goes into downloading the data from Amazon Simple Storage Service (Amazon S3) into the local Amazon Elastic Block Store (Amazon EBS) volume, and your training jobs don’t start until the download finishes. These algorithms can take advantage of Pipe mode, in which training data is streamed from Amazon S3 directly to the training container and your training jobs start immediately. For example, training BlazingText on Common Crawl (3 TB) can take a few days, of which a significant number of hours are lost just to the download; Pipe mode can cut that training time significantly.
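
With the SageMaker Python SDK (v1-style parameters, matching the estimator code later in this post), enabling Pipe mode is one parameter. The image URI, role, and bucket below are placeholders.

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_name='<blazingtext-image-uri>',   # placeholder training image
    role='<execution-role-arn>',
    train_instance_count=1,
    train_instance_type='ml.c5.2xlarge',
    input_mode='Pipe',                      # stream from S3 instead of downloading
)
estimator.fit({'train': 's3://<bucket>/corpus/'})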

Managed spot training in Amazon SageMaker

Managed spot training can reduce the cost of training models by up to 90% compared with On-Demand Instances. Amazon SageMaker manages the Spot interruptions on your behalf. If your training job can tolerate interruptions, use managed spot training. You can specify which training jobs use Spot Instances and a stopping condition that specifies how long Amazon SageMaker waits for a job to run using EC2 Spot Instances.

You may also consider using EC2 Spot Instances if you’re willing to do some extra work and if your algorithm is resilient enough to interruptions. For more information, see Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs.
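
As a sketch with the same SDK, managed spot training is a few estimator parameters; the values here are illustrative.

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_name='<training-image-uri>',      # placeholder training image
    role='<execution-role-arn>',
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    train_use_spot_instances=True,
    train_max_run=3600,                     # seconds of training allowed
    train_max_wait=7200,                    # total wait; must be >= train_max_run
    checkpoint_s3_uri='s3://<bucket>/checkpoints/',  # resume after interruptions
)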

Test your code locally

Resolve issues with code and data so you don’t need to pay to run training clusters for failed training jobs. This also saves you time spent initializing the training cluster. Before you submit a training job, try to run the fit function in local mode to fetch some early feedback:

from sagemaker.mxnet import MXNet  # SageMaker Python SDK (v1-style parameters)

# The estimator requires a role; local mode runs on the notebook instance
# itself, so no training cluster is launched.
mxnet_estimator = MXNet('train.py', role='SageMakerRole',
                        train_instance_type='local', train_instance_count=1)

Monitor the performance of your training jobs to identify waste

Amazon SageMaker is integrated with CloudWatch out of the box and publishes instance metrics of the training cluster in CloudWatch. You can use these metrics to see if you should make adjustments to your cluster, such as CPUs, memory, number of instances, and more. To view the CloudWatch metric for your training jobs, navigate to the Jobs page on the Amazon SageMaker console and choose View Instance metrics in the Monitor section.

Also, use Amazon SageMaker Debugger, which provides full visibility into model training by monitoring, recording, analyzing, and visualizing training process tensors. Debugger can dramatically reduce the time, resources, and cost needed to train models.

Find the right balance: Performance vs. accuracy

Compare the throughput of 16-bit floating point and 32-bit floating point calculations and determine what is right for your model. 32-bit (single precision or FP32) and even 64-bit (double precision or FP64) floating point variables are popular for many applications that require high precision. These are workloads like engineering simulations that simulate real-world behavior and need the mathematical model to be as exact as possible. In many cases, however, reducing memory usage and increasing speed gained by moving to half or mixed precision (16-bit or FP16) is worth the minor tradeoffs in accuracy. For more information, see Accelerating GPU computation through mixed-precision methods.

A similar trade-off also applies when deciding on the number of layers in your neural network for your classification algorithms, such as image classification.

Tuning (hyperparameter optimization) jobs

Use hyperparameter optimization (HPO) when needed and choose the hyperparameters and their ranges to tune on wisely.

Some API calls can result in a bill of hundreds or even thousands of dollars, and tuning jobs are one of those. A good tuning job can save you many working days of expensive data scientists’ time and provide a significant lift in model performance, which is highly beneficial. HPO in Amazon SageMaker finds good hyperparameters quicker if the search space is narrow (for example, a learning rate of 0.01–0.05 rather than 0.001–0.9). If you have some relevant prior knowledge about the hyperparameter range, start with that. For wide hyperparameter ranges, you may want to consider logarithmic transformations.

Amazon SageMaker also reduces the amount of time spent tuning models using built-in HPO. This technology automatically adjusts hundreds of different combinations of parameters to quickly arrive at the best solution for your ML problem. With high-performance algorithms, distributed computing, managed infrastructure, and HPO, Amazon SageMaker drastically decreases the training time and overall cost of building production grade systems. You can see examples of HPO in some of the Amazon SageMaker built-in algorithms.

For longer training jobs and as the training time for each training job gets longer, you may also want to consider early stopping of training jobs.
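
Putting these tuning tips together, here is a sketch of a narrow, log-scaled search space with automatic early stopping; my_estimator, the objective metric, and the bucket are placeholders for your own training job.

from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

tuner = HyperparameterTuner(
    estimator=my_estimator,                 # assumed: an estimator you defined
    objective_metric_name='validation:accuracy',
    hyperparameter_ranges={
        'learning_rate': ContinuousParameter(0.01, 0.05,
                                             scaling_type='Logarithmic'),
    },
    max_jobs=20,
    max_parallel_jobs=2,
    early_stopping_type='Auto',             # stop jobs unlikely to beat the best
)
tuner.fit({'train': 's3://<bucket>/train/'})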

Hosting endpoints

The following section discusses how to save cost when hosting endpoints using Amazon SageMaker hosting services.

Delete endpoints that aren’t in use

Amazon SageMaker is great for testing new models because you can easily deploy them into an A/B testing environment. When you’re done with your tests and not using the endpoint extensively anymore, you should delete it. You can always recreate it when you need it again because the model is stored in Amazon S3.
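
Deleting and later recreating an endpoint is two boto3 calls, because the endpoint configuration and model remain in place; the names below are placeholders.

import boto3

sm = boto3.client('sagemaker')

sm.delete_endpoint(EndpointName='my-test-endpoint')   # stops hosting charges
# Later, recreate it from the same configuration:
sm.create_endpoint(EndpointName='my-test-endpoint',
                   EndpointConfigName='my-test-endpoint-config')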

Use Automatic Scaling

Auto Scaling your Amazon SageMaker endpoint doesn’t just provide high availability, better throughput, and better performance; it also optimizes the cost of your endpoint. Make sure that you configure Auto Scaling for your endpoint, monitor your model endpoint, and adjust the scaling policy based on the CloudWatch metrics. For more information, see Load test and optimize an Amazon SageMaker endpoint using automatic scaling.
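
The following sketch attaches a target-tracking policy to an endpoint variant; the endpoint and variant names are placeholders, and the target value should come from your own load tests.

import boto3

autoscaling = boto3.client('application-autoscaling')
resource_id = 'endpoint/my-endpoint/variant/AllTraffic'

autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=4,
)
autoscaling.put_scaling_policy(
    PolicyName='InvocationsTargetTracking',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 1000.0,  # invocations per instance; tune from load tests
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance',
        },
    },
)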

Amazon Elastic Inference for deep learning

Selecting a GPU instance type that is big enough to satisfy the requirements of the most demanding resource for inference may not be a smart move. Even at peak load, a deep learning application may not fully utilize the capacity offered by a GPU. Consider using Amazon Elastic Inference, which allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances to reduce the cost of running deep learning inference by up to 75%.

Host multiple models with multi-model endpoints

You can create an endpoint that hosts multiple models. Multi-model endpoints reduce hosting costs by improving endpoint utilization and provide a scalable and cost-effective way to deploy a large number of models. Multi-model endpoints enable time-sharing of memory resources across models and reduce deployment overhead, because Amazon SageMaker manages loading models in memory and scaling them based on the traffic patterns to your models.
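
At invocation time, the caller picks the model by artifact name. A sketch follows, with the endpoint and model names as placeholders.

import boto3

runtime = boto3.client('sagemaker-runtime')

response = runtime.invoke_endpoint(
    EndpointName='my-multi-model-endpoint',
    TargetModel='model-042.tar.gz',   # artifact under the endpoint's S3 prefix
    ContentType='text/csv',
    Body='1.0,2.0,3.0',
)
print(response['Body'].read())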

Reducing labeling time with Amazon SageMaker Ground Truth

Data labeling is the process of identifying raw data (such as images, text files, and videos) and adding one or more meaningful and informative labels to provide context so that an ML model can learn from it. This process is essential because the accuracy of the trained model depends on the accuracy of the properly labeled dataset, or ground truth.

Amazon SageMaker Ground Truth uses a combination of ML and a human workforce (vetted by AWS) to label images and text. Many ML projects are delayed because of insufficient labeled data; you can use Ground Truth to accelerate the ML cycle and reduce overall costs.

Tagging your resources

Consider tagging your Amazon SageMaker notebook instances and hosting endpoints. Tags such as the project name, business unit, and environment (such as development, testing, or production) are useful for cost optimization and provide clear visibility into where money is spent. Cost allocation tags can help you track and categorize your ML costs and answer questions such as “Can I delete this resource to save cost?”

Keeping track of cost

If you need visibility into your ML costs on AWS, use AWS Budgets. This helps you track your Amazon SageMaker costs, including development, training, and hosting. You can also set alerts and get a notification when your cost or usage exceeds (or is forecasted to exceed) your budgeted amount. After you create your budget, you can track its progress on the AWS Budgets console.

Conclusion

In this post, I highlighted a few approaches and techniques to optimize cost without compromising on the implementation flexibility so you can deliver best-in-class ML-based business applications.



About the Author

BK Chaurasiya is a Principal Product Manager on the Amazon Web Services R&D and Innovation team. He provides technical guidance, design advice, and thought leadership to some of the largest and most successful AWS customers and partners. A technologist at heart, BK specializes in driving DevOps, continuous delivery, and large-scale cloud transformation initiatives to success.

Source: https://aws.amazon.com/blogs/machine-learning/optimizing-costs-for-machine-learning-with-amazon-sagemaker/

Continue Reading

AI

Optimizing costs for machine learning with Amazon SageMaker

Applications based on machine learning (ML) can provide tremendous business value. Using ML, we can solve some of the most complex engineering problems that previously were infeasible. One of the advantages of running ML on the AWS Cloud is that you can continually optimize your workloads and reduce your costs. In this post, we discuss […]

Published

on

Applications based on machine learning (ML) can provide tremendous business value. Using ML, we can solve some of the most complex engineering problems that previously were infeasible. One of the advantages of running ML on the AWS Cloud is that you can continually optimize your workloads and reduce your costs. In this post, we discuss how to apply such optimization to ML workloads. We consider available options such as elasticity, different pricing models in cloud, automation, advantage of scale, and more.

Developing, training, maintaining, and performance tuning ML models is an iterative process that requires continuous improvement. Determining the optimum state in the model while going through the permutations and combinations of model parameters and data dependencies to adjust is just one leg of the journey. There is more to optimizing the cost of ML than just algorithm performance and model tuning. There is also some effort required to integrate developed models into applications and realize their benefits. Throughout this process, you can keep the cost down in numerous ways. Amazon SageMaker has made most of this journey smooth so developers and data scientists can spend most of their time focusing on what matters the most—delivering business value.

Amazon SageMaker notebook instances

An Amazon SageMaker notebook instance is an ML compute instance running the Jupyter Notebook app. This notebook instance comes with sample notebooks, several optimized algorithms, and complete code walkthroughs. Amazon SageMaker manages the creation of this instance and related resources. Consider using Amazon SageMaker Studio notebooks for collaborative workloads and when you don’t need to set up compute instances and file storage beforehand.

You can follow these best practices to help reduce the cost of notebook instances.

GPU or CPU?

CPUs are best at handling single, more complex calculations sequentially, whereas GPUs are better at handling multiple but simple calculations in parallel. For many use cases, a standard current generation instance type from an instance family such as ml.m* provides enough computing power, memory, and network performance for many Jupyter notebooks to perform well. GPUs provide a great price/performance ratio if you take advantage of them effectively. However, GPUs also cost more, and you should choose GPU-based notebooks only when you really need them.

Ask yourself: Is my neural network relatively small scale? Is my network performing tons of calculations involving hundreds of thousands of parameters? Can my model take advantage of hardware parallelism such as P3 and P3dn instance families?

Depending on the model, the GPU communication overhead might even degrade performance. So, take a step back and start with what you think is the minimum requirement in terms of ml instance specification and work your way up to identifying the best instance type and family for your model.

If you’re using your notebook instance to train multiple jobs, decide when you need a GPU-enabled instance and when you don’t. If you need accelerated computing in your notebook environment, you can stop your m* family notebook instance, switch to a GPU-enabled P* family instance, and start it again. Don’t forget to switch it back when you no longer need that extra boost in your development environment.

If you’re using massive datasets for training and don’t want to wait for days or weeks to finish your training job, you can speed up the process by distributing training on multiple machines or processes in a cluster.

It’s recommended to use a small subset of your data for development in your notebook instance. You can use the full dataset for a training job that is distributed across optimized instances such as P2 or P3 GPU instances or an instance with powerful CPU, such as c5.

Maximize instance utilization

You can optimize your Amazon SageMaker notebook utilization many different ways. One simple way is to stop your notebook instance when you’re not using it and start when you need it. Consider auto-detecting idle notebook instances and managing their lifecycle using a lifecycle configuration script. For detailed implementation, see Right-sizing resources and avoiding unnecessary costs in Amazon SageMaker. Remember that the instance is only useful when you’re using the Jupyter notebook. If you’re not working on a notebook overnight or over the weekend, it’s a good idea to schedule a stop and start. Another way to save instance cost is by scheduling an AWS Lambda function. For example, you can stop all instances at 7:00 PM and start them at 7:00 AM.

You can also use Amazon CloudWatch Events to start and stop the instance based on an event. If you’re feeling geeky, connect it to your Amazon Rekognition based system to start a data scientist’s notebook instance when they step into the office or have Amazon Alexa do it as you grab a coffee.

Training jobs

The following are some best practices for saving costs on training jobs.

Use pre-trained models or even APIs

Pre-trained models eliminate the time spent gathering data and training models with that data. Consider using higher-level APIs such as provided by Amazon Rekognition or Amazon Comprehend to help you avoid spending on tasks that are already done for you. As an example, Amazon Comprehend simplifies topic modeling on a large corpus of documents. You can also use the Neural topic modeling (NTM) algorithm in Amazon SageMaker to get similar results with more effort. Although you have more control over hyperparameters when training your own model, your use case may not need it. A lot of engineering work and experience goes into creating ready-to-consume and highly optimized models, therefore an upfront ROI analysis is highly recommended if you’re embarking on a journey to develop similar models.

Use Pipe mode (where applicable) to reduce training time

Certain algorithms in Amazon SageMaker like Blazing text work on a large corpus of data. When these jobs are launched, significant time goes into downloading the data from Amazon Simple Storage Service (Amazon S3) into the local Amazon Elastic Block Storage (Amazon EBS) store. Your training jobs don’t start until this download finishes. These algorithms can take advantage of Pipe mode, in which training data is streamed from Amazon S3 into Amazon EBS and your training jobs start immediately. For example, training Blazing text on common crawl (3 TB) can take a few days, out of which a significant number of hours are just lost in download. This process can take advantage of Pipe mode to reduce significant training time.

Managed spot training in Amazon SageMaker

Managed spot training can optimize the cost of training models up to 90% over On-Demand Instances. Amazon SageMaker manages the Spot interruptions on your behalf. If your training job can be interrupted, use managed spot training. You can specify which training jobs use Spot Instances and a stopping condition that specifies how long Amazon SageMaker waits for a job to run using EC2 Spot Instances.

You may also consider using EC2 Spot Instances if you’re willing to do some extra work and if your algorithm is resilient enough to interruptions. For more information, see Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs.

Test your code locally

Resolve issues with code and data so you don’t need to pay to run training clusters for failed training jobs. This also saves you time spent initializing the training cluster. Before you submit a training job, try to run the fit function in local mode to fetch some early feedback:

mxnet_estimator = MXNet('train.py', train_instance_type='local', train_instance_count=1)

Monitor the performance of your training jobs to identify waste

Amazon SageMaker is integrated with CloudWatch out of the box and publishes instance metrics of the training cluster in CloudWatch. You can use these metrics to see if you should make adjustments to your cluster, such as CPUs, memory, number of instances, and more. To view the CloudWatch metric for your training jobs, navigate to the Jobs page on the Amazon SageMaker console and choose View Instance metrics in the Monitor section.

Also, use Amazon SageMaker Debugger, which provides full visibility into model training by monitoring, recording, analyzing, and visualizing training process tensors. Debugger can dramatically reduce the time, resources, and cost needed to train models.

Find the right balance: Performance vs. accuracy

Compare the throughput of 16-bit floating point and 32-bit floating point calculations and determine what is right for your model. 32-bit (single precision or FP32) and even 64-bit (double precision or FP64) floating point variables are popular for many applications that require high precision. These are workloads like engineering simulations that simulate real-world behavior and need the mathematical model to be as exact as possible. In many cases, however, reducing memory usage and increasing speed gained by moving to half or mixed precision (16-bit or FP16) is worth the minor tradeoffs in accuracy. For more information, see Accelerating GPU computation through mixed-precision methods.

A similar trade-off also applies when deciding on the number of layers in your neural network for your classification algorithms, such as image classification.

Tuning (hyperparameter optimization) jobs

Use hyperparameter optimization (HPO) when needed and choose the hyperparameters and their ranges to tune on wisely.

Some API calls can result in a bill of hundreds or even thousands of dollars, and tuning jobs are one of those. A good tuning job can save you many working days of expensive data scientists’ time and provide a significant lift in model performance, which is highly beneficial. HPO in Amazon SageMaker finds good hyperparameters quicker if the search space is narrow (for example, a learning rate of 0.01–0.05 rather than 0.001–0.9). If you have some relevant prior knowledge about the hyperparameter range, start with that. For wide hyperparameter ranges, you may want to consider logarithmic transformations.

Amazon SageMaker also reduces the amount of time spent tuning models using built-in HPO. This technology automatically adjusts hundreds of different combinations of parameters to quickly arrive at the best solution for your ML problem. With high-performance algorithms, distributed computing, managed infrastructure, and HPO, Amazon SageMaker drastically decreases the training time and overall cost of building production grade systems. You can see examples of HPO in some of the Amazon SageMaker built-in algorithms.

For longer training jobs and as the training time for each training job gets longer, you may also want to consider early stopping of training jobs.

Hosting endpoints

The following section discusses how to save cost when hosting endpoints using Amazon SageMaker hosting services.

Delete endpoints that aren’t in use

Amazon SageMaker is great for testing new models because you can easily deploy them into an A/B testing environment. When you’re done with your tests and not using the endpoint extensively anymore, you should delete it. You can always recreate it when you need it again because the model is stored in Amazon S3.

Use Automatic Scaling

Auto Scaling your Amazon SageMaker endpoint doesn’t just provide high availability, better throughput, and better performance, it also optimizes the cost of your endpoint. Make sure that you configure Auto Scaling for your endpoint, monitor your model endpoint, and adjust the scaling policy based on the CloudWatch metrics. For more information, see Load test and optimize and Amazon SageMaker endpoint using automatic scaling.

Amazon Elastic Inference for deep learning

Selecting a GPU instance type that is big enough to satisfy the requirements of the most demanding resource for inference may not be a smart move. Even at peak load, a deep learning application may not fully utilize the capacity offered by a GPU. Consider using Amazon Elastic Inference, which allows you to attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances to reduce the cost of running deep learning inference by up to 75%.

Host multiple models with multi-model endpoints

You can create an endpoint that can host multiple models. Multi-model endpoints reduce hosting costs by improving endpoint utilization and provide a scalable and cost-effective solution to deploying a large number of models. Multi-model endpoints enable time-sharing of memory resources across models. It also reduces deployment overhead because Amazon SageMaker manages loading models in memory and scaling them based on traffic patterns to models.

Reducing labeling time with Amazon SageMaker Ground Truth

Data labeling is a key process of identifying raw data (such as images, text files, and videos) and adding one or more meaningful and informative labels to provide context so that an ML model can learn from it. This process is essential because the accuracy of trained model depends on accuracy of properly labeled dataset, or ground truth.

Amazon SageMaker Ground Truth uses combination of ML and a human workforce (vetted by AWS) to label images and text. Many ML projects are delayed because of insufficient labeled data. You can use Ground Truth to accelerate the ML cycle and reduce overall costs.

Tagging your resources

Consider tagging your Amazon SageMaker notebook instances and the hosting endpoints. Tags such as name of the project, business unit, environment (such as development, testing, or production) are useful for cost-optimization and can provide a clear visibility into where the money is spent. Cost allocation tags can help track and categorize your cost of ML. It can answer questions such as “Can I delete this resource to save cost?”

Keeping track of cost

If you need visibility of your ML cost on AWS, use AWS Budgets. This helps you track your Amazon SageMaker cost, including development, training, and hosting. You can also set alerts and get a notification when your cost or usage exceeds (or is forecasted to exceed) your budgeted amount. After you create your budget, you can track the progress on the AWS Budgets console.

Conclusion

In this post, I highlighted a few approaches and techniques to optimize cost without compromising on the implementation flexibility so you can deliver best-in-class ML-based business applications.

For more information about optimizing costs, consider the following:


About the Author

BK Chaurasiya is a Principal Product Manager at Amazon Web Services R&D and Innovation team. He provides technical guidance, design advice, and thought leadership to some of the largest and successful AWS customers and partners. A technologist by heart, BK specializes in driving DevOps, continuous delivery, and large-scale cloud transformation initiatives to success.

Source: https://aws.amazon.com/blogs/machine-learning/optimizing-costs-for-machine-learning-with-amazon-sagemaker/

Continue Reading

AI

Optimizing costs for machine learning with Amazon SageMaker

Applications based on machine learning (ML) can provide tremendous business value. Using ML, we can solve some of the most complex engineering problems that previously were infeasible. One of the advantages of running ML on the AWS Cloud is that you can continually optimize your workloads and reduce your costs. In this post, we discuss […]

Published

on

Applications based on machine learning (ML) can provide tremendous business value. Using ML, we can solve some of the most complex engineering problems that previously were infeasible. One of the advantages of running ML on the AWS Cloud is that you can continually optimize your workloads and reduce your costs. In this post, we discuss how to apply such optimization to ML workloads. We consider available options such as elasticity, different pricing models in cloud, automation, advantage of scale, and more.

Developing, training, maintaining, and performance tuning ML models is an iterative process that requires continuous improvement. Determining the optimum state in the model while going through the permutations and combinations of model parameters and data dependencies to adjust is just one leg of the journey. There is more to optimizing the cost of ML than just algorithm performance and model tuning. There is also some effort required to integrate developed models into applications and realize their benefits. Throughout this process, you can keep the cost down in numerous ways. Amazon SageMaker has made most of this journey smooth so developers and data scientists can spend most of their time focusing on what matters the most—delivering business value.

Amazon SageMaker notebook instances

An Amazon SageMaker notebook instance is an ML compute instance running the Jupyter Notebook app. This notebook instance comes with sample notebooks, several optimized algorithms, and complete code walkthroughs. Amazon SageMaker manages the creation of this instance and related resources. Consider using Amazon SageMaker Studio notebooks for collaborative workloads and when you don’t need to set up compute instances and file storage beforehand.

You can follow these best practices to help reduce the cost of notebook instances.

GPU or CPU?

CPUs are best at handling single, more complex calculations sequentially, whereas GPUs are better at handling multiple but simple calculations in parallel. For many use cases, a standard current generation instance type from an instance family such as ml.m* provides enough computing power, memory, and network performance for many Jupyter notebooks to perform well. GPUs provide a great price/performance ratio if you take advantage of them effectively. However, GPUs also cost more, and you should choose GPU-based notebooks only when you really need them.

Ask yourself: Is my neural network relatively small scale? Is my network performing tons of calculations involving hundreds of thousands of parameters? Can my model take advantage of hardware parallelism such as P3 and P3dn instance families?

Depending on the model, the GPU communication overhead might even degrade performance. So, take a step back and start with what you think is the minimum requirement in terms of ml instance specification and work your way up to identifying the best instance type and family for your model.

If you’re using your notebook instance to train multiple jobs, decide when you need a GPU-enabled instance and when you don’t. If you need accelerated computing in your notebook environment, you can stop your m* family notebook instance, switch to a GPU-enabled P* family instance, and start it again. Don’t forget to switch it back when you no longer need that extra boost in your development environment.

If you’re using massive datasets for training and don’t want to wait for days or weeks to finish your training job, you can speed up the process by distributing training on multiple machines or processes in a cluster.

It’s recommended to use a small subset of your data for development in your notebook instance. You can use the full dataset for a training job that is distributed across optimized instances such as P2 or P3 GPU instances or an instance with powerful CPU, such as c5.

Maximize instance utilization

You can optimize your Amazon SageMaker notebook utilization many different ways. One simple way is to stop your notebook instance when you’re not using it and start when you need it. Consider auto-detecting idle notebook instances and managing their lifecycle using a lifecycle configuration script. For detailed implementation, see Right-sizing resources and avoiding unnecessary costs in Amazon SageMaker. Remember that the instance is only useful when you’re using the Jupyter notebook. If you’re not working on a notebook overnight or over the weekend, it’s a good idea to schedule a stop and start. Another way to save instance cost is by scheduling an AWS Lambda function. For example, you can stop all instances at 7:00 PM and start them at 7:00 AM.

You can also use Amazon CloudWatch Events to start and stop the instance based on an event. If you’re feeling geeky, connect it to your Amazon Rekognition based system to start a data scientist’s notebook instance when they step into the office or have Amazon Alexa do it as you grab a coffee.

Training jobs

The following are some best practices for saving costs on training jobs.

Use pre-trained models or even APIs

Pre-trained models eliminate the time spent gathering data and training models with that data. Consider using higher-level APIs such as provided by Amazon Rekognition or Amazon Comprehend to help you avoid spending on tasks that are already done for you. As an example, Amazon Comprehend simplifies topic modeling on a large corpus of documents. You can also use the Neural topic modeling (NTM) algorithm in Amazon SageMaker to get similar results with more effort. Although you have more control over hyperparameters when training your own model, your use case may not need it. A lot of engineering work and experience goes into creating ready-to-consume and highly optimized models, therefore an upfront ROI analysis is highly recommended if you’re embarking on a journey to develop similar models.

Use Pipe mode (where applicable) to reduce training time

Certain algorithms in Amazon SageMaker like Blazing text work on a large corpus of data. When these jobs are launched, significant time goes into downloading the data from Amazon Simple Storage Service (Amazon S3) into the local Amazon Elastic Block Storage (Amazon EBS) store. Your training jobs don’t start until this download finishes. These algorithms can take advantage of Pipe mode, in which training data is streamed from Amazon S3 into Amazon EBS and your training jobs start immediately. For example, training Blazing text on common crawl (3 TB) can take a few days, out of which a significant number of hours are just lost in download. This process can take advantage of Pipe mode to reduce significant training time.

Managed spot training in Amazon SageMaker

Managed spot training can optimize the cost of training models up to 90% over On-Demand Instances. Amazon SageMaker manages the Spot interruptions on your behalf. If your training job can be interrupted, use managed spot training. You can specify which training jobs use Spot Instances and a stopping condition that specifies how long Amazon SageMaker waits for a job to run using EC2 Spot Instances.

You may also consider using EC2 Spot Instances if you’re willing to do some extra work and if your algorithm is resilient enough to interruptions. For more information, see Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs.
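As a minimal sketch with the SageMaker Python SDK (paths and time limits are placeholder assumptions), enabling managed spot training is a matter of a few estimator parameters:

from sagemaker.mxnet import MXNet

# checkpoint_s3_uri lets SageMaker resume after a Spot interruption (placeholder bucket)
estimator = MXNet('train.py',
                  role='SageMakerRole',
                  train_instance_count=1,
                  train_instance_type='ml.p3.2xlarge',
                  train_use_spot_instances=True,  # run on Spot capacity
                  train_max_run=3600,             # max training seconds
                  train_max_wait=7200,            # max seconds to wait for Spot + training
                  checkpoint_s3_uri='s3://my-bucket/checkpoints/')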

Test your code locally

Resolve issues with code and data before you submit a training job so you don’t pay for training clusters that run failed jobs. This also saves you the time spent initializing the training cluster. Before you submit a training job, try running the fit function in local mode to get some early feedback:

from sagemaker.mxnet import MXNet

# role is your SageMaker execution IAM role
mxnet_estimator = MXNet('train.py', role=role, train_instance_type='local', train_instance_count=1)
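You can then call fit against a small sample to exercise the full training loop on the notebook instance itself; as an illustration, assuming sample data at a hypothetical local path:

mxnet_estimator.fit('file:///tmp/my_training_data')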

Monitor the performance of your training jobs to identify waste

Amazon SageMaker is integrated with CloudWatch out of the box and publishes instance metrics of the training cluster in CloudWatch. You can use these metrics to see whether you should adjust your cluster, such as its CPUs, memory, or number of instances. To view the CloudWatch metrics for your training jobs, navigate to the Jobs page on the Amazon SageMaker console and choose View Instance metrics in the Monitor section.

Also, use Amazon SageMaker Debugger, which provides full visibility into model training by monitoring, recording, analyzing, and visualizing tensors from the training process. Debugger can dramatically reduce the time, resources, and cost needed to train models.
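As a hedged sketch, attaching one of Debugger’s built-in rules to an estimator looks like the following (the estimator settings are assumptions for illustration):

from sagemaker.debugger import Rule, rule_configs
from sagemaker.mxnet import MXNet

# Built-in rule that flags a loss that stops decreasing, so a wasteful job is caught early
estimator = MXNet('train.py',
                  role='SageMakerRole',
                  train_instance_count=1,
                  train_instance_type='ml.p3.2xlarge',
                  rules=[Rule.sagemaker(rule_configs.loss_not_decreasing())])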

Find the right balance: Performance vs. accuracy

Compare the throughput of 16-bit and 32-bit floating point calculations and determine what is right for your model. 32-bit (single precision or FP32) and even 64-bit (double precision or FP64) floating point variables are popular for applications that require high precision, such as engineering simulations that model real-world behavior and need the mathematics to be as exact as possible. In many cases, however, the reduced memory usage and increased speed gained by moving to half or mixed precision (16-bit or FP16) are worth the minor trade-offs in accuracy. For more information, see Accelerating GPU computation through mixed-precision methods.
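A quick way to see the memory side of this trade-off (a toy illustration, not a benchmark) is to compare the footprint of the same tensor at different precisions:

import numpy as np

activations = np.random.rand(1024, 1024).astype(np.float32)
print(activations.nbytes // 2**20, 'MiB')                     # 4 MiB in FP32
print(activations.astype(np.float16).nbytes // 2**20, 'MiB')  # 2 MiB in FP16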

A similar trade-off also applies when deciding on the number of layers in your neural network for your classification algorithms, such as image classification.

Tuning (hyperparameter optimization) jobs

Use hyperparameter optimization (HPO) when needed and choose the hyperparameters and their ranges to tune on wisely.

Some API calls can result in a bill of hundreds or even thousands of dollars, and tuning jobs are one of them. A good tuning job can save many working days of expensive data scientists’ time and provide a significant lift in model performance, which is highly beneficial. HPO in Amazon SageMaker finds good hyperparameters more quickly if the search space is narrow (for example, a learning rate of 0.01–0.05 rather than 0.001–0.9). If you have relevant prior knowledge about a hyperparameter’s range, start with that. For wide hyperparameter ranges, consider logarithmic scaling.

Amazon SageMaker also reduces the amount of time spent tuning models using built-in HPO. This technology automatically explores hundreds of different parameter combinations to quickly arrive at the best solution for your ML problem. With high-performance algorithms, distributed computing, managed infrastructure, and HPO, Amazon SageMaker drastically decreases the training time and overall cost of building production-grade systems. You can see examples of HPO in some of the Amazon SageMaker built-in algorithms.

As individual training jobs get longer, you may also want to consider early stopping of training jobs within the tuning job.
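As a hedged sketch combining both ideas (a narrow, log-scaled range plus early stopping), where estimator and the objective metric name are assumptions for illustration:

from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

# A narrow, log-scaled search range converges faster than a wide linear one
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name='validation:accuracy',
    hyperparameter_ranges={
        'learning_rate': ContinuousParameter(0.01, 0.05, scaling_type='Logarithmic')},
    max_jobs=20,
    max_parallel_jobs=2,
    early_stopping_type='Auto')  # let SageMaker stop unpromising training jobs early
tuner.fit({'train': 's3://my-bucket/train/'})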

Hosting endpoints

The following section discusses how to save cost when hosting endpoints using Amazon SageMaker hosting services.

Delete endpoints that aren’t in use

Amazon SageMaker is great for testing new models because you can easily deploy them into an A/B testing environment. When you’re done with your tests and are no longer using the endpoint extensively, delete it. You can always recreate it when you need it again because the model is stored in Amazon S3.
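As a minimal boto3 sketch (the endpoint name is a placeholder), note that cleaning up also includes the endpoint configuration:

import boto3

sagemaker = boto3.client('sagemaker')

# The model artifact stays in S3; only the serving resources are removed
sagemaker.delete_endpoint(EndpointName='my-test-endpoint')
sagemaker.delete_endpoint_config(EndpointConfigName='my-test-endpoint')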

Use Automatic Scaling

Auto Scaling your Amazon SageMaker endpoint doesn’t just provide high availability, better throughput, and better performance; it also optimizes the cost of your endpoint. Make sure that you configure Auto Scaling for your endpoint, monitor your model endpoint, and adjust the scaling policy based on the CloudWatch metrics. For more information, see Load test and optimize an Amazon SageMaker endpoint using automatic scaling.
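As an illustrative sketch with the Application Auto Scaling API (the endpoint name, variant, capacities, and target value are placeholder assumptions to be tuned through load tests):

import boto3

autoscaling = boto3.client('application-autoscaling')

resource_id = 'endpoint/my-endpoint/variant/AllTraffic'  # hypothetical endpoint/variant

autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=1,
    MaxCapacity=4)

autoscaling.put_scaling_policy(
    PolicyName='invocations-target-tracking',
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 1000.0,  # invocations per instance per minute
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'}})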

Amazon Elastic Inference for deep learning

Selecting a GPU instance type big enough to satisfy the most demanding resource requirements of inference may not be a smart move. Even at peak load, a deep learning application often doesn’t fully utilize the capacity a GPU offers. Consider using Amazon Elastic Inference, which lets you attach low-cost GPU-powered acceleration to Amazon EC2 and Amazon SageMaker instances and reduce the cost of running deep learning inference by up to 75%.
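As a hedged sketch, attaching an accelerator happens at deploy time; model here is assumed to be an existing SageMaker model object, and the instance and accelerator sizes are illustrative:

# Pair a modest CPU host with a fractional GPU accelerator instead of a full GPU instance
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m5.large',
                         accelerator_type='ml.eia2.medium')  # Elastic Inference accelerator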

Host multiple models with multi-model endpoints

You can create an endpoint that hosts multiple models. Multi-model endpoints reduce hosting costs by improving endpoint utilization, and they provide a scalable, cost-effective way to deploy a large number of models. Multi-model endpoints enable time-sharing of memory resources across models, and they reduce deployment overhead because Amazon SageMaker manages loading models into memory and scaling them based on the traffic patterns to your models.
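At inference time, you select the model per request with the TargetModel parameter; the following boto3 sketch uses placeholder names and payload:

import boto3

runtime = boto3.client('sagemaker-runtime')

# TargetModel picks the artifact (relative to the endpoint's S3 model prefix) to invoke
response = runtime.invoke_endpoint(
    EndpointName='my-multi-model-endpoint',  # hypothetical
    TargetModel='model-42.tar.gz',
    ContentType='text/csv',
    Body=b'0.5,1.2,3.4')
print(response['Body'].read())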

Reducing labeling time with Amazon SageMaker Ground Truth

Data labeling is the process of identifying raw data (such as images, text files, and videos) and adding one or more meaningful and informative labels to provide context so that an ML model can learn from it. This process is essential because the accuracy of a trained model depends on the accuracy of a properly labeled dataset, or ground truth.

Amazon SageMaker Ground Truth uses a combination of ML and a human workforce (vetted by AWS) to label images and text. Many ML projects are delayed because of insufficient labeled data. You can use Ground Truth to accelerate the ML cycle and reduce overall costs.

Tagging your resources

Consider tagging your Amazon SageMaker notebook instances and hosting endpoints. Tags such as the name of the project, business unit, and environment (such as development, testing, or production) are useful for cost optimization and provide clear visibility into where the money is spent. Cost allocation tags help track and categorize your ML costs, and they can answer questions such as “Can I delete this resource to save cost?”
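As a minimal boto3 sketch (the ARN and tag values are placeholders), tags propagate to cost allocation reports once you activate them in the Billing console:

import boto3

sagemaker = boto3.client('sagemaker')

sagemaker.add_tags(
    ResourceArn='arn:aws:sagemaker:us-east-1:123456789012:notebook-instance/dev-notebook',
    Tags=[{'Key': 'project', 'Value': 'churn-model'},
          {'Key': 'environment', 'Value': 'development'}])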

Keeping track of cost

If you need visibility into your ML costs on AWS, use AWS Budgets. It helps you track your Amazon SageMaker cost, including development, training, and hosting. You can also set alerts and get notified when your cost or usage exceeds (or is forecasted to exceed) your budgeted amount. After you create a budget, you can track its progress on the AWS Budgets console.
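As a hedged sketch with boto3 (the account ID, budget name, and amount are placeholders, and the exact cost filter shape is an assumption worth verifying against the AWS Budgets documentation):

import boto3

budgets = boto3.client('budgets')

# Monthly cost budget scoped to SageMaker usage
budgets.create_budget(
    AccountId='123456789012',
    Budget={
        'BudgetName': 'sagemaker-monthly',
        'BudgetLimit': {'Amount': '500', 'Unit': 'USD'},
        'TimeUnit': 'MONTHLY',
        'BudgetType': 'COST',
        'CostFilters': {'Service': ['Amazon SageMaker']}})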

Conclusion

In this post, I highlighted a few approaches and techniques to optimize cost without compromising implementation flexibility, so you can deliver best-in-class ML-based business applications.

About the Author

BK Chaurasiya is a Principal Product Manager on the Amazon Web Services R&D and Innovation team. He provides technical guidance, design advice, and thought leadership to some of the largest and most successful AWS customers and partners. A technologist by heart, BK specializes in driving DevOps, continuous delivery, and large-scale cloud transformation initiatives to success.

Source: https://aws.amazon.com/blogs/machine-learning/optimizing-costs-for-machine-learning-with-amazon-sagemaker/
