Connect with us


AI and Efficiency

We’re releasing an analysis showing that since 2012 the amount of compute needed to train a neural net to the same performance on ImageNet classification has been decreasing by a factor of 2 every 16 months. Compared to 2012, it now takes 44 times less compute to train a



We’re releasing an analysis showing that since 2012 the amount of compute needed to train a neural net to the same performance on ImageNet classification has been decreasing by a factor of 2 every 16 months. Compared to 2012, it now takes 44 times less compute to train a neural network to the level of AlexNet (by contrast, Moore’s Law would yield an 11x cost improvement over this period). Our results suggest that for AI tasks with high levels of recent investment, algorithmic progress has yielded more gains than classical hardware efficiency.

Read Paper

Algorithmic improvement is a key factor driving the advance of AI. It’s important to search for measures that shed light on overall algorithmic progress, even though it’s harder than measuring such trends in compute.

44x less compute required to get to AlexNet performance 7 years later

Total amount of compute in teraflops/s-days used to train to AlexNet level performance. Lowest compute points at any given time shown in blue, all points measured shown in gray.

Download charts

Measuring efficiency

Algorithmic efficiency can be defined as reducing the compute needed to train a specific capability. Efficiency is the primary way we measure algorithmic progress on classic computer science problems like sorting. Efficiency gains on traditional problems like sorting are more straightforward to measure than in ML because they have a clearer measure of task difficulty. However, we can apply the efficiency lens to machine learning by holding performance constant. Efficiency trends can be compared across domains like DNA sequencing (10-month doubling), solar energy (6-year doubling), and transistor density (2-year doubling).

For our analysis, we primarily leveraged open-source re-implementations to measure progress on AlexNet level performance over a long horizon. We saw a similar rate of training efficiency improvement for ResNet-50 level performance on ImageNet (17-month doubling time). We saw faster rates of improvement over shorter timescales in Translation, Go, and Dota 2:

  1. Within translation, the Transformer surpassed seq2seq performance on English to French translation on WMT’14 with 61x less training compute 3 years later.
  2. We estimate AlphaZero took 8x less compute to get to AlphaGoZero level performance 1 year later.
  3. OpenAI Five Rerun required 5x less training compute to surpass OpenAI Five (which beat the world champions, OG) 3 months later.

It can be helpful to think of compute in 2012 not being equal to compute in 2019 in a similar way that dollars need to be inflation-adjusted over time. A fixed amount of compute could accomplish more in 2019 than in 2012. One way to think about this is that some types of AI research progress in two stages, similar to the “tick tock” model of development seen in semiconductors; new capabilities (the “tick”) typically require a significant amount of compute expenditure to obtain, then refined versions of those capabilities (the “tock”) become much more efficient to deploy due to process improvements.

Increases in algorithmic efficiency allow researchers to do more experiments of interest in a given amount of time and money. In addition to being a measure of overall progress, algorithmic efficiency gains speed up future AI research in a way that’s somewhat analogous to having more compute.

Other measures of AI progress

In addition to efficiency, many other measures shed light on overall algorithmic progress in AI. Training cost in dollars is related, but less narrowly focused on algorithmic progress because it’s also affected by improvement in the underlying hardware, hardware utilization, and cloud infrastructure. Sample efficiency is key when we’re in a low data regime, which is the case for many tasks of interest. The ability to train models faster also speeds up research and can be thought of as a measure of the parallelizability of learning capabilities of interest. We also find increases in inference efficiency in terms of GPU time, parameters, and flops meaningful, but mostly as a result of their economic implications rather than their effect on future research progress. Shufflenet achieved AlexNet-level performance with an 18x inference efficiency increase in 5 years (15-month doubling time), which suggests that training efficiency and inference efficiency might improve at similar rates. The creation of datasets/​environments/​benchmarks is a powerful method of making specific AI capabilities of interest more measurable.

Primary limitations

  1. We have only a small number of algorithmic efficiency data points on a few tasks. It’s unclear the degree to which the efficiency trends we’ve observed generalize to other AI tasks. Systematic measurement could make it clear whether an algorithmic equivalent to Moore’s Law in the domain of AI exists, and if it exists, clarify its nature. We consider this a highly interesting open question. We suspect we’re more likely to observe similar rates of efficiency progress on similar tasks. By similar tasks, we mean tasks within these sub-domains of AI, on which the field agrees we’ve seen substantial progress, and that have comparable levels of investment (compute and/or researcher time).
  2. Even though we believe AlexNet represented a lot of progress, this analysis doesn’t attempt to quantify that progress. More generally, the first time a capability is created, algorithmic breakthroughs may have reduced the resources required from totally infeasible to merely high. We think new capabilities generally represent a larger share of overall conceptual progress than observed efficiency increases of the type shown here.
  3. This analysis focuses on the final training run cost for an optimized model rather than total development costs. Some algorithmic improvements make it easier to train a model by making the space of hyperparameters that will train stably and get good final performance much larger. On the other hand, architecture searches increase the gap between the final training run cost and total training costs.
  4. We don’t speculate on the degree to which we expect efficiency trends will extrapolate in time, we merely present our results and discuss the implications if the trends persist.

Measurement and AI policy

We believe that policymaking related to AI will be improved by a greater focus on the measurement and assessment of AI systems, both in terms of technical attributes and societal impact. We think such measurement initiatives can shed light on important questions in policy; our AI and Compute analysis suggests policymakers should increase funding for compute resources for academia, so that academic research can replicate, reproduce, and extend industry research. This efficiency analysis suggests that policymakers could develop accurate intuitions about the cost of deploying AI capabilities—and how these costs are going to alter over time—by more closely assessing the rate of improvements in efficiency for AI systems.

Tracking efficiency going forward

If large scale compute continues to be important to achieving state of the art (SOTA) overall performance in domains like language and games then it’s important to put effort into measuring notable progress achieved with smaller amounts of compute (contributions often made by academic institutions). Models that achieve training efficiency state of the arts on meaningful capabilities are promising candidates for scaling up and potentially achieving overall top performance. Additionally, figuring out the algorithmic efficiency improvements are straightforward since they are just a particularly meaningful slice of the learning curves that all experiments generate.

We also think that measuring long run trends in efficiency SOTAs will help paint a quantitative picture of overall algorithmic progress. We observe that hardware and algorithmic efficiency gains are multiplicative and can be on a similar scale over meaningful horizons, which suggests that a good model of AI progress should integrate measures from both.

Our results suggest that for AI tasks with high levels of investment (researcher time and/or compute) algorithmic efficiency might outpace gains from hardware efficiency (Moore’s Law). Moore’s Law was coined in 1965 when integrated circuits had a mere 64 transistors (6 doublings) and naively extrapolating it out predicted personal computers and smartphones (an iPhone 11 has 8.5 billion transistors). If we observe decades of exponential improvement in the algorithmic efficiency of AI, what might it lead to? We’re not sure. That these results make us ask this question is a modest update for us towards a future with powerful AI services and technology.

For all these reasons, we’re going to start tracking efficiency SOTAs publicly. We’ll start with vision and translation efficiency benchmarks (ImageNet and WMT14), and we’ll consider adding more benchmarks over time. We believe there are efficiency SOTAs on these benchmarks we’re unaware of and encourage the research community to submit them here (we’ll give credit to original authors and collaborators).

Industry leaders, policymakers, economists, and potential researchers are all trying to better understand AI progress and decide how much attention they should invest and where to direct it. Measurement efforts can help ground such decisions. If you’re interested in this type of work, consider applying to work at OpenAI’s Foresight or Policy team!



Bringing real-time machine learning-powered insights to rugby using Amazon SageMaker

The Guinness Six Nations Championship began in 1883 as the Home Nations Championship among England, Ireland, Scotland, and Wales, with the inclusion of France in 1910 and Italy in 2000. It is among the oldest surviving rugby traditions and one of the best-attended sporting events in the world. The COVID-19 outbreak disrupted the end of […]



The Guinness Six Nations Championship began in 1883 as the Home Nations Championship among England, Ireland, Scotland, and Wales, with the inclusion of France in 1910 and Italy in 2000. It is among the oldest surviving rugby traditions and one of the best-attended sporting events in the world. The COVID-19 outbreak disrupted the end of the 2020 Championship and four games were postponed. The remaining rounds resumed on October 24. With the increasing application of artificial intelligence and machine learning (ML) in sports analytics, AWS and Stats Perform partnered to bring ML-powered, real-time stats to the game of rugby, to enhance fan engagement and provide valuable insights into the game.

This post summarizes the collaborative effort between the Guinness Six Nations Rugby Championship, Stats Perform, and AWS to develop an ML-driven approach with Amazon SageMaker and other AWS services that predicts the probability of a successful penalty kick, computed in real time and broadcast live during the game. AWS infrastructure enables single-digit millisecond latency for kick predictions during inference. The Kick Predictor stat is one of the many new AWS-powered, on-screen dynamic Matchstats that provide fans with a greater understanding of key in-game events, including scrum analysis, play patterns, rucks and tackles, and power game analysis. For more information about other stats developed for rugby using AWS services, see the Six Nations Rugby website.

Rugby is a form of football with a 23-player match day squad. 15 players on each team are on the field, with additional substitutions waiting to get involved in the full-contact sport. The objective of the game is to outscore the opposing team, and one way of scoring is to kick a goal. The ability to kick accurately is one of the most critical elements of rugby, and there are two ways to score with a kick: through a conversion (worth two points) and a penalty (worth three points).

Predicting the likelihood of a successful kick is important because it enhances fan engagement during the game by showing the success probability before the player kicks the ball. There are usually 40–60 seconds of stoppage time while the player sets up for the kick, during which the Kick Predictor stat can appear on-screen to fans. Commentators also have time to predict the outcome, quantify the difficulty of each kick, and compare kickers in similar situations. Moreover, teams may start to use kicking probability models in the future to determine which player should kick given the position of the penalty on the pitch.

Developing an ML solution

To calculate the penalty success probability, the Amazon Machine Learning Solutions Lab used Amazon SageMaker to train, test, and deploy an ML model from historical in-game events data, which calculates the kick predictions from anywhere in the field. The following sections explain the dataset and preprocessing steps, the model training, and model deployment procedures.

Dataset and preprocessing

Stats Perform provided the dataset for training the goal kick model. It contained millions of events from historical rugby matches from 46 leagues from 2007–2019. The raw JSON events data that was collected during live rugby matches was ingested and stored on Amazon Simple Storage Service (Amazon S3). It was then parsed and preprocessed in an Amazon SageMaker notebook instance. After selecting the kick-related events, the training data comprised approximately 67,000 kicks, with approximately 50,000 (75%) successful kicks and 17,000 misses (25%).

The following graph shows a summary of kicks taken during a sample game. The athletes kicked from different angles and various distances.

Rugby experts contributed valuable insights to the data preprocessing, which included detecting and removing anomalies, such as unreasonable kicks. The clean CSV data went back to an S3 bucket for ML training.

The following graph depicts the heatmap of the kicks after preprocessing. The left-side kicks are mirrored. The brighter colors indicated a higher chance of scoring, standardized between 0 to 1.

Feature engineering

To better capture the real-world event, the ML Solutions Lab engineered several features using exploratory data analysis and insights from rugby experts. The features that went into the modeling fell into three main categories:

  • Location-based features – The zone in which the athlete takes the kick and the distance and angle of the kick to the goal. The x-coordinates of the kicks are mirrored along the center of the rugby pitch to eliminate the left or right bias in the model.
  • Player performance features – The mean success rates of the kicker in a given field zone, in the Championship, and in the kicker’s entire career.
  • In-game situational features – The kicker’s team (home or away), the scoring situation before they take the kick, and the period of the game in which they take the kick.

The location-based and player performance features are the most important features in the model.

After feature engineering, the categorical variables were one-hot encoded, and to avoid the bias of the model towards large-value variables, the numerical predictors were standardized. During the model training phase, a player’s historical performance features were pushed to Amazon DynamoDB tables. DynamoDB helped provide single-digit millisecond latency for kick predictions during inference.

Training and deploying models

To explore a wide range of classification algorithms (such as logistic regression, random forests, XGBoost, and neural networks), a 10-fold stratified cross-validation approach was used for model training. After exploring different algorithms, the built-in XGBoost in Amazon SageMaker was used due to its better prediction performance and inference speed. Additionally, its implementation has a smaller memory footprint, better logging, and improved hyperparameter optimization (HPO) compared to the original code base.

HPO, or tuning, is the process of choosing a set of optimal hyperparameters for a learning algorithm, and is a challenging element in any ML problem. HPO in Amazon SageMaker uses an implementation of Bayesian optimization to choose the best hyperparameters for the next training job. Amazon SageMaker HPO automatically launches multiple training jobs with different hyperparameter settings, evaluates the results of those training jobs based on a predefined objective metric, and selects improved hyperparameter settings for future attempts based on previous results.

The following diagram illustrates the model training workflow.

Optimizing hyperparameters in Amazon SageMaker

You can configure training jobs and when the hyperparameter tuning job launches by initializing an estimator, which includes the container image for the algorithm (for this use case, XGBoost), configuration for the output of the training jobs, the values of static algorithm hyperparameters, and the type and number of instances to use for the training jobs. For more information, see Train a Model.

To create the XGBoost estimator for this use case, enter the following code:

import boto3
import sagemaker
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner
from import get_image_uri
BUCKET = <bucket name>
PREFIX = 'kicker/xgboost/'
region = boto3.Session().region_name
role = sagemaker.get_execution_role()
smclient = boto3.Session().client('sagemaker')
sess = sagemaker.Session()
s3_output_path = ‘s3://{}/{}/output’.format(BUCKET, PREFIX) container = get_image_uri(region, 'xgboost', repo_version='0.90-1') xgb = sagemaker.estimator.Estimator(container, role, train_instance_count=4, train_instance_type= 'ml.m4.xlarge', output_path=s3_output_path, sagemaker_session=sess)

After you create the XGBoost estimator object, set its initial hyperparameter values as shown in the following code:

xgb.set_hyperparameters(eval_metric='auc', objective= 'binary:logistic', num_round=200, rate_drop=0.3, max_depth=5, subsample=0.8, gamma=2, eta=0.2, scale_pos_weight=2.85) #For class imbalance weights # Specifying the objective metric (auc on validation set)
OBJECTIVE_METRIC_NAME = ‘validation:auc’ # specifying the hyper parameters and their ranges
HYPERPARAMETER_RANGES = {'eta': ContinuousParameter(0, 1), 'alpha': ContinuousParameter(0, 2), 'max_depth': IntegerParameter(1, 10)}

For this post, AUC (area under the ROC curve) is the evaluation metric. This enables the tuning job to measure the performance of the different training jobs. The kick prediction is also a binary classification problem, which is specified in the objective argument as a binary:logistic. There is also a set of XGBoost-specific hyperparameters that you can tune. For more information, see Tune an XGBoost model.

Next, create a HyperparameterTuner object by indicating the XGBoost estimator, the hyperparameter ranges, passing the parameters, the objective metric name and definition, and tuning resource configurations, such as the number of training jobs to run in total and how many training jobs can run in parallel. Amazon SageMaker extracts the metric from Amazon CloudWatch Logs with a regular expression. See the following code:

tuner = HyperparameterTuner(xgb, OBJECTIVE_METRIC_NAME, HYPERPARAMETER_RANGES, max_jobs=20, max_parallel_jobs=4)
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train'.format(BUCKET, PREFIX), content_type='csv')
s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/'.format(BUCKET, PREFIX), content_type='csv'){'train': s3_input_train, 'validation':

Finally, launch a hyperparameter tuning job by calling the fit() function. This function takes the paths of the training and validation datasets in the S3 bucket. After you create the hyperparameter tuning job, you can track its progress via the Amazon SageMaker console. The training time depends on the instance type and number of instances you selected during tuning setup.

Deploying the model on Amazon SageMaker

When the training jobs are complete, you can deploy the best performing model. If you’d like to compare models for A/B testing, Amazon SageMaker supports hosting representational state transfer (REST) endpoints for multiple models. To set this up, create an endpoint configuration that describes the distribution of traffic across the models. In addition, the endpoint configuration describes the instance type required for model deployment. The first step is to get the name of the best performing training job and create the model name.

After you create the endpoint configuration, you’re ready to deploy the actual endpoint for serving inference requests. The result is an endpoint that can you can validate and incorporate into production applications. For more information about deploying models, see Deploy the Model to Amazon SageMaker Hosting Services. To create the endpoint configuration and deploy it, enter the following code:

endpoint_name = 'Kicker-XGBoostEndpoint'
xgb_predictor = tuner.deploy(initial_instance_count=1, instance_type='ml.t2.medium', endpoint_name=endpoint_name)

After you create the endpoint, you can request a prediction in real time.

Building a RESTful API for real-time model inference

You can create a secure and scalable RESTful API that enables you to request the model prediction based on the input values. It’s easy and convenient to develop different APIs using AWS services.

The following diagram illustrates the model inference workflow.

First, you request the probability of the kick conversion by passing parameters through Amazon API Gateway, such as the location and zone of the kick, kicker ID, league and Championship ID, the game’s period, if the kicker’s team is playing home or away, and the team score status.

The API Gateway passes the values to the AWS Lambda function, which parses the values and requests additional features related to the player’s performance from DynamoDB lookup tables. These include the mean success rates of the kicking player in a given field zone, in the Championship, and in the kicker’s entire career. If the player doesn’t exist in the database, the model uses the average performance in the database in the given kicking location. After the function combines all the values, it standardizes the data and sends it to the Amazon SageMaker model endpoint for prediction.

The model performs the prediction and returns the predicted probability to the Lambda function. The function parses the returned value and sends it back to API Gateway. API Gateway responds with the output prediction. The end-to-end process latency is less than a second.

The following screenshot shows example input and output of the API. The RESTful API also outputs the average success rate of all the players in the given location and zone to get the comparison of the player’s performance with the overall average.

For instructions on creating a RESTful API, see Call an Amazon SageMaker model endpoint using Amazon API Gateway and AWS Lambda.

Bringing design principles into sports analytics

To create the first real-time prediction model for the tournament with a millisecond latency requirement, the ML Solutions Lab team worked backwards to identify areas in which design thinking could save time and resources. The team worked on an end-to-end notebook within an Amazon SageMaker environment, which enabled data access, raw data parsing, data preprocessing and visualization, feature engineering, model training and evaluation, and model deployment in one place. This helped in automating the modeling process.

Moreover, the ML Solutions Lab team implemented a model update iteration for when the model was updated with newly generated data, in which the model parses and processes only the additional data. This brings computational and time efficiencies to the modeling.

In terms of next steps, the Stats Perform AI team has been looking at the next stage of rugby analysis by breaking down the other strategic facets as line-outs, scrums and teams, and continuous phases of play using the fine-grain spatio-temporal data captured. The state-of-the-art feature representations and latent factor modelling (which have been utilized so effectively in Stats Perform’s “Edge” match-analysis and recruitment products in soccer) means that there is plenty of fertile space for innovation that can be explored in rugby.


Six Nations Rugby, Stats Perform, and AWS came together to bring the first real-time prediction model to the 2020 Guinness Six Nations Rugby Championship. The model determined a penalty or conversion kick success probability from anywhere in the field. They used Amazon SageMaker to build, train, and deploy the ML model with variables grouped into three main categories: location-based features, player performance features, and in-game situational features. The Amazon SageMaker endpoint provided prediction results with subsecond latency. The model was used by broadcasters during the live games in the Six Nations 2020 Championship, bringing a new metric to millions of rugby fans.

You can find full, end-to-end examples of creating custom training jobs, training state-of-the-art object detection models, and model deployment on Amazon SageMaker on the AWS Labs GitHub repo. To learn more about the ML Solutions Lab, see Amazon Machine Learning Solutions Lab.

About the Authors

Mehdi Noori is a Data Scientist at the Amazon ML Solutions Lab, where he works with customers across various verticals, and helps them to accelerate their cloud migration journey, and to solve their ML problems using state-of-the-art solutions and technologies.

Tesfagabir Meharizghi is a Data Scientist at the Amazon ML Solutions Lab where he works with customers across different verticals accelerate their use of artificial intelligence and AWS cloud services to solve their business challenges. Outside of work, he enjoys spending time with his family and reading books.

Patrick Lucey is the Chief Scientist at Stats Perform. Patrick started the Artificial Intelligence group at Stats Perform in 2015, with thegroup focusing on both computer vision and predictive modelling capabilities in sport. Previously, he was at Disney Research for 5 years, where he conducted research into automatic sports broadcasting using large amounts of spatiotemporal tracking data. He received his BEng(EE) from USQ and PhD from QUT, Australia in 2003 and 2008 respectively. He was also co-author of the best paper at the 2016 MIT Sloan Sports Analytics Conference and in 2017 & 2018 was co-author of best-paper runner-up at the same conference.

Xavier Ragot is Data Scientist with the Amazon ML Solution Lab team where he helps design creative ML solution to address customers’ business problems in various industries.


Continue Reading


Building an NLU-powered search application with Amazon SageMaker and the Amazon ES KNN feature

The rise of semantic search engines has made ecommerce and retail businesses search easier for its consumers. Search engines powered by natural language understanding (NLU) allow you to speak or type into a device using your preferred conversational language rather than finding the right keywords for fetching the best results. You can query using words […]



The rise of semantic search engines has made ecommerce and retail businesses search easier for its consumers. Search engines powered by natural language understanding (NLU) allow you to speak or type into a device using your preferred conversational language rather than finding the right keywords for fetching the best results. You can query using words or sentences in your native language, leaving it to the search engine to deliver the best results.

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon Elasticsearch Service (Amazon ES) is a fully managed service that makes it easy for you to deploy, secure, and run Elasticsearch cost-effectively at scale. Amazon ES offers KNN search, which can enhance search in use cases such as product recommendations, fraud detection, and image, video, and some specific semantic scenarios like document and query similarity. Alternatively, you can also choose Amazon Kendra, a highly accurate and easy to use enterprise search service that’s powered by machine learning, with no machine learning experience required. In this post, we explain how you can implement an NLU-based product search for certain types of applications using Amazon SageMaker and the Amazon ES k-nearest neighbor (KNN) feature.

In the post Building a visual search application with Amazon SageMaker and Amazon ES, we shared how to build a visual search application using Amazon SageMaker and the Amazon ES KNN’s Euclidean distance metric. Amazon ES now supports open-source Elasticsearch version 7.7 and includes the cosine similarity metric for KNN indexes. Cosine similarity measures the cosine of the angle between two vectors in the same direction, where a smaller cosine angle denotes higher similarity between the vectors. With cosine similarity, you can measure the orientation between two vectors, which makes it the ideal choice for some specific semantic search applications. The highly distributed architecture of Amazon ES enables you to implement an enterprise-grade search engine with enhanced KNN ranking, with high recall and performance.

In this post, you build a very simple search application that demonstrates the potential of using KNN with Amazon ES compared to the traditional Amazon ES ranking method, including a web application for testing the KNN-based search queries in your browser. The application also compares the search results with Elasticsearch match queries to demonstrate the difference between KNN search and full-text search.

Overview of solution

Regular Elasticsearch text-matching search is useful when you want to do text-based search, but KNN-based search is a more natural way to search for something. For example, when you search for a wedding dress using KNN-based search application, it gives you similar results if you type “wedding dress” or “marriage dress.” Implementing this KNN-based search application consists of two phases:

  • KNN reference index – In this phase, you pass a set of corpus documents through a deep learning model to extract their features, or embeddings. Text embeddings are a numerical representation of the corpus. You save those features into a KNN index on Amazon ES. The concept underpinning KNN is that similar data points exist in close proximity in the vector space. As an example, “summer dress” and “summer flowery dress” are both similar, so these text embeddings are collocated, as opposed to “summer dress” vs. “wedding dress.”
  • KNN index query – This is the inference phase of the application. In this phase, you submit a text search query through the deep learning model to extract the features. Then, you use those embeddings to query the reference KNN index. The KNN index returns similar text embeddings from the KNN vector space. For example, if you pass a feature vector of “marriage dress” text, it returns “wedding dress” embeddings as a similar item.

Next, let’s take a closer look at each phase in detail, with the associated AWS architecture.

KNN reference index creation

For this use case, you use dress images and their visual descriptions from the Feidegger dataset. This dataset is a multi-modal corpus that focuses specifically on the domain of fashion items and their visual descriptions in German. The dataset was created as part of ongoing research at Zalando into text-image multi-modality in the area of fashion.

In this step, you translate each dress description from German to English using Amazon Translate. From each English description, you extract the feature vector, which is an n-dimensional vector of numerical features that represent the dress. You use a pre-trained BERT model hosted in Amazon SageMaker to extract 768 feature vectors of each visual description of the dress, and store them as a KNN index in an Amazon ES domain.

The following screenshot illustrates the workflow for creating the KNN index.

The process includes the following steps:

  1. Users interact with a Jupyter notebook on an Amazon SageMaker notebook instance. An Amazon SageMaker notebook instance is an ML compute instance running the Jupyter Notebook app. Amazon SageMaker manages creating the instance and related resources.
  2. Each item description, originally open-sourced in German, is translated to English using Amazon Translate.
  3. A pre-trained BERT model is downloaded, and the model artifact is serialized and stored in Amazon Simple Storage Service (Amazon S3). The model is used to serve from a PyTorch model server on an Amazon SageMaker real-time endpoint.
  4. Translated descriptions are pushed through the SageMaker endpoint to extract fixed-length features (embeddings).
  5. The notebook code writes the text embeddings to the KNN index along with product Amazon S3 URI in an Amazon ES domain.

KNN search from a query text

In this step, you present a search query text string from the application, which passes through the Amazon SageMaker hosted model to extract 768 features. You use these features to query the KNN index in Amazon ES. KNN for Amazon ES lets you search for points in a vector space and find the nearest neighbors for those points by cosine similarity (the default is Euclidean distance). When it finds the nearest neighbors vectors (for example, k = 3 nearest neighbors) for a given query text, it returns the associated Amazon S3 images to the application. The following diagram illustrates the KNN search full-stack application architecture.

The process includes the following steps:

  1. The end-user accesses the web application from their browser or mobile device.
  2. A user-provided search query string is sent to Amazon API Gateway and AWS Lambda.
  3. The Lambda function invokes the Amazon SageMaker real-time endpoint, and the model returns a vector of the search query embeddings. Amazon SageMaker hosting provides a managed HTTPS endpoint for predictions and automatically scales to the performance needed for your application using Application Auto Scaling.
  4. The function passes the search query embedding vector as the search value for a KNN search in the index in the Amazon ES domain. A list of k similar items and their respective Amazon S3 URIs are returned.
  5. The function generates pre-signed Amazon S3 URLs to return back to the client web application, used to display similar items in the browser.


For this walkthrough, you should have an AWS account with appropriate AWS Identity and Access Management (IAM) permissions to launch the AWS CloudFormation template.

Deploying your solution

You use a CloudFormation stack to deploy the solution. The stack creates all the necessary resources, including the following:

  • An Amazon SageMaker notebook instance to run Python code in a Jupyter notebook
  • An IAM role associated with the notebook instance
  • An Amazon ES domain to store and retrieve sentence embedding vectors into a KNN index
  • Two S3 buckets: one for storing the source fashion images and another for hosting a static website

From the Jupyter notebook, you also deploy the following:

  • An Amazon SageMaker endpoint for getting fixed-length sentence embedding vectors in real time.
  • An AWS Serverless Application Model (AWS SAM) template for a serverless backend using API Gateway and Lambda.
  • A static front-end website hosted on an S3 bucket to demonstrate a real-world, end-to-end ML application. The front-end code uses ReactJS and the AWS Amplify JavaScript library.

To get started, complete the following steps:

  1. Sign in to the AWS Management Console with your IAM user name and password.
  2. Choose Launch Stack and open it in a new tab:

  1. On the Quick create stack page, select the check-box to acknowledge the creation of IAM resources.
  2. Choose Create stack.

  1. Wait for the stack to complete.

You can examine various events from the stack creation process on the Events tab. When the stack creation is complete, you see the status CREATE_COMPLETE.

You can look on the Resources tab to see all the resources the CloudFormation template created.

  1. On the Outputs tab, choose the SageMakerNotebookURL

This hyperlink opens the Jupyter notebook on your Amazon SageMaker notebook instance that you use to complete the rest of the lab.

You should be on the Jupyter notebook landing page.

  1. Choose nlu-based-item-search.ipynb.

Building a KNN index on Amazon ES

For this step, you should be at the beginning of the notebook with the title NLU based Item Search. Follow the steps in the notebook and run each cell in order.

You use a pre-trained BERT model (distilbert-base-nli-stsb-mean-tokens) from sentence-transformers and host it on an Amazon SageMaker PyTorch model server endpoint to generate fixed-length sentence embeddings. The embeddings are saved to the Amazon ES domain created in the CloudFormation stack. For more information, see the markdown cells in the notebook.

Continue when you reach the cell Deploying a full-stack NLU search application in your notebook.

The notebook contains several important cells; we walk you through a few of them.

Download the multi-modal corpus dataset from Feidegger, which contains fashion images and descriptions in German. See the following code:

## Data Preparation import os import shutil
import json
import tqdm
import urllib.request
from tqdm import notebook
from multiprocessing import cpu_count
from tqdm.contrib.concurrent import process_map images_path = 'data/feidegger/fashion'
filename = 'metadata.json' my_bucket = s3_resource.Bucket(bucket) if not os.path.isdir(images_path): os.makedirs(images_path) def download_metadata(url): if not os.path.exists(filename): urllib.request.urlretrieve(url, filename) #download metadata.json to local notebook
download_metadata('') def generate_image_list(filename): metadata = open(filename,'r') data = json.load(metadata) url_lst = [] for i in range(len(data)): url_lst.append(data[i]['url']) return url_lst def download_image(url): urllib.request.urlretrieve(url, images_path + '/' + url.split("/")[-1]) #generate image list url_lst = generate_image_list(filename) workers = 2 * cpu_count() #downloading images to local disk
process_map(download_image, url_lst, max_workers=workers)

Upload the dataset to Amazon S3:

# Uploading dataset to S3 files_to_upload = []
dirName = 'data'
for path, subdirs, files in os.walk('./' + dirName): path = path.replace("\","/") directory_name = path.replace('./',"") for file in files: files_to_upload.append({ "filename": os.path.join(path, file), "key": directory_name+'/'+file }) def upload_to_s3(file): my_bucket.upload_file(file['filename'], file['key']) #uploading images to s3
process_map(upload_to_s3, files_to_upload, max_workers=workers)

This dataset has product descriptions in German, so you use Amazon Translate for the English translation for each German sentence:

with open(filename) as json_file: data = json.load(json_file) #Define translator function
def translate_txt(data): results = {} results['filename'] = f's3://{bucket}/data/feidegger/fashion/' + data['url'].split("/")[-1] results['descriptions'] = [] translate = boto3.client(service_name='translate', use_ssl=True) for i in data['descriptions']: result = translate.translate_text(Text=str(i), SourceLanguageCode="de", TargetLanguageCode="en") results['descriptions'].append(result['TranslatedText']) return results

Save the sentence transformers model to notebook instance:

!pip install sentence-transformers #Save the model to disk which we will host at sagemaker
from sentence_transformers import models, SentenceTransformer
saved_model_dir = 'transformer'
if not os.path.isdir(saved_model_dir): os.makedirs(saved_model_dir) model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')

Upload the model artifact (model.tar.gz) to Amazon S3 with the following code:

#zip the model .gz format
import tarfile
export_dir = 'transformer'
with'model.tar.gz', mode='w:gz') as archive: archive.add(export_dir, recursive=True) #Upload the model to S3 inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

Deploy the model into an Amazon SageMaker PyTorch model server using the Amazon SageMaker Python SDK. See the following code:

from sagemaker.pytorch import PyTorch, PyTorchModel
from sagemaker.predictor import RealTimePredictor
from sagemaker import get_execution_role class StringPredictor(RealTimePredictor): def __init__(self, endpoint_name, sagemaker_session): super(StringPredictor, self).__init__(endpoint_name, sagemaker_session, content_type='text/plain') pytorch_model = PyTorchModel(model_data = inputs, role=role, entry_point ='', source_dir = './code', framework_version = '1.3.1', predictor_cls=StringPredictor) predictor = pytorch_model.deploy(instance_type='ml.m5.large', initial_instance_count=3)

Define a cosine similarity Amazon ES KNN index mapping with the following code (to define cosine similarity KNN index mapping, you need Amazon ES 7.7 and above):

#KNN index maping
knn_index = { "settings": { "index.knn": True, "index.knn.space_type": "cosinesimil", "analysis": { "analyzer": { "default": { "type": "standard", "stopwords": "_english_" } } } }, "mappings": { "properties": { "zalando_nlu_vector": { "type": "knn_vector", "dimension": 768 } } }

Each product has five visual descriptions, so you combine all five descriptions and get one fixed-length sentence embedding. See the following code:

# For each product, we are concatenating all the # product descriptions into a single sentence,
# so that we will have one embedding for each product def concat_desc(results): obj = { 'filename': results['filename'], } obj['descriptions'] = ' '.join(results['descriptions']) return obj concat_results = map(concat_desc, results)
concat_results = list(concat_results)

Import the sentence embeddings and associated Amazon S3 image URI into the Amazon ES KNN index with the following code. You also load the translated descriptions in full text, so that later you can compare the difference between KNN search and standard match text queries in Elasticsearch.

# defining a function to import the feature vectors corresponds to each S3 URI into Elasticsearch KNN index
# This process will take around ~10 min. def es_import(concat_result): vector = json.loads(predictor.predict(concat_result['descriptions'])) es.index(index='idx_zalando', body={"zalando_nlu_vector": vector, "image": concat_result['filename'], "description": concat_result['descriptions']} ) workers = 8 * cpu_count() process_map(es_import, concat_results, max_workers=workers)

Building a full-stack KNN search application

Now that you have a working Amazon SageMaker endpoint for extracting text features and a KNN index on Amazon ES, you’re ready to build a real-world, full-stack ML-powered web app. You use an AWS SAM template to deploy a serverless REST API with API Gateway and Lambda. The REST API accepts new search strings, generates the embeddings, and returns similar relevant items to the client. Then you upload a front-end website that interacts with your new REST API to Amazon S3. The front-end code uses Amplify to integrate with your REST API.

  1. In the following cell, prepopulate a CloudFormation template that creates necessary resources such as Lambda and API Gateway for full-stack application:
s3_resource.Object(bucket, 'backend/template.yaml').upload_file('./backend/template.yaml', ExtraArgs={'ACL':'public-read'}) sam_template_url = f'https://{bucket}' # Generate the CloudFormation Quick Create Link print("Click the URL below to create the backend API for NLU search:n")
print(( '' f'?templateURL={sam_template_url}' '&stackName=nlu-search-api' f'&param_BucketName={outputs["s3BucketTraining"]}' f'&param_DomainName={outputs["esDomainName"]}' f'&param_ElasticSearchURL={outputs["esHostName"]}' f'&param_SagemakerEndpoint={predictor.endpoint}'

The following screenshot shows the output: a pre-generated CloudFormation template link.

  1. Choose the link.

You are sent to the Quick create stack page.

  1. Select the check-boxes to acknowledge the creation of IAM resources, IAM resources with custom names, and CAPABILITY_AUTO_EXPAND.
  2. Choose Create stack.

When the stack creation is complete, you see the status CREATE_COMPLETE. You can look on the Resources tab to see all the resources the CloudFormation template created.

  1. After the stack is created, proceed through the cells.

The following cell indicates that your full-stack application, including front-end and backend code, are successfully deployed:

print('Click the URL below:n')
print(outputs['S3BucketSecureURL'] + '/index.html')

The following screenshot shows the URL output.

  1. Choose the link.

You are sent to the application page, where you can provide your own search text to find products using both the KNN approach and regular full-text search approaches.

  1. When you’re done testing and experimenting with your KNN search application, run the last two cells at the bottom of the notebook:
# Delete the endpoint
predictor.delete_endpoint() # Empty S3 Contents
training_bucket_resource = s3_resource.Bucket(bucket)
training_bucket_resource.objects.all().delete() hosting_bucket_resource = s3_resource.Bucket(outputs['s3BucketHostingBucketName'])

These cells end your Amazon SageMaker endpoint and empty your S3 buckets to prepare you for cleaning up your resources.

Cleaning up

To delete the rest of your AWS resources, go to the AWS CloudFormation console and delete the nlu-search-api and nlu-search stacks.


In this post, we showed you how to create a KNN-based search application using Amazon SageMaker and Amazon ES KNN index features. You used a pre-trained BERT model from the sentence-transformers Python library. You can also fine-tune your BERT model using your own dataset. For more information, see Fine-tuning a PyTorch BERT model and deploying it with Amazon Elastic Inference on Amazon SageMaker.

A GPU instance is recommended for most deep learning purposes. In many cases, training new models is faster on GPU instances than CPU instances. You can scale sub-linearly when you have multi-GPU instances or if you use distributed training across many instances with GPUs. However, we used CPU instances for this use case so you can complete the walkthrough under the AWS Free Tier.

For more information about the code sample in the post, see the GitHub repo.

About the Authors

Amit Mukherjee is a Sr. Partner Solutions Architect with a focus on data analytics and AI/ML. He works with AWS partners and customers to provide them with architectural guidance for building highly secure and scalable data analytics platforms and adopting machine learning at a large scale.

Laith Al-Saadoon is a Principal Solutions Architect with a focus on data analytics at AWS. He spends his days obsessing over designing customer architectures to process enormous amounts of data at scale. In his free time, he follows the latest in machine learning and artificial intelligence.


Continue Reading


A Quick Guide to Conversational AI And It’s Working Process

Customer support is an integral part of every business; without offering support services, it is difficult to achieve maximum customer satisfaction. To ensure the same, […]

The post A Quick Guide to Conversational AI And It’s Working Process appeared first on Quytech Blog.



Customer support is an integral part of every business; without offering support services, it is difficult to achieve maximum customer satisfaction. To ensure the same, businesses hire professionals who work round the clock to deliver support services. No matter how efficiently a business handles this segment, they might have to face problems such as “delay in responding customers’ queries” or “making a customer wait to connect with the support professionals”, and more.

A Conversational AI is a perfect solution to this most common challenge that manufacturing, FMCG, retail, e-commerce, and other industries are facing. Never heard of this term?

Well, this is the latest trend in almost all the industries that are already using artificial intelligence technology or wanting to adopt the same in their business operations. Let’s read about the same in detail.

What is Conversational AI?

Conversational AI is a specific kind of artificial intelligence that makes software interact and converse with users in a highly intuitive way. It uses natural language processing (NLP) to understand and interact in human language.

The conversational AI, when integrated into chatbots, messengers, and voice assistants, enables businesses to deliver personalized customer experience and achieve 100% customer satisfaction. Google Home and Amazon Echo are the two popular examples of it.

Applications of Conversational AI

Conversational AI a new and automated way of offering customer support services. Healthcare, Adtech, logistics, insurance, travel, hospitality, finances, and other industries are using technology in the following:

Messaging applications

Conversational AI can be used in a messaging application to offer personalized support services through chat. Your customers can choose the “chat support” option and talk to the chatbot to get the support.

Speech-based virtual assistants

A conversational AI can use speech recognition technology so that you can offer a speech-based virtual assistant for your customers. These are the type of chatbots where users can get any information through voice commands.

Virtual customer assistants

This type of conversational AI helps in offering online support to customers; you can develop the same to offer support through Web, SMS, messaging applications, and more.

Virtual personal assistants

A virtual personal assistant, powered by conversational AI, minimizes the need of hiring a huge team to offer dedicated support services to each of your customers.

Virtual employee assistants

Employees working in big organizations might need various types of assistants. A conversational AI used to build a virtual employee assistant can be the point of contact for all such employees. They can find the required information just by interacting with that assistant.

Robotic process automation

Robotic process automation using the potential of conversational AI helps a machine to understand human conversations and their intent to perform automated tasks.

Working of conversational AI

Machine learning, deep learning, and natural language processing are the three main technologies behind conversational AI. Here is how it works:

  1. Collection of unstructured data from various sources
  2. Data preprocessing and feature engineering
  3. Creating an AI model
  4. Training the model to automatically improve from experiences
  5. Testing the model
  6. Detecting patterns and making decisions
  7. AI deployment

What benefits businesses can get by using conversational AI?

Apart from helping businesses to deliver an unmatched customer experience, conversational AI can offer a plethora of other benefits that include:

Saves Time

Having an automated chatbot would help you save a considerable amount of time. The saved time can be used to perform other tasks or focus on your business’ marketing and promotion.

Helps in providing Real-Time Support Services

A conversational AI can handle multiple queries at one time, without even letting other customers wait. In short, every customer will feel like they are getting dedicated support services in real-time.

Improves business efficiency

With a conversational AI, you can have the assurance that your customers are being taken care of properly. In short, their queries are being handled and resolved immediately. You can focus on other segments of your business and improve efficiency.

Helps in lowering down Customers’ Complaints

Since a conversational AI can immediately respond to queries of the customers, it can help to reduce the number of complaints. Resolving customers’ complaints without making them call a support professional would increase customer loyalty and increase your brand reputation.

Increases chances of sales

By providing a persistent communication channel that precedes the context further, you can make your customers explore more and shop more. Moreover, a conversational AI can also help in reducing the cart abandonment rate as it can provide immediate assistance regarding the issues a user is facing while making the payment, applying a discount code, or at any other time.

The user would not even have to contact the support center. To understand this better, let’s take an example- a user has added one or more products to the cart, but he/she is unable to find his/her preferred mode of payment. After a few minutes, what would the customer do, either visit the contact us section to get the customer support number and connect to a professional (which is a time taking process) or simply get the support through a conversational AI. The latter would automatically send a message to the customer to ask the query and provide a resolution.

Now when you know everything about conversational AI and want to build one for you, then contact Quytech, a trusted AI app development company. Quytech has more than a decade of experience in working on artificial intelligence, machine learning, and other latest technologies.

Final Words

Are you curious to know about conversational artificial intelligence? Give this article a read to know the definition and working of conversational AI in detail. We have also mentioned the reasons why this technology is becoming the talk of the town among businesses of all sizes and types. After reading the article, if you want to develop a tailor-made conversational AI for your business, then reach out to a reliable and experienced AI development company or hire AI developers with considerable experience.


Continue Reading
AI32 mins ago

Bringing real-time machine learning-powered insights to rugby using Amazon SageMaker

AI38 mins ago

Building an NLU-powered search application with Amazon SageMaker and the Amazon ES KNN feature

AI4 hours ago

A Quick Guide to Conversational AI And It’s Working Process

AI5 hours ago

Are Legal chatbots worth the time and effort?

AI8 hours ago

Inbenta Announces Partnership with IntelePeer to Deliver Smarter Workflows to Customers

AI2 days ago

Things to Know about Free Form Templates

AI2 days ago

Are Chatbots Vulnerable? Best Practices to Ensure Chatbots Security

AI2 days ago

Best Technology Stacks For Mobile App Development

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI3 days ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition