Connect with us

AI

DeepMind papers at ICML 2018

The 2018 International Conference on Machine Learning will take place in Stockholm, Sweden from 10-15 July.For those attending and planning the week ahead, we are sharing a schedule of DeepMind presentations at ICML (you can download a pdf version here). We look forward to the many engaging discussions, ideas, and collaborations that are sure to arise from the conference!Efficient Neural Audio SynthesisAuthors: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Nouri, Norman Casagrande, Edward Lockhart, Sander Dieleman, Aaron van den Oord, Koray KavukcuogluSequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating desired samples. Efficient sampling for this class of models at the cost of little to no loss in quality has however remained an elusive task. With a focus on text-to-speech synthesis, we show that compact recurrent architectures, a remarkably high degree of weight sparsification and a novel reordering of the variables greatly reduce sampling latency while maintaining high audio fidelity. We first describe a compact single-layer recurrent neural network, the WaveRNN, with a novel dual softmax layer that matches the quality of the state-of-the-art WaveNet model.

Published

on

Efficient Neural Audio Synthesis

Authors: Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Nouri, Norman Casagrande, Edward Lockhart, Sander Dieleman, Aaron van den Oord, Koray Kavukcuoglu

Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating desired samples. Efficient sampling for this class of models at the cost of little to no loss in quality has however remained an elusive task. With a focus on text-to-speech synthesis, we show that compact recurrent architectures, a remarkably high degree of weight sparsification and a novel reordering of the variables greatly reduce sampling latency while maintaining high audio fidelity. We first describe a compact single-layer recurrent neural network, the WaveRNN, with a novel dual softmax layer that matches the quality of the state-of-the-art WaveNet model. Persistent GPU kernels for the WaveRNN are able to synthesize 24kHz 16-bit audio 4 times faster than real time. We then apply a weight sparsification technique to the model. We show that, given a constant number of weights, large sparse networks perform better than small dense networks. Using a large Sparse WaveRNN, we demonstrate the feasibility of real-time synthesis of high-fidelity audio on a low-power mobile phone CPU. We use a large Sparse WaveRNN to demonstrate the first instance of real-time synthesis of high-fidelity audio on low-resource mobile phone CPU. Finally, we introduce a novel reordering of the variables in the factorization of the joint distribution. The reordering makes it possible to trade vacuous dependencies on samples from the distant future for the ability to generate in batches. The Batch WaveRNN produces up to 16 samples per step maintaining high quality and enables audio synthesis that is up to 40 times faster than real time.

Presentations:

  • 11:00 – 11:20 AM @ Victoria (Oral)
  • 06:15 – 09:00 PM @ Hall B #105 (Poster)
Learning to Search with MCTSnets

Authors: Arthur Guez*, Theophane Weber*, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Remi Munos, David Silver

Planning problems are among the most important and well-studied problems in artificial intelligence. They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree. Among these algorithms, Monte-Carlo tree search (MCTS) is one of the most general, powerful and widely used. A typical implementation of MCTS uses cleverly designed rules, optimised to the particular characteristics of the domain. These rules control where the simulation traverses, what to evaluate in the states that are reached, and how to back-up those evaluations. In this paper we instead learn where, what and how to search. Our architecture, which we call an MCTSnet, incorporates simulation-based search inside a neural network, by expanding, evaluating and backing-up a vector embedding. The parameters of the network are trained end-to-end using gradient-based optimisation. When applied to small searches in the well-known planning problem Sokoban, the learned search algorithm significantly outperformed MCTS baselines.

Presentations:

  • 11:20 – 11:30 AM @ Victoria (Oral)
  • 06:15 – 09:00 PM @ Hall B #92 (Poster)
LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration

Authors: Gellert Weisz, Andras Gyorgy, and Csaba Szepesvari

We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution. The goal of the configurator is to find a configuration that runs fast on average on most instances, and do so with the least amount of total work. It can run a chosen solver on a random instance until the solver finishes or a timeout is reached. We propose LEAPSANDBOUNDS, an algorithm that tests configurations on randomly selected problem instances for longer and longer time. We prove that the capped expected runtime of the configuration returned by LEAPSANDBOUNDS is close to the optimal expected runtime, while our algorithm’s running time is near-optimal. Our results show that LEAPSANDBOUNDS is more efficient than the recent algorithm of Kleinberg et al. (2017), which, to our knowledge, is the only other algorithm configuration method that claims to have non-trivial theoretical guarantees. Experimental results on configuring a public SAT solver on a public benchmark also stand witness to the superiority of our method.

Presentations:

  • 11:30 – 11:40 AM @ A6 (Oral)
  • 06:15 – 09:00 PM @ Hall B #165 (Poster)
Implicit Quantile Networks for Distributional Reinforcement Learning

Authors: Will Dabney*, Georg Ostrovski*, David Silver, Remi Munos

In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithms implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.

Presentations:

  • 11:40 – 11:50 AM @ A1 (Oral)
  • 06:15 – 09:00 PM @ Hall B #3 (Poster)
Graph Networks as Learnable Physics Engines for Inference and Control

Authors: Alvaro Sanchez, Nicolas Heess, Jost Tobias Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, Peter Battaglia

Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models—based on graph networks—which implement an inductive bias for object- and relation-centric representations of complex, dynamical systems. Our results show that as a forward model, our approach supports accurate predictions, and surprisingly strong and efficient generalization, across eight distinct physical systems which we varied parametrically and structurally. We also found that our inference model can perform system identification from real and simulated data. Our models are also differentiable, and support online planning via gradient based trajectory optimization, as well as offline policy optimization. Our framework offers new opportunities for harnessing and exploiting rich knowledge about the world, and takes a key step toward building machines with more human-like representations of the world.

Presentations:

  • 11:50 AM – 12:00 PM @ Victoria (Oral)
  • 06:15 – 09:00 PM @ Hall B #84 (Poster)
More Robust Doubly Robust Off-policy Evaluation

Authors: Mehrdad Farajtabar, Yinlam Chow, and Mohammad Ghavamzadeh

We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of a policy from the data generated by another policy(ies). In particular, we focus on the doubly robust (DR) estimators that consist of an importance sampling (IS) component and a performance model, and utilize the low (or zero) bias of IS and low variance of the model at the same time. Although the accuracy of the model has a huge impact on the overall performance of DR, most of the work on using the DR estimators in OPE has been focused on improving the IS part, and not much on how to learn the model. In this paper, we propose alternative DR estimators, called more robust doubly robust (MRDR), that learn the model parameter by minimizing the variance of the DR estimator. We first present a formulation for learning the DR model in RL. We then derive formulas for the variance of the DR estimator in both contextual bandits and RL, such that their gradients w.r.t. the model parameters can be estimated from the samples, and propose methods to efficiently minimize the variance. We prove that the MRDR estimators are strongly consistent and asymptotically optimal. Finally, we evaluate MRDR in bandits and RL benchmark problems, and compare its performance with the existing methods.

Presentations:

  • 11:50 AM – 12:00 PM @ A1 (Oral)
  • 06:15 – 09:00 PM @ Hall B #62 (Poster)
Conditional Neural Processes

Authors:  Marta Garnelo, Dan Rosenbaum, Christopher Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo Rezende, S. M. Ali Eslami

Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function. On the other hand, Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a new function at test time. Yet GPs are computationally expensive, and it can be hard to design appropriate priors. In this paper we propose a family of neural models, Conditional Neural Processes (CNPs), that combine the benefits of both. CNPs are inspired by the flexibility of stochastic processes such as GPs, but are structured as neural networks and trained via gradient descent. CNPs make accurate predictions after observing only a handful of training data points, yet scale to complex functions and large datasets. We demonstrate the performance and versatility of the approach on a range of canonical machine learning tasks, including regression, classification

and image completion.

Presentations:

  • 02:10 – 02:20 PM @ Victoria (Oral)
  • 06:15 – 09:00 PM @ Hall B #130 (Poster)
Generative Temporal Models with Spatial Memory for Partially Observed Environments

Authors: Marco  Fraccaro, Danilo  Jimenez Rezende, Yori  Zwols, Alexander Pritzel, S. M. Ali Eslami, Fabio Viola

In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent’s representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge on the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a Variational Auto-Encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.

Presentations:

  • 02:30 – 02:50 PM @ A7 (Oral)
  • 06:15 – 09:00 PM @ Hall B #101 (Poster)
Disentangling by Factorising

Authors: Hyunjik Kim, Andriy Mnih

We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of rep-resentations to be factorial and hence independent across the dimensions. We show that it improves upon β-VAE by providing a better trade-off between disentanglement and reconstruction quality and being more robust to the number of training iterations. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them.

Presentations:

  • 02:50 – 03:00 PM @ A7 (Oral)
  • 06:15 – 09:00 PM @ Hall B #90 (Poster)
Learning by Playing – Solving Sparse Reward Tasks from Scratch

Authors: Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tobias Springenberg

We propose Scheduled Auxiliary Control (SAC), a new learning paradigm in the context of Reinforcement Learning (RL) . SAC enables learning of complex behaviors – from scratch – in the presence of multiple sparse reward signals. To achieve this the agent is equipped with a set of general auxiliary tasks, that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment – enabling it
to excel at sparse reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach.

Read more on the DeepMind blog.

Presentations:

  • 04:20 – 04:40 PM @ A1 (Oral)
  • 06:15 – 09:00 PM @ Hall B #41 (Poster)
Fast Parametric Learning with Activation Memorization

Authors: Jack W Rae, Chris Dyer, Peter Dayan, Timothy P Lillicrap

Neural networks trained with backpropagation often struggle to identify classes that have been observed a small number of times. In applications where most class labels are rare, such as language modelling, this can become a performance bottleneck. One potential remedy is to augment the network with a fast-learning non-parametric model which attends over recent activations. We explore a simplified architecture where we treat a subset of the model parameters as fast memory stores. This can help retain information over longer time intervals than a traditional memory, and does not require additional space or compute. In the case of image classification, we display faster binding of novel classes on an Omniglot image curriculum task. We also show improved performance for word-based language models on news reports (GigaWord), books (Project Gutenberg) and Wikipedia articles (WikiText-103) — the latter achieving state-of-the-art perplexity.

Presentations:

  • 04:40 – 04:50 PM @ Victoria (Oral)
  • 06:15 – 09:00 PM @ Hall B #121 (Poster)
Learning Implicit Generative Models with the Method of Learned Moments

Authors: Suman Ravuri, Shakir Mohamed, Mihaela Rosca, and Oriol Vinyals

We propose a method of moments (MoM) algorithm for training large-scale implicit generative models. Moment estimation in this setting encounters two problem: it is often difficult to define the millions of moments needed to learn the model parameters, and it is hard to determine which properties are useful when specifying moments. To address the first issue, we introduce a moment network, and define the moments as the gradient of the network’s output with respect to its parameters and the network’s hidden units. To tackle the second problem, we use asymptotic theory to highlight desiderata for moments – namely they should minimize the asymptotic variance of estimated model parameters – and introduce an objective to learn better moments. The sequence of objectives created by this Method of Learned Moments (MoLM) can train high-quality neural image samplers. On CIFAR-10, we demonstrate that MoLM-trained generators achieve significantly higher Inception Scores and lower Frechet Inception Distances than those trained with gradient penalty regularized adversarial objectives. These generators also achieve nearly perfect Multi-Scale Structural Similarity Scores on CelebA, and can create high-quality samples of resolutions up to 128×128.

Presentations:

  • 04:40 – 04:50 PM @ A7 (Oral)
  • 06:15 – 09:00 PM @ Hall B #112 (Poster)
Automatic Goal Generation for Reinforcement Learning Agents

Authors: David Held, Xinyang Geng, Carlos Florensa, Pieter Abbeel

Reinforcement learning is a powerful technique to train an agent to perform a task. However, an agent that is trained using reinforcement learning is only capable of achieving the single task that is specified via its reward function. Such an approach does not scale well to settings in which an agent needs to perform a diverse set of tasks, such as navigating to varying positions in a room or moving objects to varying locations. Instead, we propose a method that allows an agent to automatically discover the range of tasks that it is capable of performing. We use a generator network to propose tasks for the agent to try to achieve, specified as goal states. The generator network is optimized using adversarial training to produce tasks that are always at the appropriate level of difficulty for the agent. Our method thus automatically produces a curriculum of tasks for the agent to learn. We show that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment. Our method can also learn to achieve tasks with sparse rewards, which traditionally pose significant challenges.

Presentations:

  • 04:40 – 04:50 PM @ A1 (Oral)
  • 06:15 – 09:00 PM @ Hall B #135 (Poster)
Machine Theory of Mind

Authors:  Neil C. Rabinowitz, Frank Perbet, H. Francis Song, Chiyuan Zhang, S. M. Ali Eslami, Matthew Botvinick

Theory of mind (ToM) broadly refers to humans’ ability to represent the mental states of others, including their desires, beliefs, and intentions. We propose to train a machine to build such models too. We design a Theory of Mind neural network – a ToMnet – which uses meta-learning to build models of the agents it encounters. The ToMnet learns a strong prior model for agents’ future behaviour, and, using only a small number of behavioural observations, can bootstrap to richer predictions about agents’ characteristics and mental states. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep RL agents from varied populations, and that it passes classic ToM tasks such as the “Sally-Anne” test of recognising that others can hold false beliefs about the world.

Presentations:

  • 05:00 – 05:20 PM @ A3 (Oral)
  • 06:15 – 09:00 PM @ Hall B #208 (Poster)
Path Consistency Learning in Tsallis Entropy Regularized MDPs

Authors: Ofir Nachum, Yinlam Chow, and Mohammad Ghavamzadeh

We study the sparse entropy-regularized reinforcement learning (ERL) problem in which the entropy term is a special form of the Tsallis entropy. The optimal policy of this formulation is sparse, i.e., at each state, it has non-zero probability for only a small number of actions. This addresses the main drawback of the standard Shannon entropy-regularized RL (soft ERL) formulation, in which the optimal policy is {\em softmax}, and thus, may assign a non-negligible probability mass to non-optimal actions. This problem is aggravated as the number of actions is increased. In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data. We first derive a sparse consistency equation that specifies a relationship between the optimal value function and policy of the sparse ERL along any system trajectory. Crucially, a weak form of the converse is also true, and we quantify the sub-optimality of a policy which satisfies sparse consistency, and show that as we increase the number of actions, this sub-optimality is better than that of the soft ERL optimal policy. We then use this result to derive the sparse PCL algorithms. We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions.

Presentations:

  • 05:20 – 05:40 PM @ A1 (Oral)
  • 06:15 – 09:00 PM @ Hall B #172 (Poster)
Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

Authors: Andre Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Makowitz, Augustin Zidek, Remi Munos

The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we investigate the feasibility of combining SF&GPI with the representation power of deep learning. Since in deep RL we are interested in learning all the components of SF&GPI concurrently, the existing inter-dependencies between them can lead to instabilities. In this work we propose a solution for this problem that makes it possible to use SF & GPI online, at scale. In order to empirically verify this claim, we apply the proposed method to a complex 3D environment that requires hundreds of millions of transitions to be solved. We show that the transfer promoted by SF&GPI leads to reasonable policies on unseen tasks almost instantaneously. We also show how to build on the transferred policies to learn policies that are specialised to the new tasks, which can then be added to the agent’s set of skills to be used in the future.

Presentations:

  • 05:20 – 05:40 PM @ A3 (Oral)
  • 06:15 – 09:00 PM @ Hall B #163 (Poster)
Been There, Done That: Meta-Learning with Episodic Recall

Authors: Samuel Ritter, Jane Wang, Sid Jayakumar, Zeb Kurth-Nelson, Charles Blundell, Razvan Pascanu, Matt Botvinick

Meta-learning agents have demonstrated the ability to rapidly explore and exploit new tasks sampled from the task distribution on which they were trained. However, when these agents encounter situations that they explored in the distant past, they are not able to remember the results of their past exploration. Thus, instead of immediately exploiting previously discovered solutions, they must again explore from scratch. In this work, we argue that the necessity to remember the results of past exploration is ubiquitous in naturalistic environments. We propose a formalism for modeling this kind of recurring environment structure, then develop a meta-learning architecture for solving such environments. This architecture melds the standard LSTM working memory with a differentiable neural episodic memory. We explore the capabilities of this episodic LSTM in four recurrent-state stochastic process environments: 1.) episodic contextual bandits, 2.) compositional contextual bandits, 3.) episodic two-step task, and 4.) contextual water-maze navigation.

Presentations:

  • 05:40 – 05:50 PM @ A3 (Oral)
  • 06:15 – 09:00 PM @ Hall B #209 (Poster)
Adversarial Risk and the Dangers of Evaluating Against Weak Attacks

Authors: Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli.

This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. The existence of adversarial examples in trained neural networks reflects the fact that expected risk alone does not capture the model’s performance against worst-case inputs. We motivate the use of advKarol Hausmaersarial risk as an objective, although it can not easily be computed exactly. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may be obscured to adversaries, by optimizing this surrogate rather than the true adversarial risk. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that
our formulations and results will help researchers to develop more powerful defenses.

Presentations:

  • 05:50 – 06:00 PM @ A7 (Oral)
  • 06:15 – 09:00 PM @ Hall B #132 (Poster)

Source: https://deepmind.com/blog/announcements/deepmind-papers-icml-2018

AI

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage. Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, […]

Published

on

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage.

Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, which enables you to search and explore Hungarian cultural heritage, including 600,000 faces over 500,000 images. For example, you can find historical works by author Mór Jókai or photos on topics like weddings. The Arcanum team chose Amazon Rekognition to free valuable staff from time and cost-intensive manual labeling, and improved label accuracy to make 200,000 previously unsearchable images (approximately 40% of image inventory), available to users.

Amazon Rekognition makes it easy to add image and video analysis to your applications using highly scalable machine learning (ML) technology that requires no previous ML expertise to use. Amazon Rekognition also provides highly accurate facial recognition and facial search capabilities to detect, analyze, and compare faces.

Arcanum uses this facial recognition feature in their image database services to help you find particular people in Arcanum’s articles. This post discusses their challenges and why they chose Amazon Rekognition as their solution.

Automated image labeling challenges

Arcanum dedicated a team of three people to start tagging and labeling content for Hungaricana. The team quickly learned that they would need to invest more than 3 months of time-consuming and repetitive human labor to provide accurate search capabilities to their customers. Considering the size of the team and scope of the existing project, Arcanum needed a better solution that would automate image and object labelling at scale.

Automated image labeling solutions

To speed up and automate image labeling, Arcanum turned to Amazon Rekognition to enable users to search photos by keywords (for example, type of historic event, place name, or a person relevant to Hungarian history).

For the Hungaricana project, preprocessing all the images was challenging. Arcanum ran a TensorFlow face search across all 28 million pages on a machine with 8 GPUs in their own offices to extract only faces from images.

The following screenshot shows what an extract looks like (image provided by Arcanum Database Ltd).

The images containing only faces are sent to Amazon Rekognition, invoking the IndexFaces operation to add a face to the collection. For each face that is detected in the specified face collection, Amazon Rekognition extracts facial features into a feature vector and stores it in an Amazon Aurora database. Amazon Rekognition uses feature vectors when it performs face match and search operations using the SearchFaces and SearchFacesByImage operations.

The image preprocessing helped create a very efficient and cost-effective way to index faces. The following diagram summarizes the preprocessing workflow.

As for the web application, the workflow starts with a Hungaricana user making a face search request. The following diagram illustrates the application workflow.

The workflow includes the following steps:

  1. The user requests a facial match by uploading the image. The web request is automatically distributed by the Elastic Load Balancer to the webserver fleet.
  2. Amazon Elastic Compute Cloud (Amazon EC2) powers application servers that handle the user request.
  3. The uploaded image is stored in Amazon Simple Storage Service (Amazon S3).
  4. Amazon Rekognition indexes the face and runs SearchFaces to look for a face similar to the new face ID.
  5. The output of the search face by image operation is stored in Amazon ElastiCache, a fully managed in-memory data store.
  6. The metadata of the indexed faces are stored in an Aurora relational database built for the cloud.
  7. The resulting face thumbnails are served to the customer via the fast content-delivery network (CDN) service Amazon CloudFront.

Experimenting and live testing Hungaricana

During our test of Hungaricana, the application performed extremely well. The searches not only correctly identified people, but also provided links to all publications and sources in Arcanum’s privately owned database where found faces are present. For example, the following screenshot shows the result of the famous composer and pianist Franz Liszt.

The application provided 42 pages of 6×4 results. The results are capped to 1,000. The 100% scores are the confidence scores returned by Amazon Rekognition and are rounded up to whole numbers.

The application of Hungaricana has always promptly, and with a high degree of certainty, presented results and links to all corresponding publications.

Business results

By introducing Amazon Rekognition into their workflow, Arcanum enabled a better customer experience, including building family trees, searching for historical figures, and researching historical places and events.

The concept of face searching using artificial intelligence certainly isn’t new. But Hungaricana uses it in a very creative, unique way.

Amazon Rekognition allowed Arcanum to realize three distinct advantages:

  • Time savings – The time to market speed increased dramatically. Now, instead of spending several months of intense manual labor to label all the images, the company can do this job in a few days. Before, basic labeling on 150,000 images took months for three people to complete.
  • Cost savings – Arcanum saved around $15,000 on the Hungaricana project. Before using Amazon Rekognition, there was no automation, so a human workforce had to scan all the images. Now, employees can shift their focus to other high-value tasks.
  • Improved accuracy – Users now have a much better experience regarding hit rates. Since Arcanum started using Amazon Rekognition, the number of hits has doubled. Before, out of 500,000 images, about 200,000 weren’t searchable. But with Amazon Rekognition, search is now possible for all 500,000 images.

 “Amazon Rekognition made Hungarian culture, history, and heritage more accessible to the world,” says Előd Biszak, Arcanum CEO. “It has made research a lot easier for customers building family trees, searching for historical figures, and researching historical places and events. We cannot wait to see what the future of artificial intelligence has to offer to enrich our content further.”

Conclusion

In this post, you learned how to add highly scalable face and image analysis to an enterprise-level image gallery to improve label accuracy, reduce costs, and save time.

You can test Amazon Rekognition features such as facial analysis, face comparison, or celebrity recognition on images specific to your use case on the Amazon Rekognition console.

For video presentations and tutorials, see Getting Started with Amazon Rekognition. For more information about Amazon Rekognition, see Amazon Rekognition Documentation.


About the Authors

Siniša Mikašinović is a Senior Solutions Architect at AWS Luxembourg, covering Central and Eastern Europe—a region full of opportunities, talented and innovative developers, ISVs, and startups. He helps customers adopt AWS services as well as acquire new skills, learn best practices, and succeed globally with the power of AWS. His areas of expertise are Game Tech and Microsoft on AWS. Siniša is a PowerShell enthusiast, a gamer, and a father of a small and very loud boy. He flies under the flags of Croatia and Serbia.

Cameron Peron is Senior Marketing Manager for AWS Amazon Rekognition and the AWS AI/ML community. He evangelizes how AI/ML innovation solves complex challenges facing community, enterprise, and startups alike. Out of the office, he enjoys staying active with kettlebell-sport, spending time with his family and friends, and is an avid fan of Euro-league basketball.

Source: https://aws.amazon.com/blogs/machine-learning/arcanum-makes-hungarian-heritage-accessible-with-amazon-rekognition/

Continue Reading

AI

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage. Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, […]

Published

on

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage.

Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, which enables you to search and explore Hungarian cultural heritage, including 600,000 faces over 500,000 images. For example, you can find historical works by author Mór Jókai or photos on topics like weddings. The Arcanum team chose Amazon Rekognition to free valuable staff from time and cost-intensive manual labeling, and improved label accuracy to make 200,000 previously unsearchable images (approximately 40% of image inventory), available to users.

Amazon Rekognition makes it easy to add image and video analysis to your applications using highly scalable machine learning (ML) technology that requires no previous ML expertise to use. Amazon Rekognition also provides highly accurate facial recognition and facial search capabilities to detect, analyze, and compare faces.

Arcanum uses this facial recognition feature in their image database services to help you find particular people in Arcanum’s articles. This post discusses their challenges and why they chose Amazon Rekognition as their solution.

Automated image labeling challenges

Arcanum dedicated a team of three people to start tagging and labeling content for Hungaricana. The team quickly learned that they would need to invest more than 3 months of time-consuming and repetitive human labor to provide accurate search capabilities to their customers. Considering the size of the team and scope of the existing project, Arcanum needed a better solution that would automate image and object labelling at scale.

Automated image labeling solutions

To speed up and automate image labeling, Arcanum turned to Amazon Rekognition to enable users to search photos by keywords (for example, type of historic event, place name, or a person relevant to Hungarian history).

For the Hungaricana project, preprocessing all the images was challenging. Arcanum ran a TensorFlow face search across all 28 million pages on a machine with 8 GPUs in their own offices to extract only faces from images.

The following screenshot shows what an extract looks like (image provided by Arcanum Database Ltd).

The images containing only faces are sent to Amazon Rekognition, invoking the IndexFaces operation to add a face to the collection. For each face that is detected in the specified face collection, Amazon Rekognition extracts facial features into a feature vector and stores it in an Amazon Aurora database. Amazon Rekognition uses feature vectors when it performs face match and search operations using the SearchFaces and SearchFacesByImage operations.

The image preprocessing helped create a very efficient and cost-effective way to index faces. The following diagram summarizes the preprocessing workflow.

As for the web application, the workflow starts with a Hungaricana user making a face search request. The following diagram illustrates the application workflow.

The workflow includes the following steps:

  1. The user requests a facial match by uploading the image. The web request is automatically distributed by the Elastic Load Balancer to the webserver fleet.
  2. Amazon Elastic Compute Cloud (Amazon EC2) powers application servers that handle the user request.
  3. The uploaded image is stored in Amazon Simple Storage Service (Amazon S3).
  4. Amazon Rekognition indexes the face and runs SearchFaces to look for a face similar to the new face ID.
  5. The output of the search face by image operation is stored in Amazon ElastiCache, a fully managed in-memory data store.
  6. The metadata of the indexed faces are stored in an Aurora relational database built for the cloud.
  7. The resulting face thumbnails are served to the customer via the fast content-delivery network (CDN) service Amazon CloudFront.

Experimenting and live testing Hungaricana

During our test of Hungaricana, the application performed extremely well. The searches not only correctly identified people, but also provided links to all publications and sources in Arcanum’s privately owned database where found faces are present. For example, the following screenshot shows the result of the famous composer and pianist Franz Liszt.

The application provided 42 pages of 6×4 results. The results are capped to 1,000. The 100% scores are the confidence scores returned by Amazon Rekognition and are rounded up to whole numbers.

The application of Hungaricana has always promptly, and with a high degree of certainty, presented results and links to all corresponding publications.

Business results

By introducing Amazon Rekognition into their workflow, Arcanum enabled a better customer experience, including building family trees, searching for historical figures, and researching historical places and events.

The concept of face searching using artificial intelligence certainly isn’t new. But Hungaricana uses it in a very creative, unique way.

Amazon Rekognition allowed Arcanum to realize three distinct advantages:

  • Time savings – The time to market speed increased dramatically. Now, instead of spending several months of intense manual labor to label all the images, the company can do this job in a few days. Before, basic labeling on 150,000 images took months for three people to complete.
  • Cost savings – Arcanum saved around $15,000 on the Hungaricana project. Before using Amazon Rekognition, there was no automation, so a human workforce had to scan all the images. Now, employees can shift their focus to other high-value tasks.
  • Improved accuracy – Users now have a much better experience regarding hit rates. Since Arcanum started using Amazon Rekognition, the number of hits has doubled. Before, out of 500,000 images, about 200,000 weren’t searchable. But with Amazon Rekognition, search is now possible for all 500,000 images.

 “Amazon Rekognition made Hungarian culture, history, and heritage more accessible to the world,” says Előd Biszak, Arcanum CEO. “It has made research a lot easier for customers building family trees, searching for historical figures, and researching historical places and events. We cannot wait to see what the future of artificial intelligence has to offer to enrich our content further.”

Conclusion

In this post, you learned how to add highly scalable face and image analysis to an enterprise-level image gallery to improve label accuracy, reduce costs, and save time.

You can test Amazon Rekognition features such as facial analysis, face comparison, or celebrity recognition on images specific to your use case on the Amazon Rekognition console.

For video presentations and tutorials, see Getting Started with Amazon Rekognition. For more information about Amazon Rekognition, see Amazon Rekognition Documentation.


About the Authors

Siniša Mikašinović is a Senior Solutions Architect at AWS Luxembourg, covering Central and Eastern Europe—a region full of opportunities, talented and innovative developers, ISVs, and startups. He helps customers adopt AWS services as well as acquire new skills, learn best practices, and succeed globally with the power of AWS. His areas of expertise are Game Tech and Microsoft on AWS. Siniša is a PowerShell enthusiast, a gamer, and a father of a small and very loud boy. He flies under the flags of Croatia and Serbia.

Cameron Peron is Senior Marketing Manager for AWS Amazon Rekognition and the AWS AI/ML community. He evangelizes how AI/ML innovation solves complex challenges facing community, enterprise, and startups alike. Out of the office, he enjoys staying active with kettlebell-sport, spending time with his family and friends, and is an avid fan of Euro-league basketball.

Source: https://aws.amazon.com/blogs/machine-learning/arcanum-makes-hungarian-heritage-accessible-with-amazon-rekognition/

Continue Reading

AI

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage. Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, […]

Published

on

Arcanum specializes in digitizing Hungarian language content, including newspapers, books, maps, and art. With over 30 years of experience, Arcanum serves more than 30,000 global subscribers with access to Hungarian culture, history, and heritage.

Amazon Rekognition Solutions Architects worked with Arcanum to add highly scalable image analysis to Hungaricana, a free service provided by Arcanum, which enables you to search and explore Hungarian cultural heritage, including 600,000 faces over 500,000 images. For example, you can find historical works by author Mór Jókai or photos on topics like weddings. The Arcanum team chose Amazon Rekognition to free valuable staff from time and cost-intensive manual labeling, and improved label accuracy to make 200,000 previously unsearchable images (approximately 40% of image inventory), available to users.

Amazon Rekognition makes it easy to add image and video analysis to your applications using highly scalable machine learning (ML) technology that requires no previous ML expertise to use. Amazon Rekognition also provides highly accurate facial recognition and facial search capabilities to detect, analyze, and compare faces.

Arcanum uses this facial recognition feature in their image database services to help you find particular people in Arcanum’s articles. This post discusses their challenges and why they chose Amazon Rekognition as their solution.

Automated image labeling challenges

Arcanum dedicated a team of three people to start tagging and labeling content for Hungaricana. The team quickly learned that they would need to invest more than 3 months of time-consuming and repetitive human labor to provide accurate search capabilities to their customers. Considering the size of the team and scope of the existing project, Arcanum needed a better solution that would automate image and object labelling at scale.

Automated image labeling solutions

To speed up and automate image labeling, Arcanum turned to Amazon Rekognition to enable users to search photos by keywords (for example, type of historic event, place name, or a person relevant to Hungarian history).

For the Hungaricana project, preprocessing all the images was challenging. Arcanum ran a TensorFlow face search across all 28 million pages on a machine with 8 GPUs in their own offices to extract only faces from images.

The following screenshot shows what an extract looks like (image provided by Arcanum Database Ltd).

The images containing only faces are sent to Amazon Rekognition, invoking the IndexFaces operation to add a face to the collection. For each face that is detected in the specified face collection, Amazon Rekognition extracts facial features into a feature vector and stores it in an Amazon Aurora database. Amazon Rekognition uses feature vectors when it performs face match and search operations using the SearchFaces and SearchFacesByImage operations.

The image preprocessing helped create a very efficient and cost-effective way to index faces. The following diagram summarizes the preprocessing workflow.

As for the web application, the workflow starts with a Hungaricana user making a face search request. The following diagram illustrates the application workflow.

The workflow includes the following steps:

  1. The user requests a facial match by uploading the image. The web request is automatically distributed by the Elastic Load Balancer to the webserver fleet.
  2. Amazon Elastic Compute Cloud (Amazon EC2) powers application servers that handle the user request.
  3. The uploaded image is stored in Amazon Simple Storage Service (Amazon S3).
  4. Amazon Rekognition indexes the face and runs SearchFaces to look for a face similar to the new face ID.
  5. The output of the search face by image operation is stored in Amazon ElastiCache, a fully managed in-memory data store.
  6. The metadata of the indexed faces are stored in an Aurora relational database built for the cloud.
  7. The resulting face thumbnails are served to the customer via the fast content-delivery network (CDN) service Amazon CloudFront.

Experimenting and live testing Hungaricana

During our test of Hungaricana, the application performed extremely well. The searches not only correctly identified people, but also provided links to all publications and sources in Arcanum’s privately owned database where found faces are present. For example, the following screenshot shows the result of the famous composer and pianist Franz Liszt.

The application provided 42 pages of 6×4 results. The results are capped to 1,000. The 100% scores are the confidence scores returned by Amazon Rekognition and are rounded up to whole numbers.

The application of Hungaricana has always promptly, and with a high degree of certainty, presented results and links to all corresponding publications.

Business results

By introducing Amazon Rekognition into their workflow, Arcanum enabled a better customer experience, including building family trees, searching for historical figures, and researching historical places and events.

The concept of face searching using artificial intelligence certainly isn’t new. But Hungaricana uses it in a very creative, unique way.

Amazon Rekognition allowed Arcanum to realize three distinct advantages:

  • Time savings – The time to market speed increased dramatically. Now, instead of spending several months of intense manual labor to label all the images, the company can do this job in a few days. Before, basic labeling on 150,000 images took months for three people to complete.
  • Cost savings – Arcanum saved around $15,000 on the Hungaricana project. Before using Amazon Rekognition, there was no automation, so a human workforce had to scan all the images. Now, employees can shift their focus to other high-value tasks.
  • Improved accuracy – Users now have a much better experience regarding hit rates. Since Arcanum started using Amazon Rekognition, the number of hits has doubled. Before, out of 500,000 images, about 200,000 weren’t searchable. But with Amazon Rekognition, search is now possible for all 500,000 images.

 “Amazon Rekognition made Hungarian culture, history, and heritage more accessible to the world,” says Előd Biszak, Arcanum CEO. “It has made research a lot easier for customers building family trees, searching for historical figures, and researching historical places and events. We cannot wait to see what the future of artificial intelligence has to offer to enrich our content further.”

Conclusion

In this post, you learned how to add highly scalable face and image analysis to an enterprise-level image gallery to improve label accuracy, reduce costs, and save time.

You can test Amazon Rekognition features such as facial analysis, face comparison, or celebrity recognition on images specific to your use case on the Amazon Rekognition console.

For video presentations and tutorials, see Getting Started with Amazon Rekognition. For more information about Amazon Rekognition, see Amazon Rekognition Documentation.


About the Authors

Siniša Mikašinović is a Senior Solutions Architect at AWS Luxembourg, covering Central and Eastern Europe—a region full of opportunities, talented and innovative developers, ISVs, and startups. He helps customers adopt AWS services as well as acquire new skills, learn best practices, and succeed globally with the power of AWS. His areas of expertise are Game Tech and Microsoft on AWS. Siniša is a PowerShell enthusiast, a gamer, and a father of a small and very loud boy. He flies under the flags of Croatia and Serbia.

Cameron Peron is Senior Marketing Manager for AWS Amazon Rekognition and the AWS AI/ML community. He evangelizes how AI/ML innovation solves complex challenges facing community, enterprise, and startups alike. Out of the office, he enjoys staying active with kettlebell-sport, spending time with his family and friends, and is an avid fan of Euro-league basketball.

Source: https://aws.amazon.com/blogs/machine-learning/arcanum-makes-hungarian-heritage-accessible-with-amazon-rekognition/

Continue Reading
AI7 hours ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI7 hours ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI7 hours ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI7 hours ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI7 hours ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI7 hours ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI7 hours ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI7 hours ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI7 hours ago

Arcanum makes Hungarian heritage accessible with Amazon Rekognition

AI11 hours ago

Pros and Cons of using cloud platforms for building chatbots

AI11 hours ago

From Knowledge Databases To Knowledge Graphs And Conversational AI

AI12 hours ago

Model selection with cross-validation: A quest for an elite model

AI12 hours ago

Celebrating 10 Years of Innovation, Excellence, and Trust

AI22 hours ago

Executive Interview: Brian Gattoni, CTO, Cybersecurity & Infrastructure Security Agency 

AI22 hours ago

Making Use Of AI Ethics Tuning Knobs In AI Autonomous Cars 

AI22 hours ago

Application of AI to IT Service Ops by IBM and ServiceNow Exemplifies a Trend 

AI23 hours ago

Testing Finds Automated Driver Assistance Systems to be Unreliable 

AI23 hours ago

How  Veterans Would Study Machine Learning If He Had to Start Today 

AI23 hours ago

Forecasting for Fall Uncertainties 

AI1 day ago

Securing Amazon SageMaker Studio connectivity using a private VPC

Trending