AI

The 4 Steps to Build Out Your Machine Learning Team Productively



Over the past few years, machine learning has grown tremendously. But as young as machine learning is as a discipline, the craft of managing a machine learning team is even younger. Many of today’s machine learning managers were thrust into management roles out of necessity or because they were the best individual contributors, and many come from purely academic backgrounds. At some companies, engineering or product leaders are being tasked with building new machine learning functions without any real machine learning experience.

Running any technical team is hard:

  • You have to hire great people.
  • You need to manage and develop them.
  • You need to manage your team’s output and make sure your vectors are aligned.
  • You want to make good long-term technical choices and manage technical debt.
  • You also must manage expectations from leadership.

Running a Machine Learning team is even harder:

  • Machine Learning talent is expensive and scarce.
  • Machine Learning teams have a diverse set of roles.
  • Machine Learning projects have unclear timelines and high uncertainty.
  • Machine Learning is also the “high-interest credit card of technical debt.”
  • Leadership often doesn’t understand Machine Learning.

I recently attended the Full-Stack Deep Learning Bootcamp on the UC Berkeley campus, a wonderful course that teaches full-stack production deep learning. One of the lectures, delivered by Josh Tobin, covered best practices for Machine Learning teams. Drawing on Josh’s lecture, this article will give you some insight into how to think about building and managing Machine Learning teams if you are a manager, and may also help you get a job in Machine Learning if you are a job seeker.

Note: You can also watch this lecture from Josh’s talks at the FSDL March 2019 version and the Applied Deep Learning Fellowship held at Weights & Biases.

Step 1 — Defining The Roles

Let’s take a look at the most common Machine Learning roles and the skills they require:

  1. The Machine Learning Product Manager is someone who works with the Machine Learning team, as well as other business functions and the end users. This person writes design docs, creates wireframes, and comes up with the plan to prioritize and execute Machine Learning projects.
  2. The DevOps Engineer is someone who deploys and monitors production systems. This person handles the infrastructure that runs the deployed Machine Learning product.
  3. The Data Engineer is someone who builds data pipelines, aggregates and collects data from storage, and monitors data behavior. This person works with distributed systems such as Hadoop, Kafka, and Airflow.
  4. The Machine Learning Engineer is someone who trains and deploys prediction models. This person uses tools like TensorFlow and Docker to work with prediction systems running on real data in production.
  5. The Machine Learning Researcher is someone who trains prediction models, often in work that is forward-looking or not production-critical. This person uses TensorFlow / PyTorch / Jupyter to build models and writes reports describing their experiments.
  6. The Data Scientist is actually a blanket term used to describe all of the roles above. In some organizations, this role actually entails answering business questions via analytics.

Josh Tobin at FSDL Bootcamp Nov 2019 (https://fullstackdeeplearning.com/november2019/)

So what skills are needed for these roles? The chart above displays a nice visual, where the horizontal axis is the level of Machine Learning expertise and the size of the bubble is the level of communication and technical writing (the bigger the better).

  1. The DevOps Engineer is primarily a software engineering role, which often comes from a standard software engineering pipeline.
  2. The Data Engineer belongs to the software engineering team that works actively with Machine Learning teams.
  3. The Machine Learning Engineer requires a rare mix of Machine Learning and Software Engineering skills. This person is either an engineer with significant self-teaching or a science/engineering Ph.D. who worked as a traditional software engineer after graduate school.
  4. The Machine Learning Researcher is a Machine Learning expert who usually has an MS or Ph.D. degree in Computer Science or Statistics, or has completed an industrial fellowship program.
  5. The Machine Learning Product Manager is just like a traditional Product Manager, but with a deep knowledge of the Machine Learning development process and mindset.
  6. The Data Scientist role constitutes a wide range of backgrounds from undergraduate to Ph.D. students.

Step 2 — Structuring The Team

There is not yet a consensus on the right way to structure a Machine Learning team, but there are a few best practices that are contingent on an organization’s archetype and its Machine Learning maturity level. First, let’s look at the different Machine Learning organization archetypes.

Archetype 1 — Nascent and Ad-Hoc ML

  • These are orgs in which no one is doing Machine Learning, or Machine Learning is done on an ad-hoc basis. Obviously, there is little Machine Learning expertise in-house.
  • These are either small-to-medium businesses or less technology-forward large companies in industries like education or logistics.
  • There is often low-hanging fruit for Machine Learning.
  • But there is little support for Machine Learning projects and it’s very difficult to hire and retain good talent.

Archetype 2 — Research and Development ML

  • These are orgs in which Machine Learning efforts are centered in the R&D arm of the organization. They often hire Machine Learning researchers and doctorate students with experience publishing papers.
  • These are larger companies in sectors such as oil and gas, manufacturing, or telecommunications.
  • They can hire experienced researchers and work on long-term business priorities to get big wins.
  • However, it is very difficult to get quality data. This type of research work rarely translates into actual business value, so the amount of investment usually remains small.

Archetype 3 — Product Embedded ML

  • These are orgs in which certain product teams or business units have Machine Learning expertise alongside their software or analytics talent. These Machine Learning individuals report up to the team’s engineering/tech lead.
  • These are either software companies or financial services companies.
  • Machine Learning improvements are likely to lead to business value. Furthermore, there is a tight feedback cycle between idea iteration and product improvement.
  • Unfortunately, it is still very hard to hire and develop top talent, and access to data & compute resources can lag. There are also potential conflicts between Machine Learning project cycles and engineering management, so long-term Machine Learning projects can be hard to justify.

Archetype 4 — Independent ML Division

  • These are orgs in which the Machine Learning division reports directly to senior leadership. The Machine Learning Product Managers work with Researchers and Engineers to build Machine Learning into client-facing products. They can sometimes publish long-term research.
  • These are often large financial services companies.
  • Talent density allows them to hire and train top practitioners. Senior leaders can marshal data and compute resources. This allows the organization to invest in tooling, practices, and culture around Machine Learning development.
  • A disadvantage is that model handoffs to different business lines can be challenging, since users need to buy in to Machine Learning benefits and get educated on the model use. Also, feedback cycles can be slow.

Archetype 5 — ML-First

  • These are orgs in which the CEO invests in Machine Learning and there are experts across the business focusing on quick wins. The Machine Learning division works on challenging and long-term projects.
  • This group includes large tech companies and Machine Learning-focused startups.
  • They have the best data access (data thinking permeates the organization), the most attractive recruiting funnel (challenging Machine Learning problems tend to attract top talent), and the easiest deployment procedure (product teams understand Machine Learning well enough).
  • This type of organization archetype is hard to implement in practice since it is culturally difficult to embed Machine Learning thinking everywhere.


Depending on the above archetype that your organization resembles, you can make the appropriate design choices, which broadly speaking follow these 3 categories:

  • Software Engineering vs. Research: To what extent is the Machine Learning team responsible for building or integrating with software? How important are Software Engineering skills on the team?
  • Data Ownership: How much control does the Machine Learning team have over data collection, warehousing, labeling, and pipelining?
  • Model Ownership: Is the Machine Learning team responsible for deploying models into production? Who maintains the deployed models?

Below are the design suggestions:

If your organization focuses on Machine Learning R&D:

  • Research is prioritized over Software Engineering skills. Because of this, collaboration between the two groups can be lacking.
  • The Machine Learning team has no control over the data and typically will not have data engineers to support it.
  • Machine Learning models are rarely deployed into production.

If your organization has Machine Learning embedded into the product:

  • Software Engineering skills will be prioritized over Research skills. Often, the researchers need strong engineering skills, since everyone is expected to productionize their own models.
  • The Machine Learning team generally does not own data production and data management. They need to work with data engineers to build data pipelines.
  • Machine Learning engineers fully own the models that they deploy into production.

If your organization has an independent Machine Learning division:

  • Each team has a strong mix of engineering and research skills; therefore, they work closely together within teams.
  • The Machine Learning team has a voice in data governance discussions, as well as a strong data engineering function.
  • The Machine Learning team hands off models to users but is still responsible for maintaining them.

If your organization is Machine Learning-First:

  • Different teams are more or less research-oriented, but in general, research teams collaborate closely with engineering teams.
  • The Machine Learning team often owns the company-wide data infrastructure.
  • The Machine Learning team hands off models to users, who operate and maintain them.

The picture below neatly sums up these suggestions:


Josh Tobin at FSDL Bootcamp Nov 2019 (https://fullstackdeeplearning.com/november2019/)

Step 3 — Managing The Projects

Managing Machine Learning projects can be very challenging:

  • According to Lukas Biewald, it is hard to tell in advance what’s hard and what’s easy. Even within a domain, performance can vary wildly.
  • Machine Learning progress is nonlinear. It is very common for projects to stall for weeks or longer. In the early stages, it is difficult to plan a project because it’s unclear what will work. As a result, estimating Machine Learning project timelines is extremely difficult.
  • There are cultural gaps between research and engineering because of different values, backgrounds, goals, and norms. In toxic cultures, the two sides often don’t value one another.
  • And often, leadership just does not understand it.

So how can you manage Machine Learning teams better? The secret sauce is to plan the Machine Learning project probabilistically!

In essence, you go from a single deterministic project timeline to a plan that attaches probability estimates to each milestone.
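One way to make probabilistic planning concrete is a small Monte Carlo sketch (a toy illustration, not from the lecture; the task names and estimates below are made up): give each task an optimistic, likely, and pessimistic duration, then sample total project durations to report completion percentiles instead of a single date.

```python
import random

# Hypothetical per-task (optimistic, likely, pessimistic) estimates in weeks.
tasks = {
    "data collection": (1, 2, 6),
    "baseline model": (1, 3, 8),
    "productionization": (2, 4, 12),
}

def sample_duration(opt, likely, pess):
    # A triangular distribution captures skewed, uncertain estimates.
    return random.triangular(opt, pess, likely)

def simulate(n=10_000):
    # Sample n total project durations and return the 50th/90th percentiles.
    totals = sorted(
        sum(sample_duration(*est) for est in tasks.values())
        for _ in range(n)
    )
    return {p: totals[int(n * p / 100)] for p in (50, 90)}

percentiles = simulate()
print(f"50% chance of finishing within {percentiles[50]:.1f} weeks")
print(f"90% chance of finishing within {percentiles[90]:.1f} weeks")
```

Reporting "50% chance by week X, 90% by week Y" sets leadership expectations far better than a single deadline that will almost certainly slip.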

Here are some other good practices:

  • You should attempt a portfolio of approaches.
  • You should measure progress based on inputs, not results.
  • You should have researchers and engineers work together.
  • You should get end-to-end pipelines together quickly to demonstrate quick wins.
  • You should educate leadership on Machine Learning timeline uncertainty.

Step 4 — Hiring The Talent

According to the 2019 Global AI Talent Report from Element AI, there is strong evidence that the supply of top-tier AI talent does not meet the demand. There were about 22,000 people at the cutting edge of AI research who were actively publishing papers and presenting at academic conferences. Only around 4,000 people contributed to research that had a major impact on the overall field. A total of 36,500 people qualified as self-reported AI specialists. Compare this to the number of software developers: 4.2 million in the US and 26.4 million worldwide.

1 — How To Source Machine Learning Talent?

Here are some strategies to hire Machine Learning Engineers:

  • Hire people for their software engineering skills, keen interest in Machine Learning, and a desire to learn. You can then train them to do Machine Learning.
  • Go for junior roles, considering that most undergraduate Computer Science students these days graduate with Machine Learning experience.
  • Be really specific about what you need. For example, not every Machine Learning engineer needs to do DevOps.

And here are strategies to hire Machine Learning Researchers:

  • Look for the quality of publications, not the quantity (e.g., originality of ideas, quality of execution)
  • Look for researchers with an eye for working on important problems. Many researchers focus on trendy problems without considering why they matter.
  • Look for researchers with experience outside of academia.
  • Consider hiring talented people from adjacent fields such as math, physics, and stats.
  • Consider hiring people without Ph.D. degrees. For example, talented undergraduate and Master’s students, graduates of industrial fellowship programs (Google, Facebook, OpenAI), and even dedicated self-studiers.

How do you find these candidates in the first place?

  • There are standard sources such as LinkedIn, using a recruiting agency, and visiting universities’ career fairs.
  • You should attend well-known Machine Learning research conferences (NeurIPS, ICLR, ICML) for Machine Learning Researchers and well-known Applied Machine Learning conferences (O’Reilly, ReWork, TensorFlow World) for Machine Learning Engineers.
  • You can monitor arXiv for impressive research papers and contact the first authors.


For a long-term strategy, you would want to think about how to attract these candidates and make your organization stand out:

  • Since Machine Learning practitioners want to work with cutting edge tools and techniques, your company should work on research-oriented projects, publicize them with blog posts, and invest in tooling & infrastructure for your Machine Learning team.
  • Since Machine Learning practitioners want to build skills and knowledge in an exciting field, your company should build a team culture around learning (i.e. reading groups, learning days, professional development budget, conference budget).
  • Since Machine Learning practitioners want to work with excellent people, your company should hire high-profile people and/or help your best people build their profile through publishing blogs and papers.
  • Since Machine Learning practitioners want to work on interesting datasets, your company should sell the uniqueness of your dataset in recruiting materials.
  • Since Machine Learning practitioners want to do work that matters, your company should sell the mission of your company and the potential impact of Machine Learning on that mission. More importantly, you should work on projects that have a tangible impact today.

2 — How To Interview Machine Learning Candidates?

So what should you test for in a Machine Learning interview?

  • The first thing is to validate your hypotheses of the candidate’s strengths. For Machine Learning Researchers, make sure that they can think creatively about new Machine Learning problems and probe how thoughtful they were about previous projects. For Machine Learning Engineers, make sure they are great generalists with solid engineering skills.
  • The second thing is to ensure that the candidates meet a minimum bar in their weaker areas. For Machine Learning Researchers, test their engineering knowledge and ability to write good code. For Machine Learning Engineers, test their basic Machine Learning knowledge.

The Machine Learning interview is much less well-defined than a traditional software engineering interview, but here are common types of assessments:

  • Background and culture fit
  • Whiteboard coding
  • Pair coding / debugging (often Machine Learning-specific code)
  • Math puzzles
  • Take-home project
  • Applied Machine Learning (e.g., explain how to solve a problem with Machine Learning)
  • Previous projects (methodologies, trials and errors, findings)
  • Machine Learning theory (e.g., bias-variance tradeoff, overfitting and underfitting, specific algorithms…)


3 — How To Find A Job As a Machine Learning Practitioner?

Let’s say you are a Machine Learning candidate reading this article. You might ask: “Where do I look for a Machine Learning job?”

  • Again, there are standard sources like LinkedIn, recruiters, and on-campus recruiting.
  • You can attend Machine Learning research conferences and network with people there.
  • You can also just apply directly through companies’ job portals (remember that there’s a talent gap!).

The job search is certainly not easy, but there are a couple of ways to stand out:

  • Build general software engineering skills (via CS classes and/or work experience).
  • Exhibit interest in Machine Learning (via attending conferences and/or taking MOOCs).
  • Show that you have a broad knowledge of Machine Learning (e.g., write blog posts synthesizing a research area).
  • Demonstrate the ability to get Machine Learning projects done (e.g., create side projects and/or reimplement papers).
  • Prove you can think creatively in Machine Learning (e.g., win Kaggle competitions and/or publish papers).

In order to prepare for the interview, you should:

  • Practice for a general software engineering interview with resources like Cracking The Coding Interview.
  • Prepare to talk in detail about your past projects, including the tradeoffs and decisions you made.
  • Review Machine Learning theory and basic Machine Learning algorithms.
  • Think creatively about how to use Machine Learning to solve the problems that the company you’re interviewing with might face.

I would also recommend checking out this slide from Chip Huyen delivered at the Bootcamp, which includes some great lessons from both sides of the Machine Learning interview process.

Conclusion

Being a new and evolving discipline for most traditional organizations, forming machine learning teams is full of known and unknown challenges. If you skipped to the end, here are the final few take-homes:

  • There are lots of different skills involved in production Machine Learning, so there are opportunities for many people to contribute.
  • Machine Learning teams are becoming more standalone and more interdisciplinary.
  • Managing Machine Learning teams is hard. There is no silver bullet, but shifting toward probabilistic planning can help.
  • Machine Learning talent is scarce. As a manager, be specific about what skills are must-have in the Machine Learning job descriptions. As a job seeker, it can be brutally challenging to break in as an outsider, so use projects as a signal to build awareness.

Hopefully, this post has presented helpful information for you to build out machine learning teams productively. In the upcoming blog posts, I will share more lessons that I learned from attending the Full-Stack Deep Learning Bootcamp, so stay tuned!

This article was originally published on Medium and re-published to TOPBOTS with permission from the author.



How does it know?! Some beginner chatbot tech for newbies.


Wouter S. Sligter

Most people will know by now what a chatbot or conversational AI is. But how does one design and build an intelligent chatbot? Let’s investigate some essential concepts in bot design: intents, context, flows and pages.

I like using Google’s Dialogflow platform for my intelligent assistants. Dialogflow has a very accurate NLP engine at a cost structure that is extremely competitive. In Dialogflow there are roughly two ways to build the bot tech. One is through intents and context, the other is by means of flows and pages. Both of these design approaches have their own version of Dialogflow: “ES” and “CX”.

Dialogflow ES is the older version of the Dialogflow platform which works with intents, context and entities. Slot filling and fulfillment also help manage the conversation flow. Here are Google’s docs on these concepts: https://cloud.google.com/dialogflow/es/docs/concepts

Context is what distinguishes ES from CX. It’s a way to understand where the conversation is headed. Here’s a diagram that may help understand how context works. Each phrase that you type triggers an intent in Dialogflow. Each response by the bot happens after your message has triggered the most likely intent. It’s Dialogflow’s NLP engine that decides which intent best matches your message.

Wouter Sligter, 2020

What’s funny is that even though you typed ‘yes’ in exactly the same way twice, the bot gave you different answers. There are two intents that have been programmed to respond to ‘yes’, but only one of them is selected. This is how we control the flow of a conversation by using context in Dialogflow ES.
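The mechanism can be sketched in a few lines of code (a simplified illustration, not Dialogflow’s actual API; the intent names and contexts here are made up): each intent lists its training phrases and an input context, and the matcher only considers intents whose input context is currently active.

```python
# Simplified sketch of context-based intent matching (not the real Dialogflow API).
intents = [
    {"name": "confirm_order", "phrases": ["yes"],
     "input_context": "awaiting_order_confirmation",
     "response": "Great, your order is placed!"},
    {"name": "confirm_cancel", "phrases": ["yes"],
     "input_context": "awaiting_cancel_confirmation",
     "response": "Okay, your order is cancelled."},
]

def match_intent(message, active_contexts):
    # Only intents whose input context is active are candidates.
    for intent in intents:
        if message.lower() in intent["phrases"] and intent["input_context"] in active_contexts:
            return intent
    return None

# The same 'yes' resolves to different intents depending on the active context.
print(match_intent("yes", {"awaiting_order_confirmation"})["response"])
print(match_intent("yes", {"awaiting_cancel_confirmation"})["response"])
```

In the real platform the NLP engine scores fuzzy matches rather than doing exact lookups, but the context filter works the same way: it narrows which intents are even eligible to fire.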

Unfortunately the way we program context into a bot on Dialogflow ES is not supported by any visual tools like the diagram above. Instead we need to type this context in each intent without seeing the connection to other intents. This makes the creation of complex bots quite tedious and that’s why we map out the design of our bots in other tools before we start building in ES.

The newer Dialogflow CX allows for a more advanced way of managing the conversation. By adding flows and pages as additional control tools we can now visualize and control conversations easily within the CX platform.

source: https://cloud.google.com/dialogflow/cx/docs/basics

This entire diagram is a ‘flow’ and the blue blocks are ‘pages’. This visualization shows how we create bots in Dialogflow CX. It’s immediately clear how the different pages are related and how the user will move between parts of the conversation. Visuals like this are completely absent in Dialogflow ES.

It then makes sense to use different flows for different conversation paths. A possible distinction in flows might be “ordering” (as seen here), “FAQs” and “promotions”. Structuring bots through flows and pages is a great way to handle complex bots and the visual UI in CX makes it even better.

At the time of writing (October 2020), Dialogflow CX only supports English NLP and its pricing model is surprisingly steep compared to ES. But bots are becoming critical tech for an increasing number of companies, and the cost reductions and quality improvements they bring are enormous. Building and managing bots is in many cases an ongoing task rather than a single, rounded-off project. For these reasons it makes total sense to invest in a tool that can handle increasing complexity in an easy-to-use UI such as Dialogflow CX.

This article aims to give insight into the tech behind bot creation and Dialogflow is used merely as an example. To understand how I can help you build or manage your conversational assistant on the platform of your choice, please contact me on LinkedIn.

Source: https://chatbotslife.com/how-does-it-know-some-beginner-chatbot-tech-for-newbies-fa75ff59651f?source=rss—-a49517e4c30b—4


Who is chatbot Eliza?

Between 1964 and 1966 Eliza was born, one of the very first conversational agents. Discover the whole story.



Frédéric Pierron

Between 1964 and 1966, Eliza was born, one of the very first conversational agents. Its creator, Joseph Weizenbaum, was a researcher at the famous Artificial Intelligence Laboratory at MIT (Massachusetts Institute of Technology). His goal was to enable a conversation between a computer and a human user. More precisely, the program simulated a conversation with a Rogerian psychotherapist, whose method consists of reformulating the patient’s words to let them explore their thoughts themselves.

Joseph Weizenbaum (Professor emeritus of computer science at MIT). Location: Balcony of his apartment in Berlin, Germany. By Ulrich Hansen, Germany (Journalist) / Wikipedia.

The program was rather rudimentary at the time. It worked by recognizing keywords or expressions and displaying in return questions constructed from these keywords. When the program did not have an answer available, it displayed an “I understand” that was quite effective, albeit laconic.
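That keyword-and-template mechanism can be sketched in a few lines (a toy illustration; these rules are invented for the example, not Weizenbaum’s original script):

```python
import random

# Tiny ELIZA-style sketch: keyword rules map to response templates.
rules = {
    "mother": ["Tell me more about your mother."],
    "always": ["Can you think of a specific example?"],
    "i feel": ["Why do you feel that way?"],
}

def respond(message):
    text = message.lower()
    for keyword, templates in rules.items():
        if keyword in text:
            return random.choice(templates)
    return "I understand."  # the laconic fallback described above

print(respond("I feel nobody listens to me"))  # -> Why do you feel that way?
print(respond("What a day"))                   # -> I understand.
```

The original also reassembled fragments of the user’s own sentence into its replies (swapping “my” for “your”, and so on), which is what made the reformulations feel so personal.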

Weizenbaum explained that his primary intention was to show the superficiality of communication between a human and a machine. He was very surprised when he realized that many users were getting caught up in the game, completely forgetting that the program was without real intelligence and devoid of any feelings and emotions. He even said that his secretary would discreetly consult Eliza to work through her personal problems, forcing the researcher to unplug the program.

Conversing with a computer thinking it is a human being is one of the criteria of Turing’s famous test. Artificial intelligence is said to exist when a human cannot discern whether or not the interlocutor is human. Eliza, in this sense, passed the test brilliantly according to its users.

Eliza thus opened the way (or the voice!) to what have been called chatbots, an abbreviation of chatterbot, itself an abbreviation of chatter robot, literally “talking robot”.

Source: https://chatbotslife.com/who-is-chatbot-eliza-bfeef79df804?source=rss—-a49517e4c30b—4


How to take S3 backups with DejaDup on Ubuntu 20.10


DejaDup is the default backup application for Gnome. It’s a GUI for duplicity, focuses on simplicity, supports incremental encrypted backups and up until recently supported a large number of cloud providers. Unfortunately as of version 42.0, all major cloud providers have been removed. Thus given that Ubuntu 20.10 ships with the specific version, any user who upgrades and has backups on Amazon S3 won’t be able to access them. In this blog post, we will provide a solution that will allow you to continue taking backups on AWS S3 using DejaDup.

The mandatory rant (feel free to skip)

The removal of the cloud providers should not come as a surprise. I’m not exactly sure which version of DejaDup deprecated them, but it was around the release of Ubuntu 17.10 that they were all hidden as an option. So for 3 long years, people who had backups on Amazon S3, Google Cloud Storage, OpenStack Swift, Rackspace, etc. could still use the deprecated feature and prepare for the inevitable removal.

So why complain, you might ask? Well, first of all, when you update from an earlier version of Ubuntu to 20.10, you don’t really know that all cloud providers have been removed from DejaDup. Hence, if something goes wrong during the update, you won’t be able to easily access your backups and restore your system.

Another big problem is the lack of storage options in the latest version of DejaDup. They decided to change their policy and support only “consumer-targeted cloud services”, but currently they only support Google Drive. So they eliminated all the cost-efficient options for mass storage and kept only one single, very expensive option. I’m not really sure how this is good for the users of the application. Linux was always about having a choice (too much of it, in many cases), so why not maintain multiple storage options to serve both experienced and inexperienced users? Thankfully, because we are on Linux, we have options to fix this.

How to use Deja Dup v42+ with AWS S3

WARNING: I have not tested thoroughly the following setup so use it at your own risk. If the computer explodes in your face, you lose your data, or your spouse takes your kids and leaves you, don’t blame me.

Installing s3fs-fuse

With that out of the way, let’s proceed to the fix. We will use s3fs-fuse, a program that allows you to mount an S3 bucket via FUSE and effectively make it look like a local disk. Thankfully you don’t have to compile it from source, as it’s in Ubuntu’s repos. To install it, type the following in your terminal:

sudo apt install s3fs

Setting up your AWS credentials file

Next, we need to configure your credentials. s3fs supports two methods of authentication: an AWS credentials file or a custom passwd file. In this tutorial we will use the first method, but if you are interested in the latter, feel free to view the s3fs documentation on GitHub. To set up your credentials, make sure that the file ~/.aws/credentials contains your AWS access key id and secret key. It should look like this:


[default]
aws_access_key_id=YOUR_ACCESS_KEY_ID
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY

Mounting your bucket to your local filesystem

Once you have your credentials file, you are ready to mount your backup bucket. If you don’t remember the bucket name, you can find it by visiting your AWS account. To mount and unmount the bucket to/from a specific location, type:


# mount
s3fs BUCKET_NAME /path/to/location

# unmount
fusermount -u /path/to/location

Mounting the bucket like this is only temporary and will not persist across reboots. You can add it to /etc/fstab, but I believe this only works with the passwd file. If you want to use your AWS credentials file, an easy workaround is to create a shortcut in your Startup Applications Preferences.

Note that you can add a small 10 sec delay to ensure that the WiFi is connected before you try to mount the bucket. Internet access is obviously necessary for mounting it successfully. If you are behind VPNs or have other complex setups, you can also create a bash script that makes the necessary checks before you execute the mount command. Sky is the limit!
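Such a wrapper script might look like the following sketch (untested; the bucket name, mount point, and retry timings are placeholders you would adjust for your own setup):

```shell
#!/bin/bash
# Sketch of a mount wrapper: wait for the network, then mount the bucket.
# BUCKET_NAME and MOUNT_POINT are placeholders for your own values.
BUCKET_NAME="my-backup-bucket"
MOUNT_POINT="$HOME/s3-backup"

network_up() {
  # Try a few times, pausing so WiFi has time to come up after login.
  for _ in 1 2 3; do
    ping -c 1 -W 2 s3.amazonaws.com >/dev/null 2>&1 && return 0
    sleep 10
  done
  return 1
}

# Only mount when invoked with --run, so sourcing the file has no side effects.
if [ "${1:-}" = "--run" ]; then
  mkdir -p "$MOUNT_POINT"
  if network_up && ! mountpoint -q "$MOUNT_POINT"; then
    s3fs "$BUCKET_NAME" "$MOUNT_POINT"
  fi
fi
```

Point your Startup Applications entry at this script with the --run argument, and extend network_up with whatever VPN or firewall checks your setup needs.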

Configuring DejaDup

With the bucket mounted as a local drive, we can now easily configure DejaDup to use it. First of all, we need to change the backend to local. This can be done either with a program like dconf Editor or from the console with the following command:

gsettings set org.gnome.DejaDup backend 'local'

Finally, we open DejaDup, go to Preferences, and point the storage location to the directory that has your S3 backup files. Make sure you select the subdirectory that contains the backup files; this is typically a subdirectory of your mount point whose name equals your computer’s hostname. Last but not least, make sure that the S3 mount directory is excluded from DejaDup! To do this, check the ignored folders in Preferences.

That’s it! Now go to your restore tab and DejaDup will be able to read your previous backups. You can also take new ones.

Gotchas

There are a few things to keep in mind in this setup:

  1. First of all, you must be connected to the internet when you mount the bucket; if you are not, the bucket won't be mounted. So instead of just calling the mount command, I advise you to write a bash script that performs the necessary checks before mounting (the internet connection is up, the firewall allows external requests, etc.).
  2. Taking backups this way seems slower than the old native S3 support and is likely to generate more network traffic (mind the AWS traffic costs!). This is expected, because DejaDup thinks it is accessing the local file system, so it makes no effort at aggressive caching or at minimizing operations that cause network traffic.
  3. You should expect stability issues. As we said earlier, DejaDup does not know it is writing data over the wire, so many of the features that usually exist in such setups (such as retry-on-fail) are missing. And obviously, if you lose your connection midway through a backup, you will have to delete it and start a new one to avoid corrupting your future backups.
  4. Finally, keep in mind that this is a very experimental setup; if you really want a reliable solution, you should do your own research and select something that meets your needs.

If you have a recommendation for an open-source backup solution that allows locally encrypted incremental backups, supports S3 and has an easy-to-use UI, please leave a comment, as I'm more than happy to give it a try.

About Vasilis Vryniotis

My name is Vasilis Vryniotis. I’m a Data Scientist, a Software Engineer, author of Datumbox Machine Learning Framework and a proud geek. Learn more

Source: http://blog.datumbox.com/how-to-take-s3-backups-with-dejadup-on-ubuntu-20-10/
