We’re excited to include guest posts on our blog from interesting people and companies in the industry. This post was written by Semih Yagcioglu, a machine learning mentor at Springboard. Springboard is an online learning platform that prepares students for the tech industry’s most in-demand careers, offering comprehensive programs in software engineering, data science, machine learning, UI/UX design, and more.
Active learning is a popular machine learning technique in which the model itself chooses which data points to ask a reliable source to label. It is commonly used in practical settings where labeled data is scarce or expensive to obtain.
What is active learning?
A simple way to understand active learning and its applications in machine learning is to consider its broader definition. In an educational context, active learning refers to a learning activity in which the student participates or interacts with the learning process. It’s important to note the student is actively or experientially involved in the learning process as opposed to passively taking in the information.
Similarly, in machine learning, active learning refers to any setup in which the learner can interactively query an information source. In this case, the “student” is a machine learning algorithm, while the information source is sometimes called the “teacher” or “oracle” and can be either a human being or another intelligent system.
In both the educational and machine learning definitions, the concept centers on active engagement with the information, so involvement is critical in an active learning setup. In this sense, active learning in machine learning is a way to mimic how humans learn actively by engaging with the learning process.
Let’s see what active learning means in the context of machine learning by looking at some practical examples.
3 practical examples of active learning in machine learning engineering
There are a few different scenarios where active learning makes sense while developing a machine learning system.
Example 1: Labeling unlabeled data
The first scenario in which active learning is useful is when there is an abundance of data that is either unlabeled or too costly to label all at once. In this scenario, active learning can be used as a strategy for labeling the unlabeled data by querying the user.
Consider an abundance of medical data in a hospital database. Let’s say someone wants to develop a machine learning system to predict COVID-19 by highlighting affected regions in chest scans. Even though the hospital might have plenty of chest scans, it would be too costly for the hospital to label all the data in the database.
Should the hospital wait and label all the data in its database or build a system as soon as possible and iteratively improve its results?
Here, the challenge is to choose which samples should be labeled so that the machine learning system performs better with each newly labeled data instance. Determining which samples to ask the user to label is where the active learning approach comes into play. Instead of choosing samples at random, the system should pick the most informative ones according to a chosen metric, such as the model’s uncertainty about each sample. For instance, if the system is not performing well for patients who are over 65 and heavy smokers, then prioritizing these patients might provide more value than labeling scans from the much broader group of patients over the age of 20.
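To make the selection step concrete, here is a minimal sketch of one common informativeness metric, uncertainty sampling, using only the Python standard library. The logistic model, its weights, and the tiny one-feature pool are all hypothetical stand-ins for illustration, not part of the hospital example.

```python
import math

def predict_proba(weights, bias, x):
    """Hypothetical logistic model: probability that sample x is positive."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def most_uncertain(pool, weights, bias, k=1):
    """Return indices of the k pool samples whose prediction is closest to 0.5."""
    scored = [
        (abs(predict_proba(weights, bias, x) - 0.5), i)
        for i, x in enumerate(pool)
    ]
    scored.sort()  # smallest distance from 0.5 first, i.e. most uncertain first
    return [i for _, i in scored[:k]]

# Assumed toy data: four unlabeled samples with a single feature each.
pool = [(-3.0,), (-0.2,), (0.1,), (2.5,)]
weights, bias = (1.0,), 0.0

# These are the samples we would send to the human labeler next.
query_indices = most_uncertain(pool, weights, bias, k=2)
print(query_indices)
```

In a full loop, the newly labeled samples would be added to the training set, the model retrained, and the selection repeated until the labeling budget runs out.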
Example 2: Stream-based selective sampling
Let’s say someone wants to develop a machine learning system for an e-commerce platform. They are receiving several orders from users but they don’t want to bug customers by asking a question every time the customer places an order. Instead, they need to decide which order should be chosen to improve the quality of the machine learning system in place.
Here again, we need to decide on the fly whether each incoming data instance should be selected for labeling, based on how informative that particular data point is according to a predefined metric. This approach is known in active learning research as stream-based selective sampling.
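The on-the-fly decision can be sketched as a simple rule applied to each incoming item. In this illustrative example, the order IDs, the model probabilities, the uncertainty threshold, and the query budget are all assumed values, and the rule (query when the prediction is close to 0.5) is just one possible metric.

```python
def should_query(prob_positive, threshold=0.2):
    """Ask for a label only when the prediction lies within `threshold` of 0.5."""
    return abs(prob_positive - 0.5) < threshold

def process_stream(stream, budget, threshold=0.2):
    """Walk the stream once, deciding per item, and stop once the budget is spent."""
    queried = []
    for order_id, prob in stream:
        if len(queried) >= budget:
            break  # never bother more customers than the budget allows
        if should_query(prob, threshold):
            queried.append(order_id)
    return queried

# Assumed stream of (order id, model's predicted probability) pairs.
stream = [
    ("order-1", 0.95),
    ("order-2", 0.55),
    ("order-3", 0.10),
    ("order-4", 0.40),
    ("order-5", 0.48),
]

selected = process_stream(stream, budget=2)
print(selected)
```

Unlike the pool-based case, each item is seen only once, so the decision cannot compare it against the rest of the data. The threshold and budget together control how often customers are asked.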
Example 3: Membership query synthesis
The third scenario, known as membership query synthesis, commonly applies to generative models: the machine learning system generates or constructs its own instances from an underlying natural distribution and asks the teacher to label them. This is particularly useful for synthetic data generation setups.
Consider a machine learning system whose goal is to generate realistic-looking fake human models so that someone can create instructional videos on the fly without the need for a real human cast. The system should keep generating fake human models until they are indistinguishable from real ones. The challenge here is that the system needs a lot of supervisory signal from the teacher. With active learning, it can learn the underlying distribution of natural human videos by frequently asking the teacher whether a generated instance is a real or fake human and using that label to incrementally improve.
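The video example is too large to show in a few lines, so here is a deliberately tiny stand-in for the same idea: the learner constructs its own query instances rather than picking them from existing data. It assumes a hypothetical one-dimensional problem in which the oracle answers whether a number is at or above a hidden boundary, and the learner synthesizes midpoints to narrow that boundary down.

```python
def learn_threshold(oracle, low=0.0, high=1.0, n_queries=10):
    """Estimate a hidden decision boundary by synthesizing query points."""
    for _ in range(n_queries):
        query = (low + high) / 2.0  # instance constructed by the learner itself
        if oracle(query):
            high = query  # labeled positive: the boundary is at or below the query
        else:
            low = query   # labeled negative: the boundary is above the query
    return (low + high) / 2.0

# Assumed oracle for illustration; in practice this is the human teacher.
hidden_boundary = 0.37

def oracle(x):
    return x >= hidden_boundary

estimate = learn_threshold(oracle)
print(round(estimate, 3))
```

Because each synthesized query halves the remaining interval, ten labels are enough to locate the boundary to within about 0.0005 here, far fewer than labeling randomly drawn points would need.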
Want to learn more?
If you’re interested in learning more about machine learning engineering, check out Springboard’s Machine Learning Engineering bootcamp. You’ll design a machine learning/deep learning system, build a prototype, and deploy a running application that can be accessed via API or web service. (No other bootcamp does this!)
Springboard offers online courses and bootcamps in UI/UX design, data science, data analytics, software engineering, and machine learning engineering. All courses include 1-on-1 mentorship and Springboard’s one-of-a-kind job guarantee: students have a six-month runway to secure a role in their industry or get 100% of their tuition back.
Pathrise is a full service organization that helps people land their dream job in tech. We work extensively with software engineers by providing technical workshops, 1-on-1 mentoring sessions, and pair programming sessions. In addition, we offer guidance on other components of the job search, including resume and portfolio optimization, LinkedIn optimization, behavioral interview preparation, reverse recruiting strategies, salary negotiation, and more.