113 Data Science Interview Questions to Nail Your Onsite Interview

Get all of the practice you need for your upcoming data science onsite interview so that you can turn that interview into an offer. We have helped 2,500+ people land great jobs in tech, so we wanted to share some of our inside information and data-backed tips to help you, too.

Check out our list of 113 data science interview questions from top tech companies so you can practice and go into your sessions with confidence.

Behavioral
Whiteboarding
Statistics
Probability
Case study questions
SQL & databases
Programming
Modeling

Data Science Interview Questions

Behavioral Data Science Interview Questions

About half of your data science interview questions will be technical; the other half will be common behavioral questions. While data science interviews include far more technical questions than interviews for most roles in tech, you’ll still have to show culture fit and soft skills to get hired. Even at a final onsite interview, you’ll get plenty of behavioral questions. The main difference from typical behavioral interview questions is that data science behavioral questions are usually framed in terms of past data science experiences and hypothetical data science scenarios. Some questions also require background knowledge about the company, such as its products and mission. Prepare details from past data projects so you can answer using the STAR method (Situation, Task, Action, Result), and review the company’s mission, values, and key facts.

Airbnb question: Tell me about your favorite data science project.
Amazon question: Tell me about a time you had to communicate data-driven insights to a non-technical executive.
Google question: Tell me about a time when you had to clean and organize a large dataset.
Amazon question: What would you do if you were assigned a project with a technology you’re not familiar with?
See also: the 15 most common behavioral interview questions

Whiteboarding Data Science Interview Questions

In a “whiteboard” interview, you’ll be asked to work through a technical challenge (often a case study, SQL, or programming question) in front of your interviewer and present your answer. Sometimes you’ll present on an actual whiteboard, but whiteboarding interviews can be virtual, too. Whiteboarding is less common in data science interviews than in software engineering interviews, but if you’re regularly applying and interviewing for data science roles, you’ll almost certainly encounter it.

You’ll usually get about an hour to solve the problem (most commonly a data science case study). When you’re finished, you’ll discuss your problem-solving process and any solutions you’ve come up with. Getting the right answer matters, but the interviewer also wants to see how you think. Whiteboarding assesses your technical skills as well as your communication, statistical understanding, and general problem-solving skills. Even if you make a mistake, you’ll earn points for clearly communicating your reasoning to the interviewer.

Any of the technical data science interview questions in this list could be used as whiteboarding questions. Case studies and programming questions are especially common. In addition to practicing those data science interview questions, you can use our guide to whiteboarding interviews to help prepare.

Statistics Data Science Interview Questions

Accenture question – What is linear regression?
Google question – Find the width of the confidence interval
What is your familiarity with statistical methods and past projects?
Airbnb question – How can you report statistical results to non-statistician staff?
Netflix question – When you split a population for A/B testing, what are some reasons you could see a significant difference in the control and variant groups?
Apple question – What is the bias-variance tradeoff? How does XGBoost handle the bias-variance tradeoff?
Explain a probability distribution that is not normal.
Google question – If two predictors are highly correlated, what is the effect on the coefficients in the logistic regression? What are the confidence intervals of the coefficients?
When using a Gaussian mixture model, how do you know it is applicable?
IBM question – What is the relationship between a coefficient in logistic regression and the odds ratio?
What are different metrics to classify a dataset?
How do you find an anomaly in a distribution? How do you investigate whether a certain trend in a distribution is due to an anomaly?
Explain the concept of multicollinearity
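
The logistic regression and odds ratio question above has a compact answer that is easy to check numerically. The sketch below assumes a single-predictor model, log(p / (1 - p)) = beta0 + beta1 * x, and shows that raising x by one unit multiplies the odds by exp(beta1), whatever the starting value of x. The coefficient values and function names are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p: float) -> float:
    """Odds corresponding to probability p."""
    return p / (1.0 - p)


def odds_ratio_for_unit_increase(beta0: float, beta1: float, x: float) -> float:
    """Odds at x + 1 divided by odds at x, under
    log(p / (1 - p)) = beta0 + beta1 * x."""
    p_after = sigmoid(beta0 + beta1 * (x + 1))
    p_before = sigmoid(beta0 + beta1 * x)
    return odds(p_after) / odds(p_before)


# Hypothetical coefficients, not taken from any real model.
beta0, beta1 = -1.0, 0.7
for x in (0.0, 2.0, 5.0):
    ratio = odds_ratio_for_unit_increase(beta0, beta1, x)
    # The ratio is the same for every x and matches exp(beta1).
    print(x, round(ratio, 6), round(math.exp(beta1), 6))
```

In an interview, the one-line version is usually enough: the coefficient is the log of the odds ratio for a one-unit change in that predictor, holding the others fixed.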

Probability Data Science Interview Questions

Facebook question – If you draw 2 cards from a shuffled 52 card deck, what is the probability that you’ll have a pair?
Given an unfair coin with the probability of heads not equal to .5, what algorithm could you use to create a list of random 1s and 0s?
Groupon question – You are on a number line and you can jump to one of the neighboring points with equal probability, with the exception of n=0 where you can’t go to negative numbers but have to come back to n=1. If you start at n=44, what is the expected number of steps to reach n=4444?
CA Technologies question – How do you get an estimate of the answer using Taylor expansion?
Microsoft question – Generate 7 integers with equal probability from a function which returns 1/0 with probability p and (1-p).
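
Two of the probability questions above can be sketched in a few lines. The first function counts every two-card hand to get the exact chance of a pair. The second is one possible answer to the unfair coin question, using von Neumann's trick: flip twice, keep the first result if the two flips differ, and discard the pair otherwise. It assumes the flips are independent with a fixed (unknown) probability of heads. The function names and the 0.8 bias in the usage lines are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations


def pair_probability() -> Fraction:
    """Exact probability that two cards drawn without replacement share a rank."""
    deck = [(rank, suit) for rank in range(13) for suit in range(4)]
    hands = list(combinations(deck, 2))
    pairs = sum(1 for a, b in hands if a[0] == b[0])
    return Fraction(pairs, len(hands))


def fair_bits(biased_flip, n: int) -> list:
    """Return n unbiased bits from a biased 0/1 source (von Neumann's trick).

    Assumes biased_flip() gives independent flips with a fixed bias.
    The outcomes (1, 0) and (0, 1) are equally likely, so the first
    flip of a mismatched pair is a fair bit.
    """
    bits = []
    while len(bits) < n:
        first, second = biased_flip(), biased_flip()
        if first != second:
            bits.append(first)
    return bits


print(pair_probability())  # 1/17

# Hypothetical biased coin that lands heads (1) 80% of the time.
rng = random.Random(0)
sample = fair_bits(lambda: 1 if rng.random() < 0.8 else 0, 1000)
print(sum(sample) / len(sample))  # should be close to 0.5
```

The shortcut for the card question is the same result without enumeration: after the first card, 3 of the remaining 51 cards match its rank, and 3/51 = 1/17.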

Case Study Data Science Interview Questions

How would you measure the impact of introducing a new tool for partners?
CA Technologies question – How do you design an algorithm for fraud detection?
Twitter question – What features would you use to build a recommendation algorithm for users?
Accenture question – Map an organization’s problem to data science – how will you solve it using data science and machine learning?
Uber question – How much would it cost (initial and sustaining costs) to have a fleet of vehicles take Google street view photos of every major city in the US every day?
Airbnb question – Brainstorm potential causes of an anomaly in web traffic data.
An important metric goes down, how would you dig into the causes?
Amazon question – Estimate the cumulative sum of the top 10 most profitable products of the last 6 months for customers in Seattle.
How do you deal with unbalanced data where the ratio of positive and negative is huge?
Booking question – How can we automatically propose ‘good value deals’ to customers, including hotels that don’t have a rating yet?
If you have a customer and want to decide whether they will “buy today” or “not buy today” and you know 1. where they live, 2. their income, 3. their gender, 4. their profession, how would you define a machine learning algorithm to figure this out?
LinkedIn question – Come up with some of the factors that could be used to produce certain algorithms (‘people you may know,’ and an algorithm to discover when a person is starting to search for a new job).

Final case study questions

Slack question – How would you prioritize which country to expand Slack to for furthering the international effort?
LinkedIn question – What product metrics do you construct? How do you tell if your experiment is successful?
Stripe question – How would you choose between the subscription and the market-place based options i.e. evaluate which would be better for the business in the long run?
Booking question – How would you tag a listing as value for money? How would you measure the “value”? What features would you select to explain the “value”?
Intuit question – How would you design a ranking system?
Apple question – How do you take millions of users with 100s of transactions each, amongst 10ks of products, and group the users together into meaningful segments?
Facebook question – How many high schools that people have listed on their profiles are real? How do we find out, and deploy at scale, a way of finding invalid schools?
Uber question – If you were rolling out Uber ride passes for the first time, how would you set the prices?
We have a product that is getting used differently by two different groups. What is your hypothesis about why and how would you go about testing it?
Uber question – Explain how network effects might influence your choice of how to assign experimental/control units and measure your main outcome metrics
What trends in the data indicate that a given market is healthy? What does price tell you?

SQL and Databases Data Science Interview Questions

Twitter question – How can you illustrate a tree-based system with a SQL query?
Dell question – What is indexing in a database?
Pinterest question – Write a SQL query to count the number of unique users per day who logged in from both an iPhone and the web, where iPhone logs and web logs are in distinct relations.
Spotify question – Given a sample set of tables, write a SQL query to get a summary metric from those tables.
Facebook question – Given a series of tables, write the SQL code you would need to count subpopulations through joins.
If you have a table with a billion rows, how would you add a column inserting data from the original source without affecting the user experience?
Facebook question – There is a table that tracks every time a user turns a feature on or off, with columns for user_id, action (“on” or “off”), date, and time. How many users turned the feature on today? How many users have ever turned the feature on? In a table that tracks the status of every user every day, how would you add today’s data to it?
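
As one way to approach the Pinterest question above, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. The table names (iphone_logs, web_logs), the column names (user_id, ts), and the sample rows are all assumptions made up for the example; a real interview would give you its own schema. The idea is to reduce each log to distinct (user, day) pairs, join the two sets, and count per day.

```python
import sqlite3

# Hypothetical schema: each log table has a user_id and a timestamp string ts.
QUERY = """
SELECT i.day, COUNT(*) AS users_on_both
FROM (SELECT DISTINCT user_id, DATE(ts) AS day FROM iphone_logs) AS i
JOIN (SELECT DISTINCT user_id, DATE(ts) AS day FROM web_logs) AS w
  ON i.user_id = w.user_id AND i.day = w.day
GROUP BY i.day
ORDER BY i.day
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up login rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE iphone_logs (user_id INTEGER, ts TEXT)")
    conn.execute("CREATE TABLE web_logs (user_id INTEGER, ts TEXT)")
    conn.executemany(
        "INSERT INTO iphone_logs VALUES (?, ?)",
        [
            (1, "2024-01-01 09:00:00"),
            (1, "2024-01-01 10:00:00"),  # repeat login, must not double count
            (2, "2024-01-01 11:00:00"),
            (4, "2024-01-01 13:00:00"),
            (3, "2024-01-02 08:00:00"),
        ],
    )
    conn.executemany(
        "INSERT INTO web_logs VALUES (?, ?)",
        [
            (1, "2024-01-01 12:00:00"),
            (4, "2024-01-01 14:00:00"),
            (2, "2024-01-02 09:00:00"),  # user 2 used the web on a different day
            (3, "2024-01-02 10:00:00"),
            (3, "2024-01-02 11:00:00"),
        ],
    )
    return conn


def dual_platform_users_per_day(conn: sqlite3.Connection) -> list:
    """Return (day, count) rows for users seen on both platforms that day."""
    return conn.execute(QUERY).fetchall()


print(dual_platform_users_per_day(build_demo_db()))
```

The DISTINCT subqueries are what keep repeat logins from inflating the count. Note that days with no dual-platform users do not appear in the result at all; if the interviewer wants zero rows for those days, you would need a calendar table or similar, which is worth raising as a clarifying question.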

Programming Data Science Interview Questions

Check if an integer is a palindrome (do not convert the integer to string)
Adobe question – What kind of coding language do you use when handling a large-scale dataset?
How would you impute missing information?
Amazon question – Write a Python function that displays the first n Fibonacci numbers.
Write Python code to return the count of words in a string
Cisco question – Merge 2 sorted linked lists
Rakuten question – Write a function that finds the MST of a directed graph.
Clone a graph
eBay question – Given a function roll() that uniformly returns a double between 0 and 1 and an array/list of numbers of length N (no duplicates), create a function shuffle() that returns a permutation of equal probability.
Given 2 sorted arrays of integers, code to find a number from each array such that their sum is closest to some integer K
How would you create/design/implement a certain algorithm from start to end?
LinkedIn question – Given a random generator that produces a number 1 to 5 uniformly, write a function that produces a number from 1 to 7 uniformly
Generate a sorted vector from two sorted vectors.
Uber question – Given a random Bernoulli trial generator, write a function to return a value sampled from a normal distribution.
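
Two of the programming questions above have short, well-known solutions, sketched here under stated assumptions. The palindrome check reverses the digits arithmetically and treats negative integers as non-palindromes, which is a convention you should confirm with your interviewer. The 1-to-7 generator uses rejection sampling: two draws from the 1-to-5 generator give 25 equally likely outcomes, 21 of which are kept and mapped evenly onto 1 through 7. Here rand5 is a stand-in built on Python's random module, since the question does not specify the generator.

```python
import random


def is_palindrome(n: int) -> bool:
    """Check whether an integer reads the same in both directions,
    without converting it to a string. Negative numbers return False
    (an assumed convention)."""
    if n < 0:
        return False
    original, reversed_n = n, 0
    while n > 0:
        reversed_n = reversed_n * 10 + n % 10  # append the last digit
        n //= 10                               # drop the last digit
    return original == reversed_n


def rand5() -> int:
    """Stand-in for the given generator: uniform integer from 1 to 5."""
    return random.randint(1, 5)


def rand7() -> int:
    """Uniform integer from 1 to 7 built only from rand5 (rejection sampling)."""
    while True:
        # Two draws give a uniform value in 0..24.
        value = 5 * (rand5() - 1) + (rand5() - 1)
        # Keep 0..20 (three full groups of 7); retry on 21..24.
        if value < 21:
            return value % 7 + 1


print(is_palindrome(12321), is_palindrome(1232))  # True False
print([rand7() for _ in range(10)])
```

Each call to rand7 accepts with probability 21/25, so it needs about 1.19 attempts (roughly 2.4 calls to rand5) on average. Mentioning that expected cost is an easy way to show positive signal.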

Modeling Data Science Interview Questions

What are the degrees of freedom for lasso?
Airbnb question – Does the practice of removing missing values cause bias? If so, what would you do?
What is cross validation?
Amazon question – What types of regularization exist? Which one is simpler to use?
What is a time series model and how do you do the calculation of ACF and PACF?
Booking question – How would you create an attribution model?
What would you do if the relation between outcome and features is not linear? How do you validate the model you built? Design and describe an experiment to confirm that the method you developed is a good one.
Dell question – What is dimensionality reduction?
IBM question – How do you validate a machine learning model?
What is a propensity model and how are beta estimates calculated by MLE?
Dropbox question – How would you set up a propensity model for the SMB team looking at companies between 5-200 employees?
Adobe question – What is the difference between logit and probit models?
eBay question – Suggest a modeling process for a binary classification task with skewed and unbalanced data.
Build a model to identify customers interested in receiving ad emails.
Google question – If the labels are known in the clustering project, how do you evaluate the performance of the model?
How do you evaluate the performance of a regression prediction model as opposed to a classification prediction model?
Microsoft question – How would you explain a deep learning model to customers?
FICO question – What is a distribution you may use to model data whose range of input values is [0, N]?
How do you measure and compare models? For example, the pros and cons of Random Forest vs. Logistic Regression?
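
For the cross validation question above, it can help to show the mechanics rather than only define the term. The sketch below splits sample indices into k folds in order, without shuffling, so that each fold serves as the held-out set exactly once. The function name is hypothetical, and in practice you would usually shuffle (or stratify) first and use a library implementation rather than writing your own.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) for each of k folds.

    Folds are contiguous and unshuffled; when n_samples is not divisible
    by k, the first n_samples % k folds get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


for train, test in k_fold_indices(10, 3):
    # Fit the model on `train`, score it on `test`, then average the k scores.
    print(len(train), test)
```

Averaging the k held-out scores gives a less optimistic estimate of performance than scoring on the training data, which is the core of the answer to the model validation questions in this section as well.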

Always start with clarifying questions

Sometimes, interviewers make a question intentionally vague as a way to test your problem-solving skills. Especially for case study questions, it’s important to clearly define the business use case and metric. For example, if a company asked you to investigate “why sign-up rates have declined,” you can ask questions such as:

1) “Over what time period did the decline happen and during which months?”

2) “How are we defining sign-up rate? What are the numerator and denominator?”

Proactively show positive signal

While you’re working, proactively offer 30-second “tidbits” of knowledge. This is a strong tactic because it not only reduces the opportunities for negative signal but also gives the interviewer a sense of your knowledge. Just make sure you are confident in what you are mentioning so it doesn’t come back to bite you.

Make context statements

A context statement explains your reasoning before or while you do something, rather than leaving the action to speak for itself. Adding context helps interviewers interpret your work. So, try to provide the rationale behind your actions so that your interviewer knows why you are making the choices you are making, especially for actions that are a matter of opinion.

Know how to get help

AKA – getting a hint. Some interviewers really hate the word “hint,” so a better approach is to say something like, “My assumptions are X and Y, and I’m thinking of doing Z. But I’m struggling with solving [specific problem].” You can also ask collaborative questions like:

I was wondering if you had any thoughts.
Do you think I’m going in the right direction?
Do you think my assumptions are incorrect?

Understand when to ask permission questions

Every interviewer will have different preferences. At key decision points where interviewers’ preferences may differ, ask for permission before assuming an action is appropriate. These can be questions like, “Can I Google the syntax online?” or “Is it okay if I write some thoughts down on paper?” It’s also better to ask closed questions, such as “Should I use this solution or think of something more optimal?” rather than “What should I do next?”

With these questions & tips in your back pocket, you should be more than prepared for your next data science technical onsite interview. For more help with your data science job search, check out our guide to landing a data science job.

Pathrise is a career accelerator that works with students and professionals 1-on-1 so they can land their dream job in tech. With these tips and guidance, fellows have seen their interview performance scores double.

If you want to work with any of our mentors 1-on-1 to get help with your data science interviews or with any other aspect of the job search, become a Pathrise fellow.

Apply today.