
The importance of a data science portfolio

As a data scientist, you might see the word “portfolio” and assume this article is for designers. While portfolios are of course important for designers, they are also becoming a very important part of the data science applicant’s toolkit. DataCamp Chief Data Scientist David Robinson (on the Mode Analytics blog) defines a data science portfolio as public evidence of your data science skills.

With the meteoric rise in data science jobs (344% since 2013), more and more people are coming into the market to fill these positions. With that increase, you will want to do more to establish your experience and stand out from the competition.

This is where the data science portfolio comes in. By including a portfolio, you can highlight the real-world experience that you have and show the hiring manager exactly what impact you have had in your past positions.

How do you start creating a data science portfolio?

The first piece of advice applies before you even start job searching. Because so much goes into the work you do on a regular basis, you need to keep good records of what you are doing and why you are doing it. These records will help you create your portfolio and ensure that you have everything you need to accurately explain the business case and the work you did on it.

If you have to look back at work you did years ago and remember the minute details, you will probably miss something. And if you don’t have the right information, including the project in your portfolio will only confuse the person reading it. So, stay prepared by keeping track of what you are working on and even adding it to your portfolio as you work.

In your portfolio, you should have the code you wrote in a Markdown file, with an explanation following each line. The explanations should answer a few questions: What analysis are you making? What problem are you working to solve? Why does it matter? You want the hiring manager or recruiter to understand the case, so make it as clear as possible.

What should your data science portfolio look like?

Data science portfolios can live in a few different places and in a couple of different formats. The first place is a repository on GitHub. This is a good home for your Markdown files and any other code you have written in your past experiences.

Here is an example of a portfolio homepage on GitHub.

Clicking on one of the projects in that example brings you to a separate GitHub page, which has a longer summary of the project and the code used to work on it. It also includes graphs, data set analyses, models, and descriptions to go with everything.

Next, you should create presentations in PowerPoint or Google Slides that act as business cases for the projects in your portfolio. These should be coherent, easy-to-understand explanations of the problems, analyses, and solutions. You can build these on your own or use data-science-specific templates. Practice going through these slides so that you are comfortable presenting them in interviews, because you will definitely be asked about them. Try recording and critiquing yourself or presenting to friends, so you can see where you need to polish.

Checklist: what goes into your portfolio?

  • Brief intro, which includes who you are, what you enjoy working on, and what you are looking for (a job, mentor, feedback, more projects, etc.).
  • Table of contents with all of your projects and a brief summary of each to pique the reader’s interest.
  • Each project should have its own page with:
    • A larger summary of the problem, hypothesis, and solution
    • A breakdown of the code you wrote, data sets analyzed, and models built. Include explanations with everything: assume that the reader has no background information and provide as much context as you can.
    • Predictions or conclusions, depending on the project
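
As a rough illustration of the checklist above, the sketch below generates a Markdown landing page and a skeleton for each project page. It is only one possible layout, not a required format: the `Project` class, both function names, and the section titles are hypothetical names chosen for this example, and the values in the usage block are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """One portfolio entry. All field names are illustrative assumptions."""
    title: str
    summary: str  # one or two sentences shown in the table of contents
    page: str     # relative path to the project's own page


def build_readme(name: str, intro: str, looking_for: str,
                 projects: list[Project]) -> str:
    """Return Markdown for the landing page: brief intro plus table of contents."""
    lines = [
        f"# {name}",
        "",
        intro,
        "",
        f"Currently looking for: {looking_for}",
        "",
        "## Projects",
        "",
    ]
    for project in projects:
        lines.append(f"- [{project.title}]({project.page}): {project.summary}")
    return "\n".join(lines) + "\n"


def build_project_page(project: Project) -> str:
    """Return a Markdown skeleton with one section per checklist item."""
    sections = [
        "Problem",
        "Hypothesis",
        "Solution",
        "Code, data sets, and models",
        "Predictions or conclusions",
    ]
    lines = [f"# {project.title}", "", project.summary, ""]
    for section in sections:
        # "TODO" marks where the written explanation for this section goes.
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    demo = [Project("Example project", "A one-line summary.", "projects/example.md")]
    print(build_readme("Your Name", "A short intro about you.", "a job", demo))
    print(build_project_page(demo[0]))
```

Writing the output of `build_readme` to a `README.md` file at the root of a GitHub repository would make it the page visitors see first.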

How much should you include in your portfolio?

As with your resume, you want a few items in your portfolio so that recruiters and hiring managers can see that you have a fair amount of experience. But don’t sacrifice quality for quantity: 3 really good pieces are better than 8 low- or medium-quality ones. In general, aim for at least 2 projects. If you don’t feel like you have enough past experience, consider working on an independent project.

Just make sure that you are presenting the entire data science workflow: discover a problem you want to solve, identify a data set, work through the data files, comb through statistics and establish hypotheses, highlight visualizations, apply any algorithms that might be relevant, explain the metrics, and show the business case presentation.
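
To make the workflow concrete, here is a minimal sketch of how a single project file might walk through those steps, with a comment explaining each one. Everything in it is an assumption for illustration: the advertising question, the column names, and the five data rows are invented toy values, and a real project would load a real data set and likely use a dedicated analysis library rather than the standard library alone.

```python
import csv
import io
import statistics

# Problem (hypothetical): does higher ad spend go along with higher sales?
# Toy data invented for illustration only; a real project would load a real file.
RAW_CSV = """ad_spend,sales
10,25
20,45
30,62
40,85
50,101
"""


def load_rows(text):
    """Work through the data file: parse CSV text into (ad_spend, sales) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["ad_spend"]), float(row["sales"])) for row in reader]


def fit_line(points):
    """Apply an algorithm: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(points, slope, intercept):
    """Explain the metric: share of the variance in y that the line accounts for."""
    ys = [y for _, y in points]
    mean_y = statistics.fmean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    points = load_rows(RAW_CSV)
    # Statistics and hypothesis: summarize first, then state what you expect.
    print("mean sales:", statistics.fmean(y for _, y in points))
    slope, intercept = fit_line(points)
    print("slope:", slope, "intercept:", intercept)
    print("R^2:", r_squared(points, slope, intercept))
```

The visualization and business case steps are left out here; in a portfolio they would follow the metric, as a chart and a short written conclusion.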

Once you have a data science portfolio with at least 2 projects, a presentation that you are comfortable giving, and a solid resume, you should go into your data science applications and interviews with more confidence. Don’t forget that you should be continually adding to your portfolio with every new project you do, either on your own or in your new work experiences.

Pathrise is a career accelerator that works with students and young professionals 1-on-1 so they can land their dream job in tech. If you want to work with any of our advisors to get help with your data science portfolio or with any other aspect of the job search, become a Pathrise fellow.

Apply today.
