I’m on the job search and have found that a portfolio of apps and analyses is a good conversation starter during interviews. What’s generally good to have in a portfolio on GitHub when it comes to data-related jobs (Analyst, Scientist, Engineer, etc.)?
For GitHub portfolios in general, the best thing to do is clean up your projects so titles, folders, and especially your README are all in order. This doesn’t take too long, but it has a dramatic impact when engineers take a look at your GitHub. In your README, in particular, you want to include visuals (gifs and images) that show your project actually running, or a representation of the data your project is working with.
This GitHub repo, for example, has some great visualizations right there in the README, which is sometimes the only thing someone will look at in your portfolio:
If you’re looking for project ideas that are “conversation starters”, one of the best sources is the data analysis FiveThirtyEight has done on a number of subjects, such as this cool data visualization of deaths in America:
See if there are any subjects there that interest you, and you can start to branch off on projects from there. Plus, most of their source code and data sets can be found for free on their GitHub:
I agree with everything Brian said, and can also add a few more data-specific resources.
I would avoid the classic datasets. MNIST, Titanic, and Iris are good and interesting, but they are so common that they will cause the eyes of your hiring manager to glaze over. 538 and other data journalism sites are excellent sources, as are:
- The Data Is Plural newsletter of unusual datasets.
- The Library of Congress has some great data resources.
- Open data is now cool, and your local government might have conversation-starting data.
- The APIs of tech companies. Why not do an analysis of your favorite meetups, or of who has the weirdest tweeting schedule?
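To make the "weirdest tweeting schedule" idea a bit more concrete, here is a minimal sketch of the analysis step, assuming you have already pulled post timestamps from some API and saved them as ISO 8601 strings. The function names and the sample timestamps are made up for illustration and are not tied to any particular API.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(timestamps):
    """Count how many posts fall in each hour of the day (0-23)."""
    hours = (datetime.fromisoformat(ts).hour for ts in timestamps)
    return Counter(hours)


def busiest_hour(timestamps):
    """Return (hour, count) for the hour with the most posts."""
    counts = posts_per_hour(timestamps)
    return counts.most_common(1)[0]


# Hypothetical sample data, for illustration only.
sample = [
    "2017-03-01T03:15:00",
    "2017-03-01T03:47:00",
    "2017-03-02T14:05:00",
    "2017-03-03T03:02:00",
]

hour, count = busiest_hour(sample)
print(f"Most active hour: {hour}:00 with {count} posts")
```

A bar chart of the hourly counts is the kind of visualization that works well at the top of a README.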
If you are focused on data roles, I would also try to publish a blog post or two. These don’t have to be as involved as you might expect. Publishing on WordPress or Medium is fine, and the scope of your post can be narrow. The classic advice for topics is to focus on something that would have helped you last month. Since the number of things you could talk about is huge, this will help you focus on specific, deliverable things. People will be skimming your work, so visualizations are a great use of time and space. If the blog is mostly code, make sure the code is syntax highlighted, visually distinct, and placed throughout the text.
A few recent exceptional examples:
- Predicting who survives in Game of Thrones
- A fictitious HR database, built for demonstrating tools and skills.
- A tutorial on web scraping
Making a data portfolio is an open-ended project, but hopefully this helps. Remember, projects that exist are infinitely better than projects that don’t, and feel free to post what you make in the forum for feedback!
I completely agree with @brian and @Jared, who made very good points about data science projects. I would also add that having at least one project to talk about is not only a good conversation starter, but also a requirement if you are interviewing for a data science position. “Tell me about your project” is the question you will hear in 90% of your interviews, at least in the very first stages. You should be ready to talk about your project in simple terms and explain it to non-technical people. At the same time, be prepared to go over technical details as well, since you may be asked very specific questions about your data set, modeling techniques, etc.
Finally, when choosing a topic for your project, make sure it’s an interesting one that everyone can relate to. My project (which was also a research paper) was about attractiveness and how it affects wages. Every single person I interviewed with wanted to hear more details about my project, since they could easily relate to the topic. You want your interviewers to be engaged, so picking a good topic will help you a lot. Good luck, and let me know if you need any additional help with your data science portfolio.