I agree with everything Brian said, and can add a few more data-specific resources.
I would avoid the classic datasets. MNIST, Titanic, and Iris are good and interesting, but they are so common that they will make your hiring manager's eyes glaze over. 538 and other data journalism outlets are excellent sources, as are:
- The Data Is Plural newsletter of unusual datasets.
- The Library of Congress, which has some great data resources.
- Open data is now cool, and your local government might have conversation-starting data.
- The APIs of tech companies. Why not analyze your favorite meetups, or find out who has the weirdest tweeting schedule?
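To give a sense of how small such a project can start, here is a minimal sketch of the "weirdest posting schedule" idea. It assumes you have already pulled a list of ISO 8601 timestamps from whatever API you are using; the `posts_by_hour` helper and the sample timestamps are made up for illustration and are not tied to any particular service.

```python
from collections import Counter
from datetime import datetime


def posts_by_hour(timestamps):
    """Count posts per hour of day (0-23) from ISO 8601 timestamp strings."""
    hours = (datetime.fromisoformat(ts).hour for ts in timestamps)
    return Counter(hours)


# Hypothetical timestamps standing in for whatever an API returns.
sample = [
    "2019-03-01T03:15:00",
    "2019-03-01T03:47:00",
    "2019-03-02T14:05:00",
    "2019-03-03T03:30:00",
]

counts = posts_by_hour(sample)

# Print a rough text histogram, one row per hour that has posts.
for hour, n in sorted(counts.items()):
    print(f"{hour:02d}:00  {'#' * n}")
```

Even a quick tally like this gives you something to chart and a question to write about (why is this account so active at 3 a.m.?).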
If you are focused on data roles, I would also try to publish a blog post or two. These don’t have to be as involved as you might expect. Publishing on WordPress or Medium is fine, and the scope of your post can be narrow. The classic advice for topics is to write about something that would have helped you last month. Since the number of things you could talk about is huge, this will help you narrow down to something specific and deliverable. People will be skimming your work, so visualizations are a great use of time and space. If the post is mostly code, make sure the code is syntax highlighted, visually distinct, and interspersed throughout the text rather than dumped in one block.
A few recent exceptional examples
Making a data portfolio is an open-ended project, but hopefully this helps. Remember that projects that exist are infinitely better than projects that don’t, and feel free to post what you make in the forum for feedback!