GitHub Data Science Portfolio
I’m beginning to post my data science portfolio on GitHub. Here’s the link:
Data Science Project Portfolio
I have a long list of projects accumulated over the years, and it’s time to make them more visible to the world. For someone who is both a creative and a data professional, working with data can be frustrating because it’s often hard to publish your work. Most of the time the data is proprietary, and companies are generally reluctant to let their employees and consultants publish either the data or the code used to analyze it (especially if new models and methods have been developed). So in many cases I’m having to rework the project with different, public data, or with different code that will not reveal any trade secrets or violate NDAs.
So far there’s a demonstration of a complex dynamical system (the Lorenz attractor); a demonstration of the dramatic performance improvements to be had from using NumPy’s array type; a project that calculates and visualizes the twenty busiest airports from FAA flight and route data; one that tackles spam using a Gaussian mixture model; and one that analyzes credit card fraud using a public dataset.
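To give a flavor of the first item, here is a minimal sketch of a Lorenz system simulation. It is not the code from the portfolio: the function names, step size, and starting point are illustrative assumptions, and it uses the classic parameter values (sigma = 10, rho = 28, beta = 8/3) with a simple forward-Euler step rather than a more accurate integrator.

```python
import math


def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system by one forward-Euler step of size dt."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dt * dx, y + dt * dy, z + dt * dz)


def simulate(initial_state, steps=5000, dt=0.01):
    """Return the list of states visited, starting from initial_state."""
    trajectory = [initial_state]
    for _ in range(steps):
        trajectory.append(lorenz_step(trajectory[-1], dt))
    return trajectory


def max_separation(traj_a, traj_b):
    """Largest Euclidean distance between two trajectories at matching steps."""
    return max(math.dist(a, b) for a, b in zip(traj_a, traj_b))


if __name__ == "__main__":
    # Two starting points that differ by one part in a hundred million
    # (hypothetical values chosen only for illustration).
    base = simulate((1.0, 1.0, 1.0))
    nudged = simulate((1.0 + 1e-8, 1.0, 1.0))
    print("largest separation:", max_separation(base, nudged))
```

Running it shows the hallmark of the system: two trajectories that start almost identically end up far apart, which is what makes the attractor a good demonstration of sensitivity to initial conditions.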
This portfolio is definitely a work in progress, but it’s a labor of love. Data science, for all its challenges, is exciting and rewarding, so I’m glad to be sharing some of this work with a wider audience. If you’re interested, star or watch the repo to follow along as I continue to publish projects there. As always, let me know if you have questions or suggestions, or just want to talk data.