After benefiting for years from great resources on math, programming, data science, etc., I thought it may be a good idea to start trying to return the favor. One void I intend to contribute to is at the intersection of Data Science and DevOps.
“Data Science” is a bit of a catch-all term for all kinds of analytics, machine learning, and artificial intelligence techniques. The field includes Netflix’s movie recommendations, Amazon’s “You may also want to buy…”, Google’s autonomous vehicles, Philips’ detection of decompensating patients (e.g. CareSage), and on and on. There’s no shortage of great sites exploring the latest data science trends (many linked from the right panel, and more here).
“DevOps”, a compound of Development and Operations, is the practice of unifying the various IT cultures around product development and deployment, streamlining practices and raising quality standards, and it similarly has no shortage of content online. For decades, the people creating applications (developers), the people checking for quality (testers, quality assurance), and the people responsible for deploying, scaling, and maintaining (operations) were separate groups, throwing code over the wall to one another. This worked fine when software shipped in boxes every few years, but the paradigm broke down as the industry moved to faster and faster release cycles - many companies now ship new code multiple times per day! To make this possible the groups had to integrate, creating a panoply of tools and techniques that streamline systems and shorten the path from innovation to end user.
How do these worlds relate to one another? One way is in reproducible data science. The foundations of science are experimental hypothesis testing and reproducibility. The libraries and tools used in an analysis can influence its results, confounding even thorough experiments and sometimes making reproducibility impossible. Open source toolsets (e.g. Python, R) have helped to create a free ecosystem of analytics environments, but module libraries can still affect results, and it can be tricky to truly recreate another scientist’s environment in order to reproduce their findings. Borrowing techniques from DevOps - version control, unit testing, containerization, and others - can help solve these issues. For data scientists looking to deploy or scale analytics, even more can be borrowed from the DevOps side, including data ingestion and processing, high availability, and more.
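As a tiny taste of borrowing one such practice, here is a minimal sketch of what unit testing looks like when applied to an analysis pipeline - the `clean_heart_rates` function and its plausibility thresholds are hypothetical, just stand-ins for any data-cleaning step you would want to pin down before refactoring or rerunning an experiment:

```python
def clean_heart_rates(values):
    """Drop physiologically implausible heart-rate readings (bpm).

    The 20-250 bpm window is an illustrative assumption, not a
    clinical standard.
    """
    return [v for v in values if 20 <= v <= 250]


def test_clean_heart_rates():
    # Sensor glitches (0, 999, -5) should be removed; valid readings kept.
    raw = [72, 0, 185, 999, -5, 60]
    assert clean_heart_rates(raw) == [72, 185, 60]


test_clean_heart_rates()
```

A test like this travels with the analysis code in version control, so anyone recreating the environment can verify that the cleaning step still behaves exactly as it did when the results were first produced.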
Hope you enjoy the explorations, and looking forward to comments!
About the author
Eric has a background in Electrical Engineering and Computer Science, starting with his Bachelor’s Degree from the University of Michigan, where he was electronics team leader for the U of M Solar Car Team. He attended Johns Hopkins University for graduate school, receiving a PhD in Biomedical Engineering in 2011. His thesis detailed the processing that occurs in the primate visual cortex, demonstrating that an artificial neural network optimizing for sparseness develops tuning properties similar to those he observed in area V4 of the anterior visual system. Eric’s postdoctoral work in retinal prosthetics at Weill Cornell Medical School was featured in a TED Talk in 2012. In 2013 Eric joined the staff of Philips Research, where he is currently a Senior Scientist. Eric’s work involves the creation of models predicting patient deterioration in and out of the hospital, integration into clinical workflow, and the development of systems to collect, clean, and process patient data and score models in real time.