Data science is a new and growing profession, and many companies are scrambling either to find a data scientist or to expand their data team. But what exactly do these people do, and how can they help all types of organizations?
At Super Forum 2017, we’re excited to bring Brad Klingenberg, the Vice President of Data Science at Silicon Valley startup Stitch Fix, as a keynote speaker. He’s going to talk about what data scientists can do for organizations and how they wrangle big data into actionable insights.
Here’s a brief introduction to Brad, Stitch Fix and data science.
Higher Logic: First things first—what is a data scientist?
Brad Klingenberg: I’ve always been fond of the definition from Josh Wills: a data scientist is “a person who is better at statistics than any software engineer and better at software engineering than any statistician.”
I think this is helpful in underscoring the unique position data science has at the intersection of so many disciplines. It’s a big tent and includes much of statistics, computer science, and machine learning, among other related fields depending on the application (e.g., natural language processing). In practice, data science includes a wide range of activities including experimentation, statistical modeling, machine learning, designing and implementing algorithms, some data engineering and, in general, making decisions with data. Because of its breadth, it’s hard to find something much pithier than the definition from Josh.
HL: What are the qualities of a good data scientist?
BK: Data science requires a strong technical skillset, but that’s often not what separates a great data scientist from a good data scientist. Beyond technical skills, two important qualities are maturity in framing problems and building models, and comfort with ambiguity.
The best data scientists are often the ones who know when a simple solution is the right one and don’t overcomplicate their approaches - a preference that usually comes with experience working with data and models. Being comfortable with ambiguity is similarly important. Many people can excel when a clearly framed prediction problem is presented to them, but the best are those who can figure out what problems to solve in the first place. Defining the goal of a project, how to measure its success, and what exactly you are optimizing is often trickier and more important than commonly appreciated.
HL: What’s the biggest misconception of data science?
BK: Given the enormous success and popularity of deep learning in recent years, one might think that this is the best approach for any problem. Sometimes it is, but many problems are better approached with other tools - particularly when the limiting factor is not the complexity of the predictive model.
There is also sometimes a misconception that data science is all about supervised learning, that is, making predictions. Supervised learning is important, but data scientists are also engaged in a wide variety of activities including experiments and inference, forecasting, measuring tradeoffs, optimization, and in many cases, figuring out what data to collect in the first place.
HL: How does someone end up as a data scientist? And what was your journey to becoming one?
BK: If data science were a river, it would have many tributaries. I came to data science through applied statistics, but many of my colleagues come from computer science, engineering and the quantitative sciences: physics, neuroscience, epidemiology, etc. The best preparation is to have spent a lot of time manipulating data with code and trying to answer hard questions with data.
HL: What does your team do for Stitch Fix? And how does data science help you achieve your goals?
BK: One of the most interesting aspects of Stitch Fix is the degree to which data science is engaged with nearly every aspect of the business. As is well known, we use data science to help pick the best inventory to send to our clients. But we also use data in a wide variety of other ways including (but not limited to!) helping decide what inventory to buy, how to manage our inventory, forecasting client demand, personalizing our marketing, optimizing the operations of our warehouses, and helping decide what data to collect from clients.
You can find a fun overview of the ways that we use data science at: http://algorithms-tour.stitchfix.com/.
While there are many aspects of Stitch Fix that lend themselves to using data and personalization, I think the broad engagement of data science is a trend that we will start to see in many more industries and organizations - including nonprofits, B2B companies, and professional associations.
HL: What’s the most interesting issue you’ve solved with data?
BK: One of the most fun problems we’ve been working on is using data and algorithms to design new clothes. While we’ve always used our feedback loops from client checkout data to iteratively improve our inventory, we have only recently begun to truly design new items using algorithms.
Even in its early stages, this new approach has produced some extremely successful new styles. I certainly never thought I’d be designing apparel! You can find some fun coverage of the topic in the WSJ: https://www.wsj.com/articles/next-top-fashion-designer-a-computer-1489323600.