#109 Joshua Sheppard, Data Science is Hard

Summary
Joshua Sheppard of Infinite Campus tells me about their data science and machine learning projects and how you can start your own.

Details
Who he is, what he does. What is data science, is a data scientist a role or a team, what skills are needed. Data vs big data. When does SQL + math become science, how to get started, Python, R and other languages; trying to follow software engineering principles when doing data science, testing, source control, etc. Azure and AWS machine learning, getting your data into the cloud. Moving to production, scaling. Josh's data and insights into the school districts in Kentucky. Applying insights to other locations. Home baking your data science project vs leveraging the cloud platforms, it's all about access to data. Future of the field.

Links
Joshua's homepage
Joshua's Twitter

Download mp3 of podcast

#45 Michal Klos, Localytics and the World of Big Data

Summary
Michal Klos of Localytics tells me about their big data stack and where he thinks the industry is going.

Details
Who he is, what he does; overview of the world of big data, history, batch processing, stream processing and micro-batching; databases, Apache Spark, separating storage and compute; where he thinks the industry is going in the next five years, more about Spark, data lakes, query federation, Presto; how to get started with a big data project, picking technologies, doing a test; most big data projects fail, you should start small, get cross-team involvement; how to scale to petabytes, start small with short expected lifespan; technologies Localytics uses, blog, they are hiring.

Download mp3 of podcast

Book Recommendations
Hadoop Application Architectures

I Heart Logs: Event Data, Stream Processing, and Data Integration

Systems Performance: Enterprise and the Cloud

Small Is Beautiful: Economics as if People Mattered

Brilliant!: Shuji Nakamura And the Revolution in Lighting Technology

Barbarians at the Gate: The Fall of RJR Nabisco

#32 Eliot Knudsen, Tamr and a Brave New World of Data

Summary
Eliot Knudsen, field engineer at Tamr, talks to me about their machine learning tool and a new way of examining data.

Details
Who he is and what he does; what is Tamr; working with data sources, the traditional way, the Tamr way, machine learning combined with human guidance; data quality and foreign languages; Thomson Reuters example, curating data, increasing speed; deploying Tamr; how Tamr works, db, Java, web client; competitors; future work.

Download mp3 of podcast