#45 Michal Klos, Localytics and the World of Big Data

Summary
Michal Klos of Localytics tells me about their big data stack and where he thinks the industry is going.

Details
Who he is, what he does; overview of the world of big data, history, batch processing, stream processing and micro-batching; databases, Apache Spark, separating storage and compute; where he thinks the industry is going in the next five years, more about Spark, data lakes, query federation, Presto; how to get started with a big data project, picking technologies, doing a test; most big data projects fail, you should start small, get cross-team involvement; how to scale to petabytes, start small with short expected lifespan; technologies Localytics uses, blog, they are hiring.

Book Recommendations
Hadoop Application Architectures

I Heart Logs: Event Data, Stream Processing, and Data Integration

Systems Performance: Enterprise and the Cloud

Small Is Beautiful: Economics as if People Mattered

Brilliant!: Shuji Nakamura And the Revolution in Lighting Technology

Barbarians at the Gate: The Fall of RJR Nabisco

 


#32 Eliot Knudsen, Tamr and a Brave New World of Data


Summary
Eliot Knudsen, field engineer at Tamr, talks to me about their machine learning tool and a new way of examining data.

Details
Who he is and what he does; what is Tamr; working with data sources, the traditional way, the Tamr way, machine learning combined with human guidance; data quality and foreign languages; Thomson Reuters example, curating data, increasing speed; deploying Tamr; how Tamr works, database, Java, web client; competitors; future work.