a16z Podcast: A Conversation With the Inventor of Spark

The a16z Show

a16z

Culture, Business, Science, Disruption, Technology, Software Eating The World, Entrepreneurship, Innovation


🗓️ 24 June 2015

⏱️ 19 minutes


Summary

One of the most active and fastest growing open source big data cluster computing projects is Apache Spark, which was originally developed at U.C. Berkeley's AMPLab and is now used by internet giants and other companies around the world. Including, a...

Transcript


0:00.0	Hello, everyone. Welcome to the a16z Podcast. I'm Sonal and I'm here today with Matei Zaharia,
0:05.9	the CTO and co-founder of Databricks, which is the primary company driving and developing Spark.
0:11.5	And we're actually just coming out of the Spark Summit, which took place this week. And it's one of the
0:16.2	biggest events for developers who are working on Spark, for companies that are interested in Spark,
0:21.4	and pretty much for anyone who cares about trends in the big data space.
0:24.6	To start off, Matei, can you give us a description of what Spark is?
0:29.2	So Spark is software for processing large volumes of data on a cluster.
0:33.9	And the things that make it unique are, first of all, it has a very powerful programming model
0:38.9	that lets you do many kinds of advanced analytics and processing, such as machine learning
0:44.0	or graph computation or stream processing. And second, it's designed to be very easy to use,
0:49.2	much easier to use than previous systems for working with large data.
0:54.1	So what were some of the previous systems for working with large data sets?
0:57.0	Before Spark, the most widely used system was probably MapReduce, which was invented at Google
1:03.4	and popularized through the open source Hadoop project.
1:07.3	And MapReduce itself was a major step over just writing distributed programs from scratch,
1:13.8	but it was still very difficult to adopt and use and led to very complicated applications
1:19.3	and also very poor performance in some of them.
1:22.1	So what were some of the reasons for inventing Spark in the first place then?
1:26.2	I mean, besides the problems and the limitations of that,
1:29.2	were you just trying to solve the problems of MapReduce, or were you actually trying to do something
1:32.9	different? Yeah, that's a good question. So we started building Spark after several years of
1:38.5	working on MapReduce and working with companies that were very early on using MapReduce.
	...


