Machine Learning Guide

MLA 021 Databricks: Cloud Analytics and MLOps

OCDevel

Artificial, Introduction, Learning, Courses, Technology, Ml, Intelligence, Ai, Machine, Education

4.9 • 848 Ratings

🗓️ 22 June 2022

⏱️ 26 minutes

Summary

Databricks is a cloud-based platform for data analytics and machine learning operations (MLOps). It combines a hosted Spark cluster, Python notebook execution, Delta Lake for data management, and connectivity to local IDEs. Raybeam uses Databricks and other MLOps tools according to client infrastructure, scaling needs, and project goals, favoring Databricks for its balanced feature set, ease of use, and support for both startups and enterprises.

Raybeam and Databricks

  • Raybeam is a data science and analytics company, recently acquired by Dept Agency.
  • While Raybeam focuses on data analytics, its acquisition has expanded its expertise into MLOps and AI.
  • The company recommends tools based on client requirements, frequently utilizing Databricks for its comprehensive nature.

Understanding Databricks

  • Databricks is not merely an analytics platform; it also competes in the MLOps space with tools such as SageMaker and Kubeflow.
  • It provides interactive notebooks, Python code execution, and runs on a hosted Apache Spark cluster.
  • Databricks includes Delta Lake, which acts as a storage and data management layer.

Choosing the Right MLOps Tool

  • Raybeam evaluates each client’s needs, existing expertise, and infrastructure before recommending a platform.
  • Databricks, SageMaker, Kubeflow, and Snowflake are common alternatives, with the final selection dependent on current pipelines and operational challenges.
  • Maintaining existing workflows is prioritized unless scalability or feature limitations necessitate migration.

Databricks Features

  • Databricks is accessible through a web interface similar to JupyterHub and can be integrated with local IDEs (e.g., VS Code, PyCharm) using Databricks Connect.
  • Notebooks on Databricks can be version-controlled with Git repositories, enhancing collaboration and preventing data loss.
  • The platform supports configuration of computing resources to match model size and complexity.
  • Databricks clusters are hosted on AWS, Azure, or GCP, with users selecting the underlying cloud provider at sign-up.

Parquet and Delta Lake

  • Parquet files store data in a columnar format, which improves efficiency for aggregation and analytics tasks.
  • Delta Lake adds transactional operations on top of Parquet files by maintaining a transaction log of table versions, which enables row-level edits and deletions.
  • This approach offers a database-like experience for handling large datasets, simplifying both analytics and machine learning workflows.
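
The two ideas above can be illustrated with a small standard-library sketch. It is a toy model only: the class `ToyVersionedTable`, its method names, the `part-N` file names, and the sample rows are all hypothetical and do not reflect the actual Parquet file format or Delta Lake's real log format or API. It shows why a columnar layout suits aggregation, and how an append-only log over immutable files can give deletes and version history.

```python
from dataclasses import dataclass, field

# Row-oriented layout: one record per row (hypothetical sample data).
rows = [
    {"user": "a", "country": "US", "amount": 10.0},
    {"user": "b", "country": "NL", "amount": 25.0},
    {"user": "c", "country": "US", "amount": 5.0},
]

# Column-oriented layout: one list per column (the idea behind Parquet).
columns = {name: [r[name] for r in rows] for name in rows[0]}

# An aggregate over one column only has to read that column's values.
total_amount = sum(columns["amount"])


@dataclass
class ToyVersionedTable:
    """Toy model: immutable data files plus an append-only log of commits."""

    files: dict = field(default_factory=dict)  # file name -> rows (never mutated)
    log: list = field(default_factory=list)    # commits: {"add": [...], "remove": [...]}

    def _commit(self, add, remove):
        self.log.append({"add": list(add), "remove": list(remove)})
        return len(self.log) - 1  # version number of this commit

    def _live_files(self, version):
        # Replay the log up to and including `version` to find the current files.
        live = []
        for entry in self.log[: version + 1]:
            live = [f for f in live if f not in entry["remove"]] + entry["add"]
        return live

    def append(self, new_rows):
        name = f"part-{len(self.files)}"
        self.files[name] = list(new_rows)
        return self._commit(add=[name], remove=[])

    def delete_where(self, predicate):
        # Rewrite affected files without the matching rows, then swap them in one commit.
        add, remove = [], []
        for name in self._live_files(len(self.log) - 1):
            kept = [r for r in self.files[name] if not predicate(r)]
            if len(kept) != len(self.files[name]):
                new_name = f"part-{len(self.files)}"
                self.files[new_name] = kept
                add.append(new_name)
                remove.append(name)
        return self._commit(add, remove)

    def read(self, version=None):
        # Read the latest version by default, or any earlier one ("time travel").
        if version is None:
            version = len(self.log) - 1
        return [r for f in self._live_files(version) for r in self.files[f]]


table = ToyVersionedTable()
v0 = table.append(rows)
v1 = table.delete_where(lambda r: r["country"] == "NL")
```

Calling `table.read()` returns the rows left after the delete, while `table.read(version=v0)` still returns the original rows, because old files are never modified in place. A real system adds much more (schema, concurrency control, compaction), which this sketch deliberately leaves out.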

Pricing and Usage

  • Pricing for Databricks depends on the chosen cloud provider (AWS, Azure, or GCP) with an additional fee for Databricks’ services.
  • The added cost is described as relatively small, and the platform is accessible to both individual developers and large enterprises.
  • Databricks is recommended for newcomers to data science and ML for its breadth of features and straightforward setup.

Databricks, MLflow, and Other Integrations

  • Databricks provides a hosted MLflow solution, offering experiment tracking and model management.
  • The platform can access data stored in services like S3, Snowflake, and other cloud provider storage options.
  • Integration with tools such as PyArrow is supported, facilitating efficient data access and manipulation.

Example Use Cases and Decision Process

  • Migration to Databricks is recommended when a client’s existing infrastructure (e.g., on-premises Spark clusters) cannot scale effectively.
  • The selection process involves an in-depth exploration of a client’s operational challenges and goals.
  • Databricks is chosen for clients that have no specialized feature requirements but need a unified data analytics and ML platform.

Personal Projects by Ming Chang

  • Ming Chang has explored automated stock trading using APIs such as Alpaca, focusing on downloading and analyzing market data.
  • He has also developed drone-related projects with Raspberry Pi, emphasizing real-world applications of programming and physical computing.

Transcript

0:00.0

Welcome back to Machine Learning Applied.

0:02.0

In this episode, I'm interviewing Ming Chang from Raybeam.

0:06.0

Raybeam is Dept Agency's latest acquisition.

0:09.0

Remember I mentioned earlier that Dept Agency is the parent organization.

0:13.0

They're out of Amsterdam, and they've been acquiring various companies.

0:16.0

I come to them through Rocket Insights, one of their acquisitions. And Raybeam is their latest acquisition.

0:22.1

Raybeam's primary focus is data science, data analytics,

0:25.9

whereas Rocket Insights is actually primarily app development.

0:29.7

So it's really good to have Raybeam on the team

0:32.8

because I'll be able to pick their brains about various topics

0:36.0

deep in the data science space.

0:38.0

In this particular episode, we're talking about Databricks.

0:41.6

Now, going into this episode, I actually thought Databricks was something of an analytics platform,

0:46.4

like a desktop application, similar to Tableau.

0:49.9

But it turns out, as you'll see in this episode, I was wrong about that.

0:52.5

Databricks is also an MLOps platform, which actually makes it a competitor now to SageMaker and Kubeflow and some of the other tools we've talked about.

1:02.5

So I'm probably going to take a step back from MLOps after this episode.

1:07.8

So we can talk about other things in data science and machine learning.

1:10.5

We've kind of beaten the dead horse on MLOps.

1:12.7

But I do know that Databricks is a product that kept coming up over and over in my exposure

1:18.7

in data science.

1:19.6

And so I thought it would be worth deep diving since it tends to be a favorite tool within

...
