2023-12-04

Databricks - key concepts

```mermaid
mindmap
Databricks
    Databricks Workspace
    Databricks Runtime
    Databricks File System (DBFS)
    Databricks Clusters
    Databricks Notebooks
    Databricks Jobs
    Databricks Tables
```

Here are some of the key features and components of Databricks:

Databricks Workspace

This is the collaborative environment where you can write code, create visualizations, and share your work with others. It supports several languages including Python, SQL, R, and Scala. Read more: Create and manage your Databricks workspaces | Databricks on AWS

Databricks Runtime

This is the set of core components that runs on the clusters in Databricks. It includes Apache Spark along with Databricks-maintained enhancements such as performance optimizations, security features, and integrations with tools like Delta Lake and MLflow. Read more: What is Databricks Runtime?

Databricks File System (DBFS)

This is a distributed file system mounted into a Databricks workspace and available on its clusters. It lets you store files and share them across all nodes in a cluster. Read more: What is the Databricks File System (DBFS)?
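One practical detail worth knowing: inside a cluster the same DBFS file can be addressed two ways, as a `dbfs:/` URI for Spark APIs and as a driver-local path under the `/dbfs/` FUSE mount for ordinary file I/O. The sketch below illustrates the mapping; `to_fuse_path` is a hypothetical helper written for this post, not part of any Databricks API.

```python
def to_fuse_path(dbfs_uri: str) -> str:
    """Translate a dbfs:/ URI into the driver-local /dbfs mount path."""
    prefix = "dbfs:/"
    if not dbfs_uri.startswith(prefix):
        raise ValueError(f"not a DBFS URI: {dbfs_uri!r}")
    # Spark reads dbfs:/tmp/events.parquet; local code on the driver
    # reads the same bytes at /dbfs/tmp/events.parquet.
    return "/dbfs/" + dbfs_uri[len(prefix):]

print(to_fuse_path("dbfs:/tmp/events.parquet"))  # /dbfs/tmp/events.parquet
```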

Databricks Clusters

These are the compute resources that run your code. You can create clusters of different sizes and types depending on your workload. Read more: Compute - Azure Databricks
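Clusters are usually created through the UI, but they can also be defined declaratively via the Clusters REST API. A minimal sketch of such a request payload is below; the field names follow the REST API, while the concrete `spark_version` and `node_type_id` strings are placeholders that vary by workspace and cloud provider.

```python
import json

# Sketch of a create-cluster payload for the Databricks Clusters API.
cluster_spec = {
    "cluster_name": "etl-nightly",          # any name you choose
    "spark_version": "13.3.x-scala2.12",    # a Databricks Runtime version string
    "node_type_id": "i3.xlarge",            # cloud-specific instance type
    # Autoscaling between a floor and ceiling of worker nodes,
    # instead of a fixed num_workers.
    "autoscale": {"min_workers": 2, "max_workers": 8},
}

print(json.dumps(cluster_spec, indent=2))
```

Sizing the cluster to the workload (small fixed clusters for exploration, autoscaling ones for bursty jobs) is where most of the cost control happens.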

Databricks Notebooks

These are collaborative documents that contain code, visualizations, and text. They're great for exploratory data analysis, data science, and machine learning workflows. Read more: Introduction to Databricks notebooks

Databricks Jobs

These are the tasks or computations you run on Databricks. You can schedule jobs to run periodically, or run them on demand. Read more: Create and run Databricks Jobs
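Scheduled jobs are likewise described by a declarative spec submitted to the Jobs REST API. The sketch below shows the shape of such a spec, assuming a job that runs one notebook task on an existing cluster every day at 06:00; the notebook path and cluster ID are placeholders.

```python
import json

# Sketch of a job spec for the Databricks Jobs API: one notebook task,
# triggered daily by a Quartz cron schedule.
job_spec = {
    "name": "daily-report",
    "tasks": [
        {
            "task_key": "build_report",
            "notebook_task": {"notebook_path": "/Repos/team/reports/daily"},
            "existing_cluster_id": "1234-567890-abcde123",  # placeholder ID
        }
    ],
    # Quartz cron: run every day at 06:00 in the given timezone.
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",
        "timezone_id": "UTC",
    },
}

print(json.dumps(job_spec, indent=2))
```

The same job can also be triggered on demand, in which case the `schedule` block is simply omitted.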

Databricks Tables

These are the structured datasets that you can query using SQL or the DataFrame APIs in Python, R, and Scala. Read more: Delta Live Tables
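As a small illustration of the SQL side: in a notebook you would pass statements like the ones below to `spark.sql(...)`. Outside Databricks we only build the statements as strings here; the table and column names are made up for this example, and `USING DELTA` assumes the default Delta Lake table format.

```python
# Hypothetical table definition you might run in a Databricks notebook
# via spark.sql(create_stmt); names are illustrative only.
create_stmt = """
CREATE TABLE IF NOT EXISTS sales (
    order_id   BIGINT,
    amount     DOUBLE,
    order_date DATE
) USING DELTA
"""

# And a query against it, equally runnable from SQL, Python, R, or Scala.
query = "SELECT order_date, SUM(amount) AS revenue FROM sales GROUP BY order_date"

for stmt in (create_stmt, query):
    print(stmt.strip())
```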