Databricks VS Code Extension

Using the Databricks VS Code extension, you can connect to a Databricks workspace from within VS Code. This allows you to:

  • Write your code locally in VS Code, and then run it remotely on a Databricks cluster.
  • Run SQL queries on a Databricks cluster and see the results directly in VS Code.
  • Manage your Databricks clusters.

Why use it?

Visual Studio Code is an extremely popular code editor. As an open-source project, it has a large community of contributors and users. It is also highly extensible, allowing users to install a wide variety of extensions that add support for different programming languages, debugging tools, and more.

Prerequisites

You need Visual Studio Code installed, access to a Databricks workspace, and a personal access token for that workspace (used in the configuration step below).

Install the extension

  1. Open Visual Studio Code.
  2. Click the Extensions icon in the left navigation bar.
  3. Search for Databricks.
  4. Click Install on the Databricks extension.

Connect to a Databricks workspace

  1. Click the Databricks icon in the left navigation bar.
  2. Click Configure.
  3. Enter the URL of your Databricks workspace, up to and including .net/. For example, if your workspace URL is https://sample.azuredatabricks.net/?o=111111111111#, enter https://sample.azuredatabricks.net/.
  4. On the next screen, select Edit Databricks profiles.
  5. On the screen that opens, complete the following and save the file:
[DEFAULT]
host = https://sample.azuredatabricks.net/
token = your_token
jobs-api-version = 2.1

After completing these steps, click Configure again and select your saved [DEFAULT] profile. The extension will connect automatically.
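
As a quick sanity check, you can parse the profile with Python's standard configparser module, since Databricks profiles use the standard INI format. This is just a sketch: the host and token values mirror the placeholder example above, and the snippet reads the profile from a string rather than from your actual profile file.

```python
# Sketch: sanity-check a Databricks profile using Python's stdlib configparser.
# The values below mirror the example profile above; "your_token" is a placeholder.
import configparser

profile_text = """\
[DEFAULT]
host = https://sample.azuredatabricks.net/
token = your_token
jobs-api-version = 2.1
"""

cfg = configparser.ConfigParser()
cfg.read_string(profile_text)

# Confirm the fields the extension needs are present.
print(cfg["DEFAULT"]["host"])              # https://sample.azuredatabricks.net/
print(cfg["DEFAULT"]["jobs-api-version"])  # 2.1
```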

Run Local Code

NOTE: You must open a folder to use this part of the extension. To do so, click File > Open Folder and select the folder where you keep your code.

Attach a Cluster

Before running code, you must attach a cluster.

  1. Open the Databricks icon in the left navigation bar.
  2. If no cluster is attached, hover over the Cluster bar and click Configure Cluster.
  3. From the dropdown, select the cluster you want to attach.
  4. You can now start the cluster from the extension.

NOTE: You cannot create a cluster from the extension; you must create it in the Databricks workspace itself.
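
The extension lists your clusters for you, but as a sketch of what happens underneath, the same list is exposed by the Databricks Clusters REST API (GET /api/2.0/clusters/list), authenticated with the token from your profile. The snippet below only builds the request with Python's urllib; the host and token are placeholders, and no request is actually sent.

```python
# Sketch: build (but do not send) an authenticated request for the
# Databricks cluster list endpoint. Host and token are placeholders
# standing in for the values from your profile.
import urllib.request


def build_clusters_request(host: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Clusters API 2.0 list endpoint."""
    return urllib.request.Request(
        host.rstrip("/") + "/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_clusters_request("https://sample.azuredatabricks.net/", "your_token")
print(req.full_url)  # https://sample.azuredatabricks.net/api/2.0/clusters/list
```

Sending the request (for example with urllib.request.urlopen) returns a JSON list of clusters and their states, which is the same information the extension's cluster dropdown shows.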

Writing Your Code

Any .py file works, but you can format your files to take advantage of the notebook functionality of Databricks.

To use a notebook, put # Databricks notebook source on the first line of your .py file. This tells Databricks to treat the file as a notebook. You can then use the following commands to control the notebook (every subsequent line of a magic cell must also begin with # MAGIC):

  • # COMMAND ---------- creates a new cell.
  • # MAGIC %md creates a markdown cell.
  • # MAGIC %sql creates a SQL cell.
  • # MAGIC %scala creates a Scala cell.
  • # MAGIC %r creates an R cell.
  • # MAGIC %python creates a Python cell.
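
Putting these markers together, a minimal notebook-formatted file might look like the string below; the contents are illustrative, not from an actual workspace. The snippet then splits on the cell separator purely to show how the markers delimit cells, and the final cell uses %pip (with pandas as an example package) as one way to install a library from within the notebook.

```python
# Sketch: a minimal notebook-formatted source file, held here as a string,
# split on the cell separator to show how the markers delimit cells.
notebook_source = """# Databricks notebook source
print("hello from the first cell")

# COMMAND ----------

# MAGIC %md
# MAGIC # A markdown cell

# COMMAND ----------

# MAGIC %pip install pandas
"""

cells = notebook_source.split("# COMMAND ----------")
print(len(cells))  # 3 cells
```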

NOTE: Ensure that any libraries you import are installed on the cluster. You can do this by including a %pip install command in your code or by installing the libraries on the cluster itself.

Run Local Code on a Cluster

  1. Open the Explorer menu in the left navigation bar.
  2. Navigate to the file you want to run. You can use any type of file that you can run in Databricks (R, Python, etc.).
  3. Ensure the cluster is started, then right-click the file and select Run File as Workflow in Databricks.
  4. The file will run on the cluster. You can see the results in the Output window.

NOTE: Your code is copied to Databricks under the .ide folder in your workspace.