Federal Science DataHub

Access Storage in Databricks

When the Databricks workspace is created for your project, an Azure Storage Account is also created. DataHub mounts the storage account's Blob storage on the pre-created Databricks cluster (main_cluster). This mount is provided for convenience; access to the Blob data is still governed by your identity and permissions.

Prerequisites

  • Familiarity with file API in Python or R
  • Access to Databricks in a workspace

Default DataHub Mount point

The storage account is mounted in Databricks on the default cluster (main_cluster) and can be accessed in your notebook just like a regular folder. Mounting storage in Databricks lets you access objects in object storage as if they were on the local file system.

To access the mount point on the default cluster, consider the sample code below:

df = spark.read.option("header", "true").csv("/mnt/fsdh-dbk-main-mount/sample.csv")
df.show(3)

In the example above, the pre-created path /mnt/fsdh-dbk-main-mount/ points to the datahub container of your Blob storage. The file sample.csv is for illustration purposes; replace it with your own file name.
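Mount points are also exposed through Databricks' local /dbfs FUSE mount, so non-Spark libraries (pandas, plain open(), etc.) can read the same files. A minimal sketch of the path mapping, assuming the standard DBFS FUSE layout:

```python
def to_local_dbfs_path(spark_path: str) -> str:
    """Translate a DBFS path (e.g. dbfs:/mnt/... or /mnt/...) to the
    /dbfs FUSE path that plain Python file APIs can open."""
    if spark_path.startswith("dbfs:"):
        spark_path = spark_path[len("dbfs:"):]
    return "/dbfs" + spark_path

# e.g. pandas.read_csv(to_local_dbfs_path("/mnt/fsdh-dbk-main-mount/sample.csv"))
print(to_local_dbfs_path("/mnt/fsdh-dbk-main-mount/sample.csv"))
# → /dbfs/mnt/fsdh-dbk-main-mount/sample.csv
```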

To access the same pre-created path from R using SparkR, refer to the following sample code.


library(SparkR)
sparkR.session()
df <- read.df("dbfs:/mnt/fsdh-dbk-main-mount/sample.csv", source = "csv")
head(df, 3)

Other Approaches

As you create more clusters based on DataHub cluster policies, you can mount your project Blob storage in your code.

Option 1 - SAS Token (Recommended)

In your notebook, refer to your storage using the preconfigured Spark configuration property abfss_uri. Sample code:

dbutils.fs.ls(spark.conf.get('abfss_uri'))
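The value behind abfss_uri is an ABFS (Azure Blob File System) URI, and file paths are appended after it. A sketch of the URI structure, using placeholder container and account names (on an FSDH cluster the full URI is already available via spark.conf.get("abfss_uri")):

```python
# Placeholder values for illustration; the real URI comes from the
# cluster's Spark configuration and should not be hard-coded.
container = "datahub"
account = "yourstorageaccount"

abfss_uri = f"abfss://{container}@{account}.dfs.core.windows.net"

# File paths are appended after the authority:
print(f"{abfss_uri}/sample.csv")
# → abfss://datahub@yourstorageaccount.dfs.core.windows.net/sample.csv
```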

How does this work?

The SAS token for your storage account is pre-created in Azure Key Vault and referenced in your cluster configuration. The token is rotated periodically. The cluster configuration looks like the following; these settings apply to clusters created with the FSDH cluster policy as well as to personal clusters.

fs.azure.sas.token.provider.type.yourstorageaccount.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider
fs.azure.account.auth.type.yourstorageaccount.dfs.core.windows.net SAS
fs.azure.sas.fixed.token.yourstorageaccount.dfs.core.windows.net {{secrets/datahub/container-sas}}
abfss_uri abfss://datahub@yourstorageaccount.dfs.core.windows.net
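A SAS token is a signed query string whose parameters encode the granted permissions and expiry. As a rough illustration (the token below is entirely made up; the real one lives in the datahub/container-sas Key Vault secret and should never be hard-coded), its parameters can be inspected with the standard library:

```python
from urllib.parse import parse_qs

# Hypothetical SAS token for illustration only.
sas_token = "sv=2022-11-02&ss=b&srt=co&sp=rl&se=2030-01-01T00:00:00Z&sig=REDACTED"

params = parse_qs(sas_token)
print(params["sp"][0])  # granted permissions: "rl" = read + list
print(params["se"][0])  # expiry time; requests fail after this moment
```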

Option 2 - Mount with Storage Account Key

# Unmount any stale mount at /mnt/fsdh before re-mounting
if any(mount.mountPoint == "/mnt/fsdh" for mount in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/fsdh")

# Mount the datahub container using the storage account key held in
# the "datahub" secret scope
dbutils.fs.mount(
    source = "wasbs://datahub@mystorage.blob.core.windows.net",
    mount_point = "/mnt/fsdh",
    extra_configs = {"fs.azure.account.key.mystorage.blob.core.windows.net": dbutils.secrets.get(scope = "datahub", key = "storage-key")})

Option 3 - Mount Blob Container

Mount the container with the following code:

# Use Azure AD credential passthrough: your own identity authorizes
# access to the ADLS Gen2 container
configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class": spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

dbutils.fs.mount(
    source = "abfss://container@account.dfs.core.windows.net/",
    mount_point = "/mnt/my-mountpoint",
    extra_configs = configs
)

Once the container is mounted at /mnt/my-mountpoint, Python programs in Databricks can access files in that storage container as if they were local files.

df = spark.read.option("header","true").csv('/mnt/my-mountpoint/sample.csv')
df.show(3)

Option 4 - Directly Access Individual Files

You can also access files directly without mounting the storage first:

spark.read.format("csv").load("abfss://container@account.dfs.core.windows.net/sample.csv").collect()

References

See the Databricks storage documentation for more details.

Last Updated: 2026-04-13, 11:39 a.m.