Federal Science DataHubFederal Science DataHub
  • English
  • Français
  • English
  • Français
  • Overview
  • Managing Workspaces and Users

    • Getting a workspace (only available on the GC network)
    • Estimate costs (only available on the GC network)
    • Account Setup
    • Requesting, configuring and removing tools in your workspace
    • Invite a user
    • Change a user role
    • Manage your CBR & workspace budgets
  • Storage

    • Working with Azure Storage
    • Bring Your Own Storage

      • Import AWS S3 Bucket
      • Import Azure Storage
      • Import Google Cloud Platform Storage
    • Access Storage in Databricks
    • Use AzCopy to Interact with Storage
  • Databricks

    • Getting Started with Databricks
    • FSDH Cluster Policies
    • MLFlow: AutoML and Experiments
    • Databricks Workflows
    • Dashboarding

      • How to Dashboard in Databricks
      • Dashboarding Tool Comparison
    • External Extensions

      • Git/GitHub Integration with Databricks
      • Databricks VS Code Extension
      • Working with Conda
      • Connecting Google API to Databricks
  • PostgreSQL

    • Create and use a PostgreSQL Database
    • Add a User to PostgreSQL on FSDH
    • PostgreSQL vs Azure Databricks Database Features
  • Web Applications

    • Hosting Web Apps on DataHub
  • Migrating to Production

    • Migrating Storage
    • Migrating Databricks
    • Migrating PostgreSQL
    • Migrating Web Apps
  • User Guidance

    • Account Management and Access control of workspaces
    • Backup and Recovery
    • Github and code repo management
    • Incident Detection & Response
    • Monitor Usage
    • Monitoring and Auditing a Workspace
    • Source code
    • Restricted File Types on FSDH Storage
  • Terms and Conditions

PostgreSQL vs Azure Databricks Database Features

The Federal Science DataHub offers multiple ways to use database functionality. This guide will compare the use of a PostgreSQL database on FSDH with the database features provided by Azure Databricks.

Introduction

When it comes to hosting a database with FSDH, there are multiple options available. One popular choice is using PostgreSQL, while another option is leveraging the database features provided by Azure Databricks. In this comparison, we will explore the key differences and benefits of each approach.

PostgreSQL

PostgreSQL is a fully managed database service that allows you to deploy PostgreSQL databases in the cloud. It provides high availability, automatic backups, and integration with other services. Some of the key features of PostgreSQL include:

  • Easy deployment of PostgreSQL databases
  • Built-in high availability and automatic backups
  • Support for advanced PostgreSQL features such as JSONB, full-text search, and spatial data types
  • Seamless integration with other services like web apps or APIs

Azure Databricks Database Features (Hive)

Azure Databricks is an analytics platform that combines Apache Spark with a collaborative environment for data science and machine learning. While it is primarily known for its big data processing capabilities, Azure Databricks also provides powerful database features. Some of the key database features offered by Azure Databricks include:

  • Delta Lake: A transactional storage layer that provides ACID transactions, schema enforcement, and data versioning for data lakes.
  • Delta Engine: A high-performance query engine optimized for Delta Lake, enabling fast and scalable SQL queries on large datasets.
  • Delta Live Tables: A real-time change data capture (CDC) feature that allows you to capture and process real-time data changes from various sources.
  • Data Lakehouse Architecture: Azure Databricks promotes a unified architecture that combines the best of data lakes and data warehouses, enabling efficient data storage, processing, and analytics.

Databricks leverages the standard Apache Hive project which enables to leverage standard storage and query data using SQL. The underlying model is quite different from a traditional database.

  • Databricks SQL requires the cluster to be enabled to run SQL queries
  • Large datasets can be easily accomodated by simply scaling the cluster size
  • There is only storage costs for data that is not used
  • Large datasets are cost efficient because standard Azure storage is used.

Comparison

This chart outlines the key differences between PostgreSQL and Azure Databricks Database Features:

FeaturePostgreSQLAzure Databricks Database Features (Hive Tables)
Database ModelTraditional Relational DatabaseUnified data analytics platform based on Lakehouse architecture
Primary Use CaseGeneral-purpose database. Web app / API integration. Complex SQL queries.Big data analytics and real-time processing. Machine Learning. Collaborative data exploration.
SchemaStrict traditional SQL SchemaFlexible and auto-schema management
PerformanceGood for OLTP workloads.Optimized for big data processing.
ScalabilityLimited scaling in current setupHorizontal scaling with Delta Engine.
IntegrationSeamless integration with web apps and APIs.Integration with big data processing and machine learning workflows.

Recommended use cases

PostgreSQL

  • Web Application
  • Interactive Form
  • API integration
  • Connection to existing application

Databricks SQL Hive Tables

  • Databricks Dashboards
  • Science experiment data
  • Evolving data that requires flexible schemas
  • ETL pipeline intermediary data

Costing Difference

On-Demand Databricks

  • Databricks costing is pay for what you use aka it depends on the cluster size and how long the cluster has been running
  • With light usage, it is possible to keep databricks under $100 in a month and that includes Hive tables
  • It is recommended to perform some small experiments and monitor the costs to precisely estimate the cost

PostgreSQL Cost

  • PostgreSQL in FSDH uses the Azure Flexible Server model which is also based on consumption (burstable model)
  • With very light usage, it costs about $30/month
  • It is recommended to perform experiments as well and monitor the cost to determine more precisely the cost for a specific application

Connectivity

FeaturePostgreSQLAzure Databricks Database Features
Connect from Web ApplicationXNot possible
Connect from Power BIXX
Connect from TableauXX
Connect from PythonXPossible but requires cluster to be on

Conclusion

In conclusion, PostgreSQL and Azure Databricks Database Features offer different strengths and use cases. PostgreSQL is well-suited for general-purpose database workloads, web app integration, and complex SQL queries. On the other hand, Azure Databricks database features are optimized for big data analytics, real-time processing, and collaborative data exploration. Depending on your specific use case and requirements, you can choose the database solution that best fits your needs.

Edit this page on GitHub
Last Updated: 2026-04-13, 11:39 a.m.
Previous
Add a User to PostgreSQL on FSDH