Azure Databricks Integration

This repository demonstrates how to integrate Azure Databricks with Azure Data Lake Storage Gen2 and Azure SQL Database. Its pipeline ingests raw data into a bronze layer, cleans and transforms it into a silver layer, and writes the refined data to an Azure SQL Database.

Project Structure

datasets/: Contains sample datasets for demonstration purposes.
1.mount_adls_gen2.py: Script to mount Azure Data Lake Storage Gen2 to Databricks.
2.create_databases.sql: SQL script to create necessary databases within Databricks.
3.bronze_layer_table.py: Processes and stores raw data into the bronze layer.
4.silver_layer_table.py: Transforms bronze data and stores it into the silver layer.
5.silver_to_sql_database.py: Transfers data from the silver layer to an Azure SQL Database.

Prerequisites

Azure Subscription
Azure Databricks Workspace
Azure Data Lake Storage Gen2
Azure SQL Database
Databricks UI access
Credentials and access rights for mounting the storage and connecting to the database

Setup Instructions

1. Mount Azure Data Lake Storage Gen2

Update 1.mount_adls_gen2.py with:
- Storage account name
- Container name
- Client ID, Tenant ID, and Secret
Run the script in Databricks to mount the storage.
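
As a rough illustration of what the mount step involves, here is a minimal sketch assuming a service principal with OAuth client credentials. The helper names and the mount point are hypothetical and are not taken from 1.mount_adls_gen2.py. `dbutils` is the utility object Databricks provides in notebooks.

```python
def build_oauth_configs(client_id, tenant_id, client_secret):
    """Spark configs for service-principal (OAuth) access to ADLS Gen2."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def build_source_uri(container, storage_account):
    """abfss:// URI of the container to mount."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/"


def mount_container(dbutils, container, storage_account, configs, mount_point):
    """Mount the container; return False if the mount point already exists."""
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return False
    dbutils.fs.mount(
        source=build_source_uri(container, storage_account),
        mount_point=mount_point,  # hypothetical, e.g. "/mnt/datalake"
        extra_configs=configs,
    )
    return True
```

In a notebook, the secret is better read from a secret scope with `dbutils.secrets.get(scope=..., key=...)` than pasted into the script. The scope and key names depend on your workspace.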

2. Create Databases

Run 2.create_databases.sql in a SQL notebook to create bronze_db and silver_db.
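
The same databases could also be created from a Python notebook. The sketch below is only an illustration: the database names come from the step above, while the helper names are hypothetical and the actual SQL script may set further options such as a storage location.

```python
def create_database_statements(names=("bronze_db", "silver_db")):
    """Return one idempotent CREATE DATABASE statement per layer."""
    return [f"CREATE DATABASE IF NOT EXISTS {name}" for name in names]


def create_databases(spark, names=("bronze_db", "silver_db")):
    """Run the statements with the notebook's SparkSession."""
    for statement in create_database_statements(names):
        spark.sql(statement)
```

`IF NOT EXISTS` makes the step safe to re-run, which helps when the notebooks are executed more than once.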

3. Bronze Layer - Raw Ingestion

Run 3.bronze_layer_table.py to load raw files from ADLS into a bronze table.
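
A bronze load typically reads the files unchanged and saves them as a table. The following is a minimal sketch, not the contents of 3.bronze_layer_table.py: it assumes CSV input with a header row, Delta as the table format, and a hypothetical table name `bronze_db.raw_data`.

```python
def load_bronze(spark, source_path, table="bronze_db.raw_data"):
    """Read raw CSV files as-is and save them as a bronze table."""
    raw_df = (
        spark.read.format("csv")
        .option("header", "true")        # first row holds the column names
        .option("inferSchema", "false")  # keep every column as a string in bronze
        .load(source_path)
    )
    # Overwrite so that re-running the step does not duplicate rows.
    raw_df.write.format("delta").mode("overwrite").saveAsTable(table)
    return raw_df
```

In a notebook this would be called as `load_bronze(spark, "/mnt/<mount-name>/<path>")`, where the path points at the mounted container.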

4. Silver Layer - Data Cleaning/Transformation

Run 4.silver_layer_table.py to apply transformations and store clean data in the silver table.
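
The transformations depend on the dataset, so the sketch below only shows the general shape of a silver step with two generic cleaning operations. The table names and the choice of operations are assumptions, not the logic of 4.silver_layer_table.py.

```python
def build_silver(spark, bronze_table="bronze_db.raw_data",
                 silver_table="silver_db.clean_data"):
    """Apply basic cleaning to the bronze table and save the result."""
    bronze_df = spark.table(bronze_table)
    clean_df = (
        bronze_df
        .dropDuplicates()    # remove exact duplicate rows
        .dropna(how="all")   # remove rows in which every column is null
    )
    clean_df.write.format("delta").mode("overwrite").saveAsTable(silver_table)
    return clean_df
```

Real pipelines usually add type casts, column renames, and validation rules at this stage.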

5. Export to Azure SQL Database

Update 5.silver_to_sql_database.py with your JDBC connection string, credentials, and target table info.
Run the script to push data from the silver layer to Azure SQL DB.
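
For orientation, an export over JDBC might look like the sketch below. The server, database, and table names are placeholders, the helper names are hypothetical, and the overwrite mode is an assumption. The actual script may append instead.

```python
def build_jdbc_url(server, database, port=1433):
    """JDBC URL for a database on an Azure SQL logical server."""
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:{port};"
        f"database={database};encrypt=true;trustServerCertificate=false;"
        "loginTimeout=30;"
    )


def export_silver(spark, jdbc_url, user, password,
                  silver_table="silver_db.clean_data",
                  target_table="dbo.clean_data"):
    """Write the silver table to Azure SQL Database over JDBC."""
    properties = {
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
    silver_df = spark.table(silver_table)
    # mode="overwrite" replaces the target table's contents on every run.
    silver_df.write.jdbc(url=jdbc_url, table=target_table,
                         mode="overwrite", properties=properties)
```

As with the mount step, the password is better fetched from a Databricks secret scope than stored in the script.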

6. Use Power BI to create visualisations

Connect Power BI to the Azure SQL Database, load the exported data, and create visualisations.
