In my previous blog post, we explored how visualization with Apache Superset can enhance business revenue and operations, and provided a brief introduction to Azure Databricks.
In this post, we will walk through how to connect Apache Superset to Databricks (running on Azure) and create charts and dashboards.
Setup Dataset and Azure Databricks
Dataset –
The dataset used in this demo can be downloaded from Kaggle.
Prerequisites –
It is assumed that Azure Databricks is already set up and a database has been created. If not, follow these steps:
- Create an Azure Databricks Workspace.
- Go to the Azure portal.
- Click on “Create a resource” and search for “Azure Databricks”.
- Click “Create” and follow the instructions to set up your Databricks workspace.
- Create a Databricks Cluster
- Launch Databricks Workspace:
- Once the workspace is created, launch it from the Azure portal.
- Sign in to the Databricks workspace.
- Create a Cluster:
- In the Databricks workspace, go to the “Clusters” section.
- Click “Create Cluster”.
- Configure your cluster (e.g., cluster name, cluster mode, and autoscaling options).
- Click “Create Cluster” and wait for it to start.
- The dataset is ingested into Azure Databricks and stored in Delta format.
- Generate a Personal Access Token
- Create a Token:
- In the Databricks workspace, click on your username in the top right corner and select “User Settings”.
- Go to the “Access Tokens” tab.
- Click “Generate New Token”.
- Give the token a name and set an expiration period.
- Click “Generate” and copy the token. Save it somewhere secure as you won’t be able to see it again.
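Before moving on, it can be handy to confirm that the token is accepted by the workspace. Below is a minimal sketch, assuming the Databricks Clusters REST endpoint `/api/2.0/clusters/list` with Bearer-token authentication. The workspace URL and token are placeholders, and the helper name `build_cluster_list_request` is my own. The sketch only builds the request; actually sending it is left commented out.

```python
import urllib.request


def build_cluster_list_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists clusters in the workspace."""
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    # Personal access tokens are passed as a Bearer token in the Authorization header.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Placeholder values (assumptions): replace with your own workspace URL and token.
req = build_cluster_list_request(
    "https://adb-0000000000000000.0.azuredatabricks.net/",
    "dapiEXAMPLETOKEN",
)
print(req.full_url)

# To actually call the API, uncomment the lines below. An HTTP 200 response
# suggests the token is valid; an HTTP 401/403 suggests it is not.
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

In practice, read the token from an environment variable or a secret store rather than hard-coding it in a script.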
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_238,h_370/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-27.png)
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_1024,h_449/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-28-1024x449.png)
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_1024,h_253/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-29-1024x253.png)
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_508,h_294/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-30.png)
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_498,h_235/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-32.png)
Once the data has been ingested and the Delta tables have been created successfully, the result will look similar to the screenshot below.
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_933,h_258/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-26.png)
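For reference, the ingestion step can be sketched in a Databricks notebook roughly as follows. This is an assumption about how a CSV file might be loaded into a Delta table, not the exact code behind the screenshots. The function name, file path, and table name are hypothetical, and `spark` is the session that Databricks notebooks provide.

```python
def ingest_csv_to_delta(spark, csv_path: str, table_name: str) -> None:
    """Read a CSV file with a header row and save it as a managed Delta table."""
    df = (
        spark.read
        .option("header", True)       # first row contains the column names
        .option("inferSchema", True)  # let Spark guess the column types
        .csv(csv_path)
    )
    # Overwrite the table if it already exists and store it in Delta format.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)


# Example call inside a Databricks notebook (path and table name are hypothetical):
# ingest_csv_to_delta(spark, "/FileStore/tables/sales.csv", "default.sales")
```

Schema inference is convenient for a demo; for production tables, an explicit schema is usually the safer choice.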
Setup Apache Superset
Let’s configure the Databricks connection in Superset.
- Add a New Database:
- Open your Superset instance in a web browser.
- Log in and navigate to “Data” > “Databases” > “+ Database”.
- Choose “Databricks” as the database type.
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_1024,h_667/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-33-1024x667.png)
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_1024,h_667/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-34-1024x667.png)
Navigate to the Azure Databricks cluster and, under the Configuration tab, expand the Advanced options. The details shown below are needed to configure the connection in Apache Superset, along with the Personal Access Token created in the previous section.
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_640,h_820/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-35.png)
After entering the appropriate details in Apache Superset, click on Connect.
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_501,h_851/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-36.png)
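If you prefer to supply a SQLAlchemy URI instead of filling in the form, the same details can be assembled in code. The sketch below assumes the `databricks+connector://` URI scheme and the `http_path` connect argument described in the Superset documentation for the `databricks-sql-connector` driver; the exact scheme can vary with the installed driver version, so treat it as an assumption. The function names and sample values are hypothetical.

```python
import json
from urllib.parse import quote


def build_databricks_uri(token: str, server_hostname: str,
                         port: int = 443, database: str = "default") -> str:
    """Assemble a SQLAlchemy URI; the token is URL-encoded in case it has special characters."""
    return (
        f"databricks+connector://token:{quote(token, safe='')}"
        f"@{server_hostname}:{port}/{database}"
    )


def build_engine_parameters(http_path: str) -> str:
    """Return the JSON expected in Superset's engine parameters field."""
    return json.dumps({"connect_args": {"http_path": http_path}})


# Placeholder values (assumptions): copy the real ones from the cluster's Advanced options.
uri = build_databricks_uri("dapiEXAMPLETOKEN", "adb-0000000000000000.0.azuredatabricks.net")
params = build_engine_parameters("sql/protocolv1/o/0000000000000000/0000-000000-example")
print(uri)
print(params)
```

Keeping the HTTP path in the engine parameters rather than in the URI means the URI itself only carries the host, port, database, and token.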
Click on Finish to accept the default parameters –
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_498,h_835/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-37.png)
Select the Datasets tab –
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_860,h_123/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-38.png)
Select the database, schema, and table on the left-hand side; the table's columns will load automatically on the right-hand side.
Once you have confirmed the table, click the “Create Dataset and Create Chart” button at the bottom right.
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_1024,h_725/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-39-1024x725.png)
The next screen will take you to Create a new chart. Select the dataset that was created in the previous step, and start creating some simple charts to begin with.
For simplicity, I have shown the total sales amount using the Big Number chart. You can explore other chart types based on your requirements.
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_1024,h_666/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-40-1024x666.png)
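Under the hood, a Big Number chart boils down to a single aggregate query against the dataset. The sketch below illustrates that idea using an in-memory SQLite table as a stand-in for the Delta table, so it runs anywhere. The table name, column names, and values are made up for illustration and are not taken from the demo dataset.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for the Databricks table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, sales_amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 120.5), (2, 80.0), (3, 49.5)],
)

# The Big Number metric is the result of one aggregate, here SUM over a numeric column.
total_sales = conn.execute("SELECT SUM(sales_amount) FROM sales").fetchone()[0]
print(total_sales)  # prints 250.0

conn.close()
```

In Superset, the equivalent is choosing a SUM metric on the sales column in the chart's metric field.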
Once you are satisfied with the chart, save it. If a dashboard doesn’t exist yet, create one as well.
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_1024,h_457/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-41-1024x457.png)
After adding several charts, the dashboard looks like this –
![](https://sp-ao.shortpixel.ai/client/to_webp,q_glossy,ret_img,w_1024,h_301/https://www.dominickumar.com/blog/wp-content/uploads/2024/06/image-42-1024x301.png)
Conclusion
Using Apache Superset with Databricks offers a powerful combination of data visualization and business intelligence. Here are the key points to consider when using these tools together:
- Enhanced Data Exploration:
- Apache Superset provides a rich set of visualization options and a user-friendly interface, enabling users to explore and analyze data interactively.
- When connected to Databricks, it leverages the high performance and scalability of the Databricks platform, allowing for seamless analysis of large datasets.
- Scalability and Performance:
- Databricks, built on Apache Spark, provides a robust and scalable environment for processing large volumes of data. This ensures that even complex queries and large datasets can be handled efficiently.
- Superset queries Databricks whenever a chart loads or refreshes, so dashboards reflect the current state of the underlying tables, subject to any caching configured in Superset.
- Ease of Integration:
- Connecting Apache Superset with Databricks is straightforward: Superset connects through SQLAlchemy and a Python database driver, and only needs the cluster's connection details and an access token.
- This integration allows users to harness Databricks’ advanced analytics capabilities within the intuitive Superset interface.
- Advanced Analytics:
- Databricks supports advanced analytics, including machine learning and streaming analytics. By visualizing these analytics in Superset, users can gain deeper insights and make data-driven decisions.
- The combination enables the creation of complex, interactive dashboards that can display real-time analytics and predictions.
- Collaborative Environment:
- Both Databricks and Apache Superset support collaborative work. Databricks offers collaborative notebooks, while Superset allows for sharing and editing dashboards among team members.
- This collaboration fosters a data-driven culture within organizations, enabling teams to work together on data analysis and visualization projects.
- Cost Efficiency:
- Utilizing Databricks for heavy data lifting and Superset for visualization can be cost-effective. Databricks’ optimized compute resources can handle large-scale data processing efficiently, while Superset provides a free, open-source solution for visualizing that data.
- Open Source and Flexibility:
- Apache Superset is open source, and Databricks is built on open-source technologies such as Apache Spark and Delta Lake, offering flexibility and a broad range of customization options. This openness helps organizations adapt and extend the tools to fit their specific needs.
In conclusion, the integration of Apache Superset with Databricks creates a powerful, scalable, and user-friendly platform for data visualization and analytics. This combination leverages the strengths of both tools, providing enhanced capabilities for data exploration, advanced analytics, and collaborative work, all while maintaining cost efficiency and flexibility.