Businesses increasingly rely on data to gain insights and drive growth. Apache Superset, an open-source data exploration and visualization platform, offers the tools to build dynamic business KPI dashboards that help teams track the metrics behind revenue growth.
What is Apache Superset?
Apache Superset is an open-source, modern, enterprise-ready business intelligence (BI) web application designed for data exploration and visualization. It is highly flexible, allowing users to create a wide range of visualizations, from simple line charts to complex geospatial charts, without needing to write code.
Because Superset pushes queries down to the connected database rather than processing the data itself, it stays lightweight and can serve large-scale data environments, making it suitable for businesses of all sizes.
How Does Apache Superset Work?
Superset connects to any SQL database that has a Python SQLAlchemy dialect and driver, enabling users to query data directly from these sources. It features a no-code visualization builder for users less familiar with coding, alongside a powerful SQL editor for advanced users.
The platform’s architecture supports scalability and integrates seamlessly with modern data stacks, ensuring that businesses can leverage their existing data infrastructure without additional complexity.
Benefits of Using Apache Superset for Business KPI Dashboards
- Comprehensive Visualization Options: Superset comes with over 40 pre-installed visualization types, allowing businesses to tailor their dashboards to specific needs. This versatility ensures that KPIs can be presented in the most effective format for decision-makers.
- Ease of Use: The platform’s intuitive interface and drag-and-drop capabilities make it accessible to users of all technical skill levels. This democratization of data allows more team members to engage with and interpret data.
- Real-time Data Insights: By querying live databases directly, Superset can show near-real-time data, subject to how often dashboards refresh and how query results are cached. This immediacy is crucial for businesses needing to make quick, informed decisions.
- Customization and Scalability: Businesses can customize dashboards with CSS templates to align with their brand aesthetics. The platform’s cloud-native architecture supports scalability, accommodating growing data needs.
Apache Superset Reference Architecture
The easiest way to install Superset is through Docker. You can use the official Docker image and Docker Compose files to set up the environment quickly.
In this blog, we will primarily focus on an Azure-based workflow, but the same principles apply to other major cloud providers such as AWS and GCP.
(1) The software developer pulls the base image from Docker Hub.
(2) The developer modifies the Dockerfile to add the required database connectors and pushes the code to Azure Repos.
(3) Azure Pipelines, triggered automatically or manually, builds the custom image and runs the post-build scripts that bring Apache Superset to the “running” state.
(4) The custom image is pushed to Azure Container Registry.
(5) Once the custom image is created, Azure Pipelines deploys it to Azure Container Instances.
(6) When the build changes, the pipeline keeps the image in Azure Container Registry and the deployment in Azure Container Instances in sync.
(7) Once the custom Apache Superset image is deployed, Superset can be accessed via a web browser.
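For readers who want to see what steps (3) to (5) amount to, the following is a minimal shell sketch of a manual equivalent using Docker and the Azure CLI; it is not the pipeline definition itself. The registry name `myregistry`, resource group `my-rg`, container name `superset-aci`, and tag `v1` are hypothetical placeholders, and the CPU and memory sizes are arbitrary. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Manual sketch of steps (3)-(5): build the image, push it to ACR, deploy to ACI.
# All names below are hypothetical placeholders, not values from this post.
set -eu

ACR_NAME="${ACR_NAME:-myregistry}"
RESOURCE_GROUP="${RESOURCE_GROUP:-my-rg}"
ACI_NAME="${ACI_NAME:-superset-aci}"
IMAGE="${ACR_NAME}.azurecr.io/mysuperset:v1"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

deploy_superset() {
  run az acr login --name "$ACR_NAME"
  run docker build -t "$IMAGE" .
  run docker push "$IMAGE"
  # Registry pull credentials and SUPERSET_SECRET_KEY are omitted here;
  # a real deployment needs both.
  run az container create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$ACI_NAME" \
    --image "$IMAGE" \
    --os-type Linux --cpu 1 --memory 2 \
    --ports 8088 \
    --ip-address Public
}

deploy_superset
```

Printing by default keeps the sketch safe to run on a machine without Azure access; in a real setup these commands would live in the pipeline rather than be run by hand.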
Databricks and Apache Superset
Databricks is a unified data analytics platform, founded by the creators of Apache Spark, that aims to simplify the process of data engineering, data science, and machine learning. By providing an integrated workspace for data preparation, analysis, and visualization, Databricks allows organizations to harness the full potential of their data to drive innovation and business growth.
In this blog post, we will use Databricks as the data source and Apache Superset as the presentation layer.
For simplicity, we will use Docker Desktop instead of the full Azure CI/CD build process. The steps below walk through installing the Databricks connector and bringing Superset up locally; connecting to a Databricks cluster is covered in Part 2.
Create a Dockerfile. The Dockerfile below installs the packages required for Databricks.
# Docker image
FROM apache/superset
# Switching to root to install the required packages
USER root
# Install Databricks packages using pip
RUN pip install "apache-superset[databricks]"
# Switching back to using the `superset` user
USER superset
Build the image
docker build -t mysuperset .
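Start the container, passing a SUPERSET_SECRET_KEY and mapping host port 8080 to the container's port 8088. The key below is a placeholder; use a strong random value outside local testing.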
docker run -d -p 8080:8088 -e "SUPERSET_SECRET_KEY=test1234" --name mysuperset mysuperset
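As a minimal sketch, assuming `openssl` is installed, a random key can be generated and passed to the same `docker run` command in place of `test1234`. The `start_superset` function name is only an illustrative wrapper and is not part of Superset.

```shell
#!/bin/sh
# Generate a random secret: 42 random bytes, base64-encoded.
SUPERSET_SECRET_KEY="$(openssl rand -base64 42)"

# Hypothetical wrapper around the docker run command used in this post.
start_superset() {
  docker run -d -p 8080:8088 \
    -e "SUPERSET_SECRET_KEY=${SUPERSET_SECRET_KEY}" \
    --name mysuperset mysuperset
}

echo "Generated a secret key of ${#SUPERSET_SECRET_KEY} characters"
```

Call `start_superset` once the image is built. Keep the same key across restarts, since Superset uses it to sign sessions and to encrypt credentials stored in its metadata database.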
Create an admin user with a username, password, and email address
docker exec -it mysuperset superset fab create-admin --username admin --firstname Superset --lastname Admin --email abc@xyz.com --password admin
Apply the database migrations to bring Superset's metadata database up to date
docker exec -it mysuperset superset db upgrade
Apache Superset ships with example datasets and dashboards. Loading them gives us something to explore while learning the tool.
docker exec -it mysuperset superset load_examples
Finally, initialize Superset, which sets up the default roles and permissions
docker exec -it mysuperset superset init
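Before opening the browser, it can help to confirm that the web server is answering. The sketch below polls Superset's `/health` endpoint with `curl`; the `wait_for_superset` function, the retry count, and the delay are illustrative choices, and the default URL assumes the 8080 port mapping used when starting the container.

```shell
#!/bin/sh
# Poll a URL until it responds successfully or the retries run out.
# Usage: wait_for_superset [url] [retries] [delay_seconds]
wait_for_superset() {
  url="${1:-http://127.0.0.1:8080/health}"
  retries="${2:-30}"
  delay="${3:-2}"
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # -f makes curl fail on HTTP errors; -s silences progress output.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      echo "Superset is up after ${attempt} attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "Superset did not respond at ${url}" >&2
  return 1
}
```

Running `wait_for_superset` with no arguments checks the local container and returns a non-zero status if it never responds, which makes it easy to chain with other commands.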
Open a web browser, and visit
http://127.0.0.1:8080/login/
If the container build was successful, a login page is rendered. Log in with the username and password created during the container setup, in this case admin/admin.
After a successful login, the landing page lists the example dashboards.
Open the demo Sales Dashboard.
In the upcoming Part 2, we will look at how to connect to Azure Databricks and create simple dashboards.
Conclusion
Apache Superset is a robust and scalable platform for data visualization and exploration, making it an excellent choice for businesses looking to enhance their data analysis capabilities. By understanding its architecture and effectively implementing it within your infrastructure, you can leverage its powerful features to drive data-driven decisions and revenue growth.