- BI_Retail Project
The BI_Retail project is a comprehensive Business Intelligence solution designed to analyze and visualize retail supply chain and sales data. It leverages Docker for environment setup, Apache Airflow for workflow management, PostgreSQL as the data warehouse, and PowerBI for data visualization.
Here is our datawherehouse schema :

Before setting up the project, ensure you have the following installed on your Linux system:
- Docker: For containerization.
- Docker Compose: To manage multi-container Docker applications.
- Python 3.8+: For running Python scripts.
- Git: To clone the repository.
- PowerBI Desktop: For creating dashboards (available on Windows).
-
Clone the Repository
git clone https://github.com/Amiche02/BI_Retail.git cd BI_Retail -
Configure Environment Variables
-
Navigate to the
dockerdirectory and copy the example.envfile:cp docker/.env.example docker/.env
-
Generate Airflow Fernet Key
Run the following Python script to generate a Fernet key for Airflow:
python3 docker/airflowkey.py
-
Copy the generated key.
-
Open the
.envfile and paste the key into theAIRFLOW_FERNET_KEYvariable:AIRFLOW_FERNET_KEY=your_generated_fernet_key
-
-
Adjust Other Environment Variables
Ensure that the
DB_PORTin.envdoes not conflict with existing services on your machine.
-
Before deploying the environment, generate the necessary datasets.
-
Navigate to the Data Sources Directory
cd data_sources -
Generate Supply Chain Data
python3 supply_chain.py
- Output:
supply_chain.csvandsupply_chain_updated.csv - Location:
data_sources/
- Output:
-
Generate Transactions Data
python3 transaction.py
- Output:
transactions.csvandsupply_chain_updated.csv - Location:
data_sources/
Alternatively, for faster transaction generation:
python3 fast_transactions.py
- Note: Choose either
transaction.pyorfast_transactions.pybased on your performance needs.
- Output:
-
Place Generated Data
Ensure that the generated CSV files are placed in the correct directories as expected by the ETL scripts. By default, the paths are:
data_sources/data/supply_chain_updated.csvdata_sources/data/transactions.csv
If you choose different paths, update the paths in
scripts/etl_to_dwh.pyaccordingly.
-
Start Docker Containers
From the root of the project directory, run:
sh up.sh
- What It Does:
- Builds and starts Docker containers for PostgreSQL and Airflow.
- Sets up necessary directories and volumes.
- What It Does:
-
Accessing Airflow Worker (If DAGs Are Not Working)
If your DAGs are not functioning as expected, access the Airflow worker container:
docker exec -it docker-airflow-worker-1 bash- Within the Container:
- Connect to the PostgreSQL database using
psqlor another PostgreSQL client. - Alternatively, use pgAdmin4 for a graphical interface.
- Connect to the PostgreSQL database using
- Within the Container:
-
Connect to the PostgreSQL Database
- Using pgAdmin4:
-
Open pgAdmin4.
-
Create a new server connection with the following details from
.env:Host: localhost Port: 5433 Username: airflow Password: postgres
-
- Using pgAdmin4:
-
Create the Data Warehouse Schema
From the host machine, run:
python3 scripts/datawarehouse.py
- What It Does:
- Connects to PostgreSQL.
- Creates the
AGORA_DATA_WHEREHOUSEdatabase if it doesn't exist. - Drops existing tables and recreates the necessary schema.
- What It Does:
After setting up the data warehouse, perform the ETL (Extract, Transform, Load) process to populate the data warehouse with transformed data.
-
Run the ETL Script
python3 scripts/etl_to_dwh.py
- What It Does:
- Reads data from
supply_chain_updated.csvandtransactions.csv. - Cleans and transforms the data using Spark.
- Loads the transformed data into the PostgreSQL data warehouse.
- Reads data from
- What It Does:
Once the data is loaded into PostgreSQL, you can connect PowerBI to the data warehouse for analysis and dashboard creation.
-
Open PowerBI Desktop
-
Get Data
-
Select PostgreSQL from the data sources.
-
Enter the connection details:
Server: localhost,5433 Database: AGORA_DATA_WHEREHOUSE Username: airflow Password: postgres
-
-
Import Tables
- Choose the necessary dimension and fact tables for your analysis.
-
Create Dashboards
- Utilize the imported data to build interactive dashboards as described below.
Description:
This dashboard highlights the logistical and operational performance of the supply chain. Key components include:
-
Key Performance Indicators (KPIs):
- Sum of StockLevel: Total quantity of products in stock.
- Sum of ShippingCosts: Total cost of shipments.
- Sum of RevenueGenerated: Revenue generated from sales.
- Sum of OrderQuantities: Total quantity of products ordered.
-
Visualizations:
- Bar Chart - Revenue par produit (ProductName): Displays revenue generated for each product.
- Pie Chart - Disponibilité par pays fournisseur (SupplierCountry): Shows distribution of availability by supplier country.
- Bar Chart - Taux de défauts par mode de transport (TransportationModes): Indicates defect rates by transportation mode.
- Bar Chart - Coût moyen par mode de transport (TransportTypeAVGCost): Shows average shipping cost by transportation mode.
-
Dynamic Filters:
- Year and Month: Filter data based on specific periods.
- Product Categories: Filter by specific product types.
Description:
This dashboard focuses on sales analysis and customer behavior to identify trends and store performance. Key components include:
-
Key Performance Indicators (KPIs):
- Total Sales and Profits Generated.
- Frequency of Payment Methods Used.
-
Visualizations:
- Pie Chart - Taux d'achat par genre (Gender): Shows the distribution of purchases between males and females.
- Bar Chart - Ventes par produit (PrixVente): Displays the best-selling products.
- Donut Chart - Profit par ville (City): Compares profits across different cities.
- Bar Chart - Prix de vente par pays (Country): Highlights revenue by country.
- Bar Chart - Fréquence des méthodes de paiement par magasin: Analyzes payment methods used in each store.
-
Dynamic Filters:
- Year and Month: View sales over specific periods.
- Country: Filter data by specific countries.
- Product Categories: View performance by product category.
-
Airflow DAGs Not Running
-
Access the Airflow worker container:
docker exec -it docker-airflow-worker-1 bash -
Check Airflow logs for errors:
tail -f /opt/airflow/logs/scheduler/latest/*.log
-
-
Database Connection Issues
-
Ensure PostgreSQL is running:
docker ps
-
Verify connection details in
.envfile.
-
-
ETL Script Errors
- Check logs generated by the ETL scripts.
- Ensure data files are correctly placed and paths are updated in
etl_to_dwh.py.
BI_Retail/
├── docker/
│ ├── Volumes/
│ ├── airflowkey.py
│ ├── .env.example
│ └── ...
├── scripts/
│ ├── etl_to_dwh.py
│ ├── datawarehouse.py
│ └── ...
├── data_sources/
│ ├── data/
│ │ ├── supply_chain.csv
│ │ ├── supply_chain_updated.csv
│ │ └── transactions.csv
│ ├── supply_chain.py
│ ├── transaction.py
│ └── fast_transactions.py
├── docs/
│ ├── Supply_chain.png
│ ├── Ventes.png
│ └── logo.png
├── up.sh
├── down.sh
└── README.md
This project is licensed under the MIT License.
For any questions or support, please contact projectsengineer6@gmail.com.

