A data pipeline project for processing and visualizing GA4 data with Airflow, dbt, BigQuery, and Superset.
The project runs on Docker Compose.
GA4 data
-> BigQuery raw data
-> Airflow orchestration
-> dbt bronze/silver/gold modeling
-> Superset dashboards
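In production, the project is deployed to an AWS Lightsail server. Tailscale is not meant for club-wide dashboard viewing; it is the access path the operations team uses for SSH and server administration.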
Operations team
-> Tailscale
-> SSH into Lightsail
-> Manage Docker / Airflow / Superset / Postgres
Club members
-> Browser
-> Superset HTTPS URL
Browser
-> https://superset.bcsdlab.com
-> Caddy reverse proxy
-> Superset:8088
-> BigQuery / Postgres
Until DNS is connected, the operations team checks Superset through an SSH tunnel.
ssh -L 8089:127.0.0.1:8088 ubuntu@100.80.218.102
With the tunnel open, connect from a browser.
http://127.0.0.1:8089
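Before opening a browser, one way to confirm the tunnel is forwarding is to request Superset's `/health` endpoint through the local port. This is a minimal sketch: the helper names `health_url` and `check_tunnel` are hypothetical, and it assumes `curl` is installed on the operator's machine.

```shell
# Hypothetical helpers (not part of this repository).

# Build the Superset health-check URL for a given local tunnel port.
health_url() {
  printf 'http://127.0.0.1:%s/health\n' "$1"
}

# Request the health endpoint; returns non-zero if the tunnel or Superset is down.
check_tunnel() {
  curl -fsS --max-time 5 "$(health_url "$1")"
}

# Usage, with the SSH tunnel open in another terminal:
#   check_tunnel 8089
```

A non-zero exit status from `check_tunnel` suggests the tunnel is closed or Superset is not listening on the server's port 8088.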
| Service | Purpose | Exposure |
|---|---|---|
| Caddy | HTTPS reverse proxy | Public 80/443 |
| Superset | Dashboard web app | Host localhost only |
| Airflow webserver | DAG/API management | Host localhost only |
| Airflow scheduler | DAG scheduling | Docker network |
| Postgres | Airflow/Superset metadata DB | Docker network |
| dbt | Data modeling commands | Run on demand |
| Port | Service | Production exposure |
|---|---|---|
| 80 | Caddy HTTP | Public |
| 443 | Caddy HTTPS | Public |
| 8088 | Superset | Not public |
| 8080 | Airflow | Not public |
| 5432 | Postgres | Not public |
Superset and Airflow are bound to 127.0.0.1 on the server.
Postgres is only reachable inside the Docker Compose network.
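The exposure rules above can be spot-checked on the server by inspecting listening TCP sockets. The sketch below assumes a Linux host with `ss` (iproute2) and assumes Docker-published ports show up as host listeners; the function name `find_public_binds` is hypothetical.

```shell
# Hypothetical helper (not part of this repository).
# Reads `ss -ltnH` output on stdin and prints any listener on an internal
# port (8088, 8080, 5432) that is bound to a non-loopback address.
find_public_binds() {
  awk '{
    addr = $4                                  # "Local Address:Port" column
    n = split(addr, parts, ":")
    port = parts[n]
    host = substr(addr, 1, length(addr) - length(port) - 1)
    internal = (port == "8088" || port == "8080" || port == "5432")
    loopback = (host == "127.0.0.1" || host == "[::1]")
    if (internal && !loopback) print addr
  }'
}

# Usage on the server:
#   ss -ltnH | find_public_binds
```

Empty output means none of the three internal ports is listening on a public interface. The Lightsail firewall rules are a separate layer and still need to be checked in the console.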
KOIN_DATA/
  README.md
  .env.example
  docker-compose.yml
  airflow/
    dags/
    logs/
    plugins/
  dbt/
  docker/
    airflow/
    caddy/
    postgres/
    superset/
  docs/
    architecture.md
    deploy-lightsail.md
    docker-compose.md
    operations.md
  scripts/
    backup_postgres.sh
    init_airflow.sh
  secrets/
Create the environment file.
cp .env.example .env
Update the required values in `.env`.
KOIN_DATA_SUPERSET_DOMAIN=superset.bcsdlab.com
KOIN_DATA_POSTGRES_PASSWORD=replace-with-strong-password
SUPERSET_SECRET_KEY=replace-with-strong-secret
AIRFLOW__WEBSERVER__SECRET_KEY=replace-with-strong-secret
SUPERSET_ADMIN_PASSWORD=replace-with-strong-password
Start the stack.
docker compose up -d --build
Initialize the Airflow metadata DB if needed.
./scripts/init_airflow.sh
Check service status.
docker compose ps
The project is deployed on the Lightsail server under:
/home/ubuntu/KOIN_DATA
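Because the `.env` template ships with `replace-with-...` placeholders, it can help to generate real secrets and confirm that no placeholder is left before the stack is started. The following is a sketch under assumptions: `gen_secret` and `find_placeholders` are hypothetical helper names, and `base64` is assumed to be available on the machine.

```shell
# Hypothetical helpers (not part of this repository).

# Generate a random 32-byte secret, base64-encoded on a single line.
gen_secret() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

# Print the names of variables in an env file whose values still start
# with the template's "replace-with-" placeholder prefix.
find_placeholders() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=replace-with-' "$1" | cut -d= -f1
}

# Usage:
#   gen_secret               # paste the output into .env
#   find_placeholders .env   # no output means no placeholders remain
```

Each call to `gen_secret` produces a different value, so the Superset and Airflow secret keys can each get their own.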
Server access:
ssh ubuntu@100.80.218.102
Production DNS should point to the Lightsail public IP.
superset.bcsdlab.com -> 3.39.229.133
After DNS is connected, Caddy automatically issues and renews HTTPS certificates.
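Caddy can only obtain a certificate once the domain resolves to the server, so it may be worth confirming the DNS record first. A minimal sketch, assuming `dig` is installed; `dns_matches` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of this repository).
# Succeeds if the expected IP appears among the resolved addresses
# supplied on stdin, one per line.
dns_matches() {
  grep -Fxq "$1"
}

# Usage:
#   dig +short superset.bcsdlab.com A | dns_matches 3.39.229.133 && echo "DNS ready"
```

If the check fails right after a DNS change, the record may simply not have propagated yet.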
More details:
docs/deploy-lightsail.md
docs/operations.md
docker compose ps
docker compose logs -f caddy
docker compose logs -f superset
docker compose logs -f airflow-webserver
docker compose run --rm dbt debug
docker compose run --rm dbt run
./scripts/backup_postgres.sh
- Do not commit `.env`.
- Do not commit files in `secrets/`.
- Do not expose Superset `8088` directly to the internet.
- Do not expose Airflow `8080` directly to the internet.
- Do not expose Postgres `5432` directly to the internet.
- Give club members viewer-level Superset permissions only.
- Use Public Dashboard only for data that can be visible to anyone.
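The first two rules can be checked mechanically with `git check-ignore`. This sketch assumes the repository's `.gitignore` is expected to cover `.env` and `secrets/`, which the source does not confirm; `check_ignored` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of this repository).
# Reports every given path that git does NOT ignore and returns non-zero
# if any such path is found.
check_ignored() {
  rc=0
  for path in "$@"; do
    # -q only sets the exit status and accepts a single path per call.
    if ! git check-ignore -q "$path"; then
      echo "NOT ignored: $path"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, from the repository root:
#   check_ignored .env secrets/
```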