Platform Architecture
The Qualiz platform is a cloud-agnostic, Kubernetes-based, microservices-driven ETL and Data Quality solution.
It is designed to orchestrate data ingestion, transformation, quality checks, deduplication, enrichment, and AI-assisted data operations at scale.
The platform is extensible and highly available, and it supports multi-tenancy through project-level isolation and role-based access control.

Architectural Goals
- Scalability – Handle multiple large-scale pipelines concurrently.
- Extensibility – Add new operators and processing engines without service downtime.
- Observability – Full visibility into pipeline execution, audit logs, and system health.
- Security – Strong identity management, RBAC, and secure secrets handling.
- Portability – Deployable on any major cloud provider or on-prem Kubernetes cluster.
- AI Integration – Native AI support for cleansing rule generation and job monitoring.
High-Level Architecture
The platform consists of four main layers:
1. Presentation Layer
- Webapp (React) – Provides the UI for pipeline creation, data source configuration, monitoring, and audit viewing.
- Authentication (Keycloak) – Central identity provider using OIDC for both UI and service-to-service authentication.
2. Application Layer (Microservices)
- Backend API – Core orchestration API for pipelines, DAG generation, Airbyte integrations, and audit management.
- AI API – Interfaces with Ollama for AI-assisted cleansing rules and anomaly detection.
- Notification Service – Sends email alerts; webhook and Slack alerts are planned.
- Custom Operator Logic – Encapsulated in microservices or container images for specific ETL tasks.
3. Processing & Orchestration Layer
- Airflow – Primary workflow orchestrator; dynamically executes DAGs generated from the UI (an illustrative generated DAG follows the layer list).
- Airbyte – Manages data ingestion from various sources to destinations (triggered via Backend API).
- Apache Beam – Cluster-based execution for distributed data processing.
- Custom Operators – Python task runner, SQL task runner, cleansing, deduplication, notification, sub-job invoker.
4. Data & Storage Layer
- PostgreSQL – Metadata store for pipelines, configurations, audit logs, and lineage data.
- MinIO – S3-compatible object store for staging data, artifacts, and logs.
- ELK Stack – Centralized logging and search capabilities.
- Prometheus/Grafana – Metrics collection and dashboarding.
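To make the orchestration flow concrete, the sketch below shows the rough shape a DAG generated by the Backend API could take, with each task running in its own Kubernetes pod. It is a minimal illustration, not the platform's actual generated code: the pipeline id, container images, and paths are hypothetical placeholders, and the KubernetesPodOperator import path depends on the installed provider version.

```python
# Illustrative sketch of a generated DAG (Airflow 2.x). All identifiers,
# images, and paths below are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
# Import path varies with the cncf.kubernetes provider version.
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="customer_pipeline_42",          # hypothetical pipeline id
    start_date=datetime(2024, 1, 1),
    schedule=None,                          # triggered via the Backend API / UI
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    # Each step runs as its own pod; the cleansing and deduplication
    # operators are shown here as generic KubernetesPodOperator tasks.
    cleanse = KubernetesPodOperator(
        task_id="cleanse_customers",
        name="cleanse-customers",
        image="registry.example.com/qualiz/cleansing-operator:latest",    # placeholder image
        arguments=["--ruleset", "s3://staging/rulesets/customers.json"],  # placeholder path
        get_logs=True,
    )

    dedupe = KubernetesPodOperator(
        task_id="deduplicate_customers",
        name="deduplicate-customers",
        image="registry.example.com/qualiz/dedup-operator:latest",        # placeholder image
        get_logs=True,
    )

    cleanse >> dedupe
```

The retry settings in default_args illustrate the task-level resilience described later in this document; error handling and audit/lineage hooks are omitted for brevity.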
Key Platform Components
| Component | Responsibility |
|---|---|
| Webapp | Pipeline builder UI, monitoring, audit dashboards. |
| Backend API | Pipeline orchestration, DAG generation, Airbyte integration, audit logging. |
| AI API | AI inference for cleansing rules & job monitoring (Ollama models). |
| Airbyte | Connector management for data ingestion. |
| Airflow | DAG scheduling, execution, and task orchestration. |
| Custom Operators | Encapsulate business-specific ETL & quality checks. |
| PostgreSQL | Metadata, pipeline definitions, audit logs. |
| MinIO | Object storage for intermediate and final artifacts. |
| ELK | Log ingestion, search, and visualization. |
| Keycloak | Authentication & authorization provider. |
Deployment & Infrastructure
- Runtime: Kubernetes cluster (cloud or on-prem).
- Namespace Segregation:
  - platform – Core microservices (backend, AI, Airflow, Airbyte, Keycloak).
  - infra – Storage, ingress, logging, monitoring.
  - tenant-* – Optional per-tenant connector runtime.
- Ingress: NGINX/Traefik with TLS termination (cert-manager).
- Storage Classes: Block storage for DBs, object storage for MinIO.
Data Flow
- Pipeline Creation – User designs pipeline in UI → backend stores config in PostgreSQL → generates DAG file.
- Pipeline Execution – Airflow picks DAG → executes operators (Airbyte, cleansing, deduplication, SQL, Python, Beam).
- Ingestion – Airbyte connectors run in Kubernetes pods → write output to MinIO or target DB.
- Processing – Custom tasks transform, enrich, deduplicate data.
- AI Features – Backend calls the AI API for suggestions/monitoring → results stored in DB (see the sketch after this list).
- Audit & Monitoring – Logs and execution metadata stored in PostgreSQL and ELK → UI displays results.
- Notification – Email/Slack/webhook alerts on pipeline events.
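As an illustration of the AI Features step, the sketch below shows how the AI API might ask a local Ollama instance for cleansing-rule suggestions over Ollama's HTTP generate endpoint. The in-cluster service name, model, and prompt are assumptions for illustration, not the platform's actual configuration.

```python
# Minimal sketch of the "AI Features" step: requesting cleansing-rule
# suggestions from Ollama. Service name, model, and prompt are placeholders.
import json
import requests

OLLAMA_URL = "http://ollama:11434/api/generate"  # assumed in-cluster service name

def suggest_cleansing_rules(column_profile: dict) -> str:
    prompt = (
        "Suggest data cleansing rules, as a JSON list, for this column profile:\n"
        + json.dumps(column_profile, indent=2)
    )
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    # With stream=False, Ollama returns the generated text under "response".
    return resp.json()["response"]

if __name__ == "__main__":
    profile = {"column": "email", "null_ratio": 0.12, "distinct_ratio": 0.98}
    print(suggest_cleansing_rules(profile))
```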
Security & Access Control
- Identity – Managed via Keycloak OIDC (see the token verification sketch after this list).
- Authorization – Role-based access control (RBAC) with project-level scopes.
- Secrets – Stored in Kubernetes Secrets or Vault (encrypted at rest).
- Network Policies – Restrict inter-service communication.
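The snippet below is a minimal sketch of how a backend service could verify a Keycloak-issued access token against the realm's JWKS endpoint using PyJWT. The realm name, hostname, and audience are placeholders; real code would also cache keys and map realm/client roles to project-level RBAC scopes.

```python
# Minimal sketch of Keycloak (OIDC) access-token verification with PyJWT.
# Realm, hostname, and audience are placeholder assumptions.
import jwt
from jwt import PyJWKClient

KEYCLOAK_ISSUER = "https://keycloak.example.com/realms/qualiz"  # placeholder realm
JWKS_URL = f"{KEYCLOAK_ISSUER}/protocol/openid-connect/certs"

_jwks_client = PyJWKClient(JWKS_URL)

def verify_token(token: str) -> dict:
    # Resolve the signing key referenced in the token header via JWKS.
    signing_key = _jwks_client.get_signing_key_from_jwt(token)
    # Validate signature, expiry, audience, and issuer in one call.
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience="qualiz-backend",  # placeholder client/audience
        issuer=KEYCLOAK_ISSUER,
    )
```

The returned claims would then feed the project-level RBAC checks described above.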
Observability
- Logs – Collected via ELK stack (Elasticsearch, Logstash/Fluentd, Kibana).
- Metrics – Prometheus exporters for Airflow, Airbyte, and custom services (see the instrumentation sketch after this list).
- Audit – Job & task-level immutable records.
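As a minimal sketch of how a custom service could expose metrics for Prometheus to scrape, the example below uses the prometheus_client library; the metric names, labels, and port are illustrative assumptions rather than the platform's actual metric catalogue.

```python
# Minimal sketch of Prometheus instrumentation for a custom service.
# Metric names, labels, and port are illustrative placeholders.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

TASKS_TOTAL = Counter(
    "qualiz_tasks_total", "Completed ETL tasks", ["operator", "status"]
)
TASK_DURATION = Histogram(
    "qualiz_task_duration_seconds", "ETL task duration in seconds", ["operator"]
)

def run_task(operator: str) -> None:
    # Time the task and count its outcome.
    with TASK_DURATION.labels(operator=operator).time():
        time.sleep(random.uniform(0.1, 0.5))  # stand-in for real work
    TASKS_TOTAL.labels(operator=operator, status="success").inc()

if __name__ == "__main__":
    start_http_server(8000)  # /metrics endpoint scraped by Prometheus
    while True:
        run_task("cleansing")
```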
Scalability & Resilience
Task Execution Scalability
The platform is designed to handle heavy data processing workloads by enabling scalable execution of ETL tasks:
- Apache Beam Integration – Provides a distributed, cluster-based execution environment for Python tasks and complex data transformations (a minimal pipeline sketch follows this list).
  - Supports multiple runners, including Apache Flink and Apache Spark clusters, enabling flexible execution depending on workload and environment.
  - Runs pipelines with the parallelism, windowing, and fault tolerance required for large streaming and batch data workloads.
  - Enables horizontal scaling by leveraging the underlying cluster’s autoscaling and resource management features.
- Airflow KubernetesExecutor – Each ETL task runs in its own Kubernetes pod, allowing tasks to scale out in parallel based on available cluster resources.
- Airbyte Connectors – Data ingestion connectors run as independent Kubernetes jobs or pods, which can be scaled out for parallel data pulls and pushes.
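The following minimal Beam sketch illustrates runner portability: the same batch pipeline can execute locally on the DirectRunner or on a Flink/Spark cluster by changing only the pipeline options. The bucket paths and transforms are placeholders, not the platform's actual cleansing logic, and the S3 paths assume the corresponding Beam filesystem extra is installed.

```python
# Minimal, runner-portable Beam pipeline sketch. Paths and transforms are
# placeholders; swap the runner name to target a Flink or Spark cluster.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run(runner: str = "DirectRunner") -> None:
    options = PipelineOptions(runner=runner)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("s3://staging/input/*.csv")    # placeholder path
            | "Parse" >> beam.Map(lambda line: line.split(","))
            | "DropIncomplete" >> beam.Filter(lambda fields: all(fields))   # drop rows with empty fields
            | "Format" >> beam.Map(",".join)
            | "Write" >> beam.io.WriteToText("s3://staging/output/cleaned") # placeholder path
        )

if __name__ == "__main__":
    # e.g. run("FlinkRunner") or run("SparkRunner") for cluster execution.
    run("DirectRunner")
```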
Fault Tolerance & Resilience
- Apache Beam runners provide exactly-once or at-least-once guarantees depending on runner and pipeline configuration.
- Airflow’s task retries and DAG-level error handling ensure pipeline resilience.
- MinIO and PostgreSQL provide highly available and durable storage for intermediate data and metadata.
Resource Efficiency
- The platform supports workload-specific resource requests and limits for pods to optimize cluster resource usage (see the sketch after this list).
- Apache Beam pipelines leverage autoscaling capabilities of underlying runners to dynamically adjust compute resources during execution.
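As a minimal sketch of workload-specific resource requests and limits under the Airflow KubernetesExecutor, the snippet below uses the pod_override mechanism in executor_config; the resource values and the task it would attach to are illustrative assumptions.

```python
# Minimal sketch of per-task resource requests/limits via the Airflow
# KubernetesExecutor's pod_override. Values are illustrative placeholders.
from kubernetes.client import models as k8s

heavy_task_resources = {
    "pod_override": k8s.V1Pod(
        spec=k8s.V1PodSpec(
            containers=[
                k8s.V1Container(
                    name="base",  # the main task container in the worker pod
                    resources=k8s.V1ResourceRequirements(
                        requests={"cpu": "500m", "memory": "2Gi"},
                        limits={"cpu": "2", "memory": "8Gi"},
                    ),
                )
            ]
        )
    )
}

# Attached to a generated task, e.g. (hypothetical task):
#   PythonOperator(task_id="dedupe", python_callable=..., executor_config=heavy_task_resources)
```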