Platform Engineer
Start Date: 03/08/2026
End Date: 31/12/2027
Location: Brussels (2 days onsite per week)
Work Regime: Full-time
Role Summary:
The Platform Engineer is responsible for the reliable, secure, and stable operation of a high-availability cloud platform, built on Kubernetes and composed of multiple in-house platform components. The role focuses on platform lifecycle management, day-2 operations, incident response, and operational excellence, ensuring that customer-facing Web UIs and APIs remain available, performant, and secure 24/7. The Platform Engineer acts as a technical custodian of the platform, providing a stable foundation on which service teams can safely deploy and operate their workloads.
Primary Objectives:
- Maintain platform availability and reliability in accordance with SLOs/SLAs.
- Ensure operational readiness of all environments (DEV / TEST / ACC / PROD).
- Provide 24/7 operational coverage for critical platform services through the on-call rotation.
- Ensure the platform is observable, secure, well-controlled, and documented.
- Execute platform changes, upgrades, and maintenance in a predictable and low-risk manner.
Key Responsibilities:
Kubernetes & Runtime Operations
- Operate Kubernetes primitives and platform add-ons: Ingress controllers, service discovery, workload identity.
- Troubleshoot Kubernetes-related failures: Pod lifecycle issues, networking problems, resource starvation.
- Perform controlled rollouts with rollback plans.
Reliability & 24/7 Incident Response
- Participate in the 24/7 on-call rotation for critical services (incident responder).
- Lead or contribute to incident triage and mitigation, root cause analysis (RCA), and the tracking and follow-up of post-incident actions.
- Maintain and improve runbooks and operational procedures.
Observability & Monitoring
- Operate and use the platform's open-source observability stack.
- Ensure effective observability across the platform: metrics, logs, and distributed traces.
- Maintain actionable alerts and reduce false positives.
- Support incident analysis through telemetry correlation and inspection.
Change, Release & Maintenance Management
- Plan and execute platform changes.
- Follow structured change management practices.
- Communicate changes and their impact to stakeholders.
- Ensure platform changes are documented and auditable.
Security & Compliance (Operational Focus)
- Operate platform security controls: RBAC, network boundaries, secret management.
- Apply security updates and patches to platform components.
- Support vulnerability remediation efforts.
- Provide operational evidence for audits and security reviews.
Automation & Operational Improvement
- Automate repetitive operational tasks where appropriate.
- Reduce operational risk through standardization and documented procedures.
- Manage the platform as code following a GitOps approach.
Requirements:
Technical Skills
- Kubernetes (deep production expertise)
  - Multi-cluster architecture & lifecycle management
  - RBAC & least-privilege design
  - Network policies & traffic segmentation
  - Stateful workloads & storage strategy (CSI, PV/PVC)
  - Autoscaling (HPA/VPA) & resource tuning
  - Pod Security Standards
  - Admission controllers
  - Performance & reliability troubleshooting
  - Cluster-level debugging (networking, DNS, scheduling, OOM, crash loops)
GitOps & Continuous Delivery
- ArgoCD (advanced usage)
  - App-of-Apps pattern
  - Sync waves & hooks
  - Drift detection & reconciliation
  - Multi-environment promotion workflows
- Git-based deployment strategy with version management
- Declarative platform design with PR-driven changes
- YAML-based CI/CD pipelines with Harness.io
- Secure secret handling in CI/CD (with HashiCorp Vault)
Packaging & Configuration
- Helm (advanced chart authoring)
  - Reusable library charts
  - OCI-based registries
  - Values layering strategy
- Kustomize overlays for multi-environment isolation and strategic patches
Container & Artifact Management
- Docker (secure multi-stage builds, optimization)
- Harbor (RBAC, replication, vulnerability scanning)
- JFrog Artifactory (Docker & Helm registry management)
- Artifact versioning & promotion strategy
Secrets & Security
- HashiCorp Vault for dynamic secrets with CSI integration
- Image vulnerability scanning integration
- Supply chain security awareness
- TLS & certificate lifecycle management
- RBAC governance
Observability & Reliability
- OpenTelemetry (metrics, logs, traces)
- Prometheus or VictoriaMetrics (recording rules, HA setup)
- Loki (log aggregation & LogQL)
- Tempo (distributed tracing)
- Grafana (advanced dashboards & alerting)
- SLI/SLO design & error budget thinking
- Alert noise reduction strategy
Networking (Advanced)
- TCP/IP & DNS fundamentals
- TLS & mTLS concepts
- Kubernetes Services, Ingress & reverse proxy concepts
- East-west vs. north-south traffic
- API routing & traffic management
- Network Policies implementation
Automation
- Advanced Bash scripting
- Infrastructure automation mindset
Nice to Have
- Kong API Gateway (API routing, plugins, authentication, rate limiting)
- Redis (operational knowledge: deployment, persistence, clustering, backups)
- PostgreSQL (migrations, backups, HA basics, Kubernetes deployment patterns)
- MongoDB (replica sets, backups, Kubernetes deployment patterns)
- Kargo on top of ArgoCD for release orchestration
Operational Skills
- Proven experience in production operations or platform support roles
- Ability to work calmly and methodically under pressure
- Strong troubleshooting skills across distributed systems
- Clear written and verbal communication during incidents and changes
- Flexibility to balance daily operations with long-term changes
Ways of Working
- Structured, risk-aware, and detail-oriented
- Comfortable with operational responsibility and accountability
- Strong collaboration with development, security, and product teams
- Documentation-first mindset for operational knowledge
Positioning vs Other Roles:
- Not a pure SRE role: the focus is on stable, day-to-day platform operations rather than reliability engineering.
- Not a pure DevOps engineer embedded in product teams.
- The role is the operational owner of the platform across all environments, ensuring each one runs safely and predictably.