DevOps/Platform Engineer

Brussels, Brussels

Platform Engineer 

Start Date: 03/08/2026

End Date: 31/12/2027

Location: Brussels (2 days onsite per week)

Work Regime: Full-time

Role Summary:

The Platform Engineer is responsible for the reliable, secure, and stable operation of a high-availability cloud platform built on Kubernetes and composed of multiple in-house platform components. The role focuses on platform lifecycle management, day-2 operations, incident response, and operational excellence, ensuring that customer-facing Web UIs and APIs remain available, performant, and secure 24/7. The Platform Engineer acts as the technical custodian of the platform, providing a stable foundation on which service teams can safely deploy and operate their workloads.

Primary Objectives:

  • Maintain platform availability and reliability in accordance with SLOs/SLAs.
  • Ensure operational readiness of all environments (DEV / TEST / ACC / PROD).
  • Provide 24/7 operational coverage for critical platform services through an on-call rotation.
  • Ensure the platform is observable, secure, well-controlled, and documented.
  • Execute platform changes, upgrades, and maintenance in a predictable and low-risk manner.

Key Responsibilities:

Kubernetes & Runtime Operations

  • Operate Kubernetes primitives and platform add-ons: Ingress controllers, service discovery, workload identity.
  • Troubleshoot Kubernetes-related failures: Pod lifecycle issues, networking problems, resource starvation.
  • Perform controlled rollouts with rollback plans.

Reliability & 24/7 Incident Response

  • Participate in the 24/7 on-call rotation for critical services (incident responder).
  • Lead or contribute to incident triage and mitigation, Root Cause Analysis (RCA), and post-incident action tracking and follow-up.
  • Maintain and improve runbooks and operational procedures.

Observability & Monitoring

  • Operate and use the platform's open-source observability stack.
  • Ensure effective observability across the platform: Metrics, logs, and distributed traces.
  • Keep alerts actionable and reduce false positives.
  • Support incident analysis by correlating and inspecting telemetry.

Change, Release & Maintenance Management

  • Plan and execute platform changes.
  • Follow structured change management practices.
  • Communicate planned changes to stakeholders.
  • Ensure platform changes are documented and auditable.

Security & Compliance (Operational Focus)

  • Operate platform security controls: RBAC, network boundaries, secret management.
  • Apply security updates and patches to platform components.
  • Support vulnerability remediation efforts.
  • Provide operational evidence for audits and security reviews.

Automation & Operational Improvement

  • Automate repetitive operational tasks where appropriate.
  • Reduce operational risk through standardization and documented procedures.
  • Apply a Platform-as-Code approach (GitOps).

Requirements:

Technical Skills

Kubernetes (Deep Production Expertise)

  • Multi-cluster architecture & lifecycle management
  • RBAC & least-privilege design
  • Network policies & traffic segmentation
  • Stateful workloads & storage strategy (CSI, PV/PVC)
  • Autoscaling (HPA/VPA) & resource tuning
  • Pod Security Standards
  • Admission controllers
  • Performance & reliability troubleshooting
  • Cluster-level debugging (networking, DNS, scheduling, OOM, crash loops)

GitOps & Continuous Delivery

  • ArgoCD (advanced usage)
  • App-of-Apps pattern
  • Sync waves & hooks
  • Drift detection & reconciliation
  • Multi-environment promotion workflows
  • Git-based deployment strategy with version management
  • Declarative platform design with PR-driven changes
  • YAML-based CI/CD pipelines with Harness.io
  • Secure secret handling in CI/CD (with HashiCorp Vault)

Packaging & Configuration

  • Helm (advanced chart authoring)
  • Reusable library charts
  • OCI-based registries
  • Values layering strategy
  • Kustomize overlays for multi-environment isolation and strategic patches

Container & Artifact Management

  • Docker (secure multi-stage builds, optimization)
  • Harbor (RBAC, replication, vulnerability scanning)
  • JFrog Artifactory (Docker & Helm registry management)
  • Artifact versioning & promotion strategy

Secrets & Security

  • HashiCorp Vault for dynamic secrets with CSI integration
  • Image vulnerability scanning integration
  • Supply chain security awareness
  • TLS & certificate lifecycle management
  • RBAC governance

Observability & Reliability

  • OpenTelemetry (metrics, logs, traces)
  • Prometheus or VictoriaMetrics (recording rules, HA setup)
  • Loki (log aggregation & LogQL)
  • Tempo (distributed tracing)
  • Grafana (advanced dashboards & alerting)
  • SLI/SLO design & error budget thinking
  • Alert noise reduction strategy

Networking (Advanced)

  • TCP/IP & DNS fundamentals
  • TLS & mTLS concepts
  • Kubernetes Services, Ingress & reverse-proxy concepts
  • East-west vs north-south traffic
  • API routing & traffic management
  • Network Policies implementation

Automation

  • Advanced Bash scripting
  • Infrastructure automation mindset

Nice to Have

  • Kong API Gateway (API routing, plugins, authentication, rate limiting)
  • Redis (operational knowledge: deployment, persistence, clustering, backups)
  • PostgreSQL (migrations, backups, HA basics, Kubernetes deployment patterns)
  • MongoDB (replica sets, backups, Kubernetes deployment patterns)
  • Kargo on top of ArgoCD for release orchestration

Operational Skills

  • Proven experience in production operations or platform support roles
  • Ability to work calmly and methodically under pressure
  • Strong troubleshooting skills across distributed systems
  • Clear written and verbal communication during incidents and changes
  • Flexibility to balance daily operations with longer-term platform changes

Ways of Working

  • Structured, risk-aware, and detail-oriented
  • Comfortable with operational responsibility and accountability
  • Strong collaboration with development, security, and product teams
  • Documentation-first mindset for operational knowledge

Positioning vs Other Roles:

  • Not a pure SRE role: the focus is on stable, predictable day-to-day operations rather than reliability engineering.
  • Not a pure DevOps engineer embedded in product teams.
  • The role is the operational owner of the platform across all environments, ensuring each one runs safely and predictably.