Responsibilities:
Provide advanced operational support for Linux-based IT infrastructure, focusing on system monitoring, observability, and reliability.
Ensure platform stability, performance, and availability through proactive monitoring, alerting, and continuous analysis of system health.
Resolve incidents and service requests promptly, using monitoring data and diagnostics to identify and address OS-level issues.
Perform routine maintenance and implement system changes in line with established processes and standards.
Design, implement, and continuously improve monitoring solutions, including metrics collection, logging, alerting, and dashboards.
Conduct root cause analysis based on monitoring data, logs, and system behavior, proposing and implementing long-term corrective actions.
Collaborate with cross-functional teams to ensure systems are secure, compliant, and effectively monitored end-to-end.
Participate in R&D activities, evaluating new technologies and observability tooling, and contributing to solution prototypes.
Drive automation through scripting and Infrastructure-as-Code to enhance monitoring coverage, reduce manual intervention, and increase platform reliability.
Requirements:
Fluent in English and in either French or Dutch.
Minimum 4 years of experience in IT operations, managing containerized, virtualized, and/or physical Linux-based infrastructure in large-scale environments.
Proven experience in the design, implementation, and operation of monitoring and alerting solutions.
Hands-on experience with PRTG is strongly preferred.
Solid understanding of metrics collection, log aggregation, and alerting strategies.
Experience with Red Hat Enterprise Linux environments and enterprise tooling.
Strong experience with Ansible Automation Platform and familiarity with Infrastructure-as-Code principles.
Practical experience with Git-based workflows and understanding of CI/CD pipelines.
Experience with the Red Hat container ecosystem and container monitoring.
Experience working within structured ITIL-aligned processes.
Exposure to hardware lifecycle management and VMware-based virtual environments.
Experience with endpoint detection and protection tools.
Basic operational knowledge of network/security platforms.
Solid scripting skills in Bash and/or Python.
Experience with vendor coordination, support processes, and licensing management.
Other:
Must be a team player with organizational skills.
Customer-minded, accountable, and flexible with working hours.
Must have a driver's license, car, and mobile phone.
Willingness to participate in 24/7 standby/on-call duties.