Position: Mid-Senior level

Job type: Full-time

Job description

Role Overview

We are hiring a Linux infrastructure specialist to join a core High-Performance Computing (HPC) team. You will support and evolve simulation and R&D compute platforms used by internal Engineering teams.

This is a technically broad role covering Linux infrastructure, cluster orchestration, automation, monitoring, and some hands-on hardware support. You’ll join a small, senior team with high autonomy and end-to-end responsibility for the HPC platform.

Key Responsibilities

  • Administer and optimize Linux-based HPC clusters (Ubuntu, CentOS, RHEL-family)
  • Manage workload scheduling with Slurm
  • Support containerized workloads using Docker and Singularity
  • Implement and manage infrastructure-as-code via Ansible and Terraform
  • Support GPU-accelerated workloads (NVIDIA, CUDA)
  • Monitor system health and performance using Grafana, Prometheus, and related tools
  • Troubleshoot hardware and perform physical support tasks (rack/stack, diagnostics, cabling)
  • Collaborate with internal researchers and engineers to support and improve workload performance
  • Contribute to documentation and help mature internal platform standards and practices

Requirements

  • Operating Systems: Ubuntu, CentOS, RHEL derivatives (Rocky, Alma)
  • Schedulers: Slurm (primary), Open OnDemand (optional)
  • Containers: Docker, Singularity
  • Automation: Ansible, Terraform, Bash, Python
  • Monitoring: Grafana, Prometheus, custom metrics
  • HPC Filesystems: Lustre (required), GPFS, Ceph (optional)
  • Hardware: Server maintenance, rack/stack, troubleshooting
  • Collaboration: Git, Jira, CI/CD pipelines

Ideal Candidate Profile

  • 5+ years of Linux system administration experience, including in performance-sensitive environments
  • Experience supporting or operating HPC clusters (Slurm, Lustre)
  • Scripting ability in Bash and Python
  • Hands-on automation experience with Ansible and Terraform (or equivalents)
  • Familiarity with containerization and job isolation (Docker/Singularity)
  • Comfortable with infrastructure observability tools and performance tuning
  • Proactive, autonomous, and able to collaborate across teams and functions
  • Fluent in English (spoken and written)

Nice to Have

  • Experience with Bright Cluster Manager or other cluster deployment tools
  • Exposure to distributed file systems (e.g., Ceph)
  • Familiarity with Open OnDemand or other HPC frontend tools
  • Understanding of GPU scheduling (CUDA/NVIDIA)
  • Cloud exposure (AWS, Azure, or GCP)

Benefits

We would prefer to hire a full-time employee (FTE), but we can make an exception for the right candidate who would rather work as a freelancer (contractor).

Here is a list of benefits:

  • Meal vouchers
  • Pension scheme (2%)
  • Hospitalization insurance
  • Remote work allowance (60€/month)

Deadline: 26-12-2025
