ETH Zürich

DevOps Engineer

📍 Lugano

Role and responsibilities

Design, deploy, automate, and operate scalable infrastructure and cloud-native platform services. Contribute to Kubernetes-based AI/ML and HPC platforms, including CI/CD, GitOps, observability, security, and operational tooling. Collaborate with researchers and engineers to support complex workflows, troubleshoot production environments, and improve reliability and performance. Contribute to platform engineering, automation, and developer productivity initiatives across evolving systems and services.

Team / description

The Swiss National Supercomputing Centre (CSCS) develops and operates a high-performance computing and data research infrastructure that supports world-class science in Switzerland. Its user laboratory is available to domestic and international researchers in academia, industry, and the business sector. The centre is operated by ETH Zurich and has offices at its data centre in Lugano and in Zurich.

Qualifications and Skills

  • Linux systems engineering and software development (e.g., Python, Bash)

  • Containers, Kubernetes, CI/CD, GitOps, and Infrastructure as Code (e.g., Terraform, Helm, Ansible, ArgoCD)

  • Distributed systems concepts, APIs, scalability, observability, identity and access management, and security

  • AI/ML platforms and supporting infrastructure services

  • HPC systems, GPU clusters, and large-scale infrastructure environments

  • Platform engineering and developer productivity tooling

  • Secure or confidentiality-sensitive operational environments

  • Curious, hands-on, and eager to understand systems inside-out

  • Strong engineering mindset and problem-solving attitude

  • Comfortable learning new technologies and working across disciplines

  • Effective communicator and collaborative team player

  • Experience supporting research or scientific computing environments

  • Familiarity with HPC systems and services

  • Exposure to GPU clusters and accelerated computing

  • Experience with SRE practices or on-call operations

  • Advanced Linux security knowledge

  • Ability to leverage AI tools for increased productivity