Kyndryl Switzerland GmbH

Senior MLOps / AI Ops Engineer

📍 8005 Zürich

Role and responsibilities

  • Design, implement, and optimize MLOps/AI Ops architectures supporting scalable AI/ML solutions.

  • Build and manage cloud-native infrastructure using Azure, Kubernetes, Docker, Terraform, and CI/CD pipelines.

  • Lead end-to-end AI/ML lifecycle operations, including data ingestion, model training, versioning, deployment, and monitoring.

  • Implement and manage agent/model registries, automated deployment workflows, and model monitoring systems.

  • Drive AI Ops capabilities, including prompt lifecycle management, RAG pipeline optimization, semantic search enhancements, vector database integration, and LLM observability.

  • Establish robust monitoring, alerting, and performance tuning practices for large-scale production systems.

  • Collaborate with data engineering teams to ensure efficient data pipelines and infrastructure alignment.

  • Ensure high standards of security, governance, and reliability across ML/AI systems.

  • Implement security best practices such as IAM, secrets management, policy-as-code, drift detection, and audit trails.

  • Optimize production systems for autoscaling, resiliency, and cost efficiency.

  • Evaluate emerging technologies and recommend enhancements to the AI/ML platform.

Team / description

At Kyndryl, we run and reimagine the mission-critical technology systems that drive advantage for the world’s leading businesses. We are at the heart of progress, with proven expertise and a continuous flow of AI-powered insight that enable smarter decisions, faster innovation, and a lasting competitive edge. For our people (Kyndryls), that means doing purposeful work that powers human progress. Join us and experience a flexible, supportive environment where your well-being is prioritized and your potential can thrive.

Qualifications and skills

  • 12+ years of overall experience, with strong expertise in MLOps, AI Ops, and cloud infrastructure.

  • Hands-on experience with Kubernetes, Docker, Terraform, Azure cloud, and CI/CD pipelines.

  • Strong understanding of the full AI/ML lifecycle, including data engineering, model development, deployment, monitoring, and retraining.

  • Experience with the LLM lifecycle, prompt management, RAG frameworks, vector databases, semantic caching, and observability tools.

  • Solid understanding of infrastructure-as-code, automation, and cloud-native architectures.

  • Knowledge of IAM, secrets management, security policies, audit logging, and compliance frameworks.

  • Proven ability in performance tuning, autoscaling, and managing large-scale distributed systems.

  • Excellent communication, collaboration, and leadership skills.