Job Description
IT Manager | Principal Infrastructure Engineer
Early-Stage AI Compute Company | Wafer-Scale Infrastructure
We’re hiring an IT Manager / Principal Infrastructure Engineer to join an early-stage company redefining how large-scale AI workloads are deployed, optimized, and scaled.
Founded by engineers behind Tesla Dojo, AMD Zen cores, and Apple Silicon, the company is building a full-stack compute platform designed to push beyond conventional data center limits.
By combining wafer-scale integration, 3D heterogeneous architectures, and a tightly optimized software stack, the company is reducing power consumption and communication overhead across AI datacenter environments.
This is a rare opportunity to own the infrastructure backbone behind next-generation AI systems, spanning on-prem and cloud environments, compute clusters, networking, security, observability, and automation.
You’ll work closely with hardware, systems, and software teams to build reliable, secure, and scalable infrastructure for large training and inference workloads.
Tech Stack: Linux, Kubernetes, Slurm, LSF, AWS, GCP, Azure, Ansible, Jamf, Prometheus, Grafana, ELK, Datadog
What We’re Looking For:
- Extensive experience in IT infrastructure, DevOps, systems engineering, or related roles
- Strong Linux systems experience in on-prem compute environments
- Hands-on experience with hybrid or multi-cloud infrastructure
- Production experience managing Kubernetes, Slurm, LSF, or similar clusters
- Strong automation experience with Ansible, Terraform, Python, Bash, or similar
- Familiarity with observability, networking, IAM, and security fundamentals
Bonus: AI/ML, HPC, semiconductor, SOC 2, ISO 27001, FedRAMP, Jamf, Intune, or startup experience
Competitive compensation + early-stage equity
