LLM Infrastructure & Deployment Summer 2026 Internship

Location: Remote (Global)
Time Commitment: ~20 hours/week, 3-4 months
Type: Unpaid Internship

Position Overview

We are seeking an LLM Infrastructure Intern with a DevOps mindset to build, deploy, and maintain the production systems that serve our large language models. While our researchers focus on the "what," you will focus on the "how"—ensuring our LLM pipelines are scalable, observable, and cost-effective.

Key Responsibilities

  • Deploy LLMs and embedding models using containerization (Docker/Kubernetes), model-serving frameworks (BentoML, Ray Serve), and inference engines (vLLM)
  • Build and automate CI/CD pipelines for model updates
  • Administer and optimize production vector databases (Pinecone, Milvus, Qdrant) for high-performance retrieval in RAG systems
  • Implement logging and monitoring for LLM performance (tokens per second, latency, model drift)
  • Use Infrastructure as Code (Terraform/Ansible) to manage cloud-based GPU clusters
  • Develop and maintain FastAPI/Flask wrappers around model endpoints
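As a rough illustration of the monitoring work described above, here is a minimal, stdlib-only sketch of tracking per-request latency and token counts to report throughput; the `InferenceMetrics` class and the sample numbers are hypothetical, not part of our stack:

```python
import math
from dataclasses import dataclass, field

@dataclass
class InferenceMetrics:
    """Accumulates per-request latency and token counts (hypothetical helper)."""
    latencies: list[float] = field(default_factory=list)
    token_counts: list[int] = field(default_factory=list)

    def record(self, latency_s: float, tokens: int) -> None:
        self.latencies.append(latency_s)
        self.token_counts.append(tokens)

    def tokens_per_second(self) -> float:
        # Aggregate throughput: total tokens generated over total wall time.
        total_time = sum(self.latencies)
        return sum(self.token_counts) / total_time if total_time else 0.0

    def p95_latency(self) -> float:
        # Nearest-rank 95th percentile: index ceil(0.95 * n) - 1.
        ordered = sorted(self.latencies)
        return ordered[math.ceil(0.95 * len(ordered)) - 1]

metrics = InferenceMetrics()
metrics.record(latency_s=2.0, tokens=100)  # a slow request: 50 tok/s
metrics.record(latency_s=1.0, tokens=100)  # a fast request: 100 tok/s
print(round(metrics.tokens_per_second(), 1))  # → 66.7 (200 tokens / 3 s)
```

In production these counters would typically feed a metrics backend rather than a Python list, but the computation is the same.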

Qualifications

  • Strong Python backend development experience with async programming and API design
  • Proficiency with Docker and Git; experience with Kubernetes or a cloud provider (AWS/GCP/Azure) is a major plus
  • Familiarity with inference optimization (quantization, caching, load balancing)
  • Experience managing relational and NoSQL/vector databases
  • Ability to troubleshoot complex distributed systems
  • Currently pursuing a degree in Computer Science, Software Engineering, or related technical field
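To give a flavor of the async API work mentioned above, the sketch below fans out several model calls concurrently with `asyncio.gather`, so total wall time tracks the slowest call rather than the sum of all calls; `fake_model_call` is a hypothetical stand-in for an HTTP request to a model endpoint:

```python
import asyncio
import time

async def fake_model_call(prompt: str, delay: float = 0.05) -> str:
    # Stand-in for an async HTTP call to a model endpoint (hypothetical).
    await asyncio.sleep(delay)
    return f"completion for: {prompt}"

async def handle_batch(prompts: list[str]) -> list[str]:
    # Concurrent fan-out: wall time ~ one call, not len(prompts) calls.
    return await asyncio.gather(*(fake_model_call(p) for p in prompts))

start = time.perf_counter()
results = asyncio.run(handle_batch(["a", "b", "c"]))
elapsed = time.perf_counter() - start
print(len(results))  # → 3, in well under 3 × 0.05 s
```

The same pattern applies when the awaited call is a real client request (e.g. via an async HTTP library) instead of a sleep.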

Apply Now

Apply through Google Form

Applications submitted via LinkedIn or Idealist may not be seen; please use the Google Form above.