AI Platform Engineer

Department: AI
Employment Type: Full Time
Location: Warwick

Description

About the Role

We're building an ambitious internal AI Platform to power Bright's next generation of AI-driven products and services. This Kubernetes-hosted platform provides teams across the organisation with the tools to build, deploy, and observe AI-powered applications without managing complex infrastructure themselves.

As an AI Platform Engineer, you'll join a small, high-impact team building critical platform infrastructure for LLM operations (LLMOps). Working under the supervision of two senior/principal platform engineers and reporting to the Head of AI, you'll be instrumental in delivering self-service AI capabilities that enable developers across Bright to build sophisticated AI applications with confidence.

This is an opportunity to work on cutting-edge AI infrastructure, learn from experienced platform engineers, and make a significant impact on how Bright leverages AI technology at scale.

Key Responsibilities

Our roadmap spans multiple interconnected platform epics.
You'll contribute to key initiatives including:

Core Platform Services
Observability & Experimentation: Enhancing Langfuse for LLM tracing, evaluation, and experimentation capabilities
Developer Self-Service: Building and improving Backstage as an internal developer portal for platform discoverability
LLM Operations: Deploying and maintaining LiteLLM proxy, Langflow runtime, and other core LLM services
Monitoring & Logging: Implementing platform-wide monitoring (Prometheus/Grafana) and logging infrastructure (Loki)

Security & Compliance
LLM Ops Security: Implementing guardrails (LlamaGuard, Azure Guardrails) and security controls
GDPR & PII Management: Building automated PII detection, minimisation strategies, and compliance tooling
Incident Response: Establishing security incident response procedures for LLM operations

Infrastructure & Reliability
Kubernetes Operations: Managing AKS clusters and implementing reliable deployment tooling via ArgoCD
Infrastructure as Code: Productionising infrastructure with Terraform and eliminating manual configuration
Autoscaling & Performance: Implementing workload management and autoscaling for AI services
Storage Solutions: Migrating from self-hosted MinIO to managed Azure Blob Storage

Applications Support
You'll also support the deployment and operation of AI applications built on the platform, including:
RAG (Retrieval-Augmented Generation) applications like Ask IPASS and Ask UK Pay Centre
Document processing applications (BrightCapture)
Employee onboarding automation (Oscar)
Internal AI assistant (Bright GPT)

Skills, Knowledge and Expertise

What We're Looking For

Essential Skills & Experience
Platform Engineering Fundamentals: 2-4 years' experience with cloud infrastructure, preferably Azure
Kubernetes: Practical experience deploying and managing applications in Kubernetes (AKS experience is a plus)
Infrastructure as Code: Hands-on experience with Terraform or similar IaC tools
CI/CD: Experience with GitOps workflows and tools like ArgoCD, GitHub Actions, or similar
System Programming: Proficiency in Python or Go for automation and tooling; shell scripting is essential
Linux & Containers: Solid understanding of containerisation with Docker and container orchestration

Desirable Experience
Exposure to LLM technologies or AI/ML infrastructure
Experience with observability tools (Prometheus, Grafana, Loki)
Knowledge of Helm and Helmfile for Kubernetes deployments
Knowledge of Kustomize
Understanding of security best practices and compliance requirements (GDPR)
Experience with Backend-as-a-Service platforms (Supabase or similar)
Experience with developer portal platforms (Backstage or similar)
Application programming experience with .NET and/or TypeScript

What Makes You a Great Fit
Learning Mindset: You're excited to learn about LLM operations and emerging AI infrastructure patterns
Systems Thinking: You understand how distributed systems work and can reason about failure modes
Pragmatic Approach: You balance the pursuit of perfect solutions against shipping value quickly
Collaboration: You work well with both technical and product stakeholders
Documentation: You believe good documentation is as important as good code
Ownership: You take responsibility for your work from development through to production

Team Structure & Reporting
Reports to: Head of AI
Works closely with: Two senior/principal platform engineers
Collaborates with: Application development teams, product managers, and security/compliance stakeholders
Team size: Small, full-stack AI team covering development, DevOps, operations, and support

What Success Looks Like

In your first 3 months:
You've contributed to multiple platform epics from our roadmap
You understand the architecture of our AI platform and can navigate the codebase
You've successfully deployed services to our Kubernetes clusters
You're participating in the on-call rotation and can troubleshoot platform issues

In your first 6 months:
You're independently owning epics and driving them to completion
You're contributing to architectural decisions and technical direction
You've improved platform reliability, observability, or developer experience
You're mentoring junior engineers or helping onboard new team members

Technical Stack
Infrastructure: Azure (AKS, Blob Storage, Cognitive Services), Kubernetes, Terraform
Platform Services: LiteLLM, Langflow, Langfuse, Supabase, Open Web UI, Backstage
Observability: Prometheus, Grafana, Loki, Langfuse tracing
CI/CD: ArgoCD, GitHub Actions, Helmfile
Languages: Python, Go, Shell scripting
Security: Azure Guardrails, LlamaGuard, PII detection tooling

Why Join Bright's AI Platform Team?
Impact: Your work directly enables AI innovation across the entire organisation
Growth: Learn from experienced platform engineers in a supportive environment
Cutting Edge: Work with the latest AI infrastructure and tooling
Autonomy: A small team means you'll have significant ownership and influence
Mission: Help accountants and finance professionals work more efficiently with AI

Benefits

What will you get?
Competitive salary
Performance-based bonus
25 days annual leave
Health insurance
Company pension
Company events
Free food onsite
On-site parking
Referral programme
Sick pay
Wellness programmes