Overview
We help the world run better. At SAP, we keep it simple: you bring your best to us, and we'll bring out the best in you. We're builders touching over 20 industries and 80% of global commerce, and we need your unique talents to help shape what's next. The work is challenging – but it matters. You'll find a place where you can be yourself, prioritize your wellbeing, and truly belong. What's in it for you? Constant learning, skill growth, great benefits, and a team that wants you to grow and succeed.
Summary: You’ll join the SAP Procurement Engineering organization within our Intelligent Spend Management group and help transform how enterprises manage their procurement processes.
Responsibilities
- Architect and lead an enterprise-grade, AI-powered engineering productivity and DevOps automation platform that combines agentic copilots, conversational workflows, intelligent orchestration, and continuous learning tailored to SAP’s procurement domain.
- Establish intelligent test selection and change-impact analysis at portfolio scale to compress multi-hour pipelines to minutes, increase throughput, and materially reduce cloud spend without sacrificing quality.
- Design and implement Kubernetes operators/CRDs for ephemeral environment provisioning, parallel test execution, and elastic scaling that optimize utilization while ensuring isolation and reliability.
- Build a secure framework for dynamic tool discovery, capability negotiation, and policy-aware routing (MCP-like), enabling AI agents to safely invoke internal systems, APIs, and automations.
- Deliver IDE/terminal assistants and intelligent web interfaces that understand repo/terminal context, retain cross-session memory, and support multi-modal interaction, driving daily adoption and measurable productivity gains.
- Introduce natural-language workflow creation with predictive optimization and cross-organizational learning; integrate with enterprise automation (e.g., n8n/Temporal/GitOps) to convert tribal knowledge into repeatable runbooks.
- Implement proactive detection, alert correlation, automated diagnostics, and auto-remediation using metrics/logs/traces to reduce MTTR and prevent incidents before user impact.
- Build a domain-aware knowledge graph unifying code, tests, services, incidents, and operational signals; enable RAG-based assistance for PR review, security checks, onboarding, and support triage.
- Define governance, safety, security, and explainability patterns for agentic systems on BTP and hyperscalers; ensure compliance, auditability, and human-in-the-loop controls.
- Set platform and developer experience standards for AI-assisted development, CI/CD, and environment management that scale across SAP Procurement Engineering.
- Translate platform capabilities into measurable outcomes: faster feedback cycles, lower MTTR, cost savings, higher release confidence, and accelerated feature delivery.
- Work across enterprise-scale agentic AI, intelligent CI/CD, Kubernetes operators, knowledge graphs, and domain-aware automation.
- Own portfolio-wide engineering productivity and DevEx strategy; provide end-to-end ownership from architecture to measurable business outcomes.
- Uplift organizational capabilities in AI/ML, platform thinking, SRE/DevOps, and modern testing practices across multiple engineering groups.
- Partner with product, SRE, Security, and platform organizations; influence company-wide technical strategy and work directly with senior leadership.
- Define reference architectures and standards for AI-native productivity platforms, Kubernetes-native orchestration, and intelligent testing; contribute patterns back to SAP's enterprise architecture.
- Demonstrate experience with AI-enabled platforms, cloud-native systems, CI/CD, and developer experience; 10–15+ years overall with 7–10+ years in platform/architecture leadership.
Role Requirements
- Strategic Technology Leadership
  - Polyglot Engineering: Java/JVM (Spring), plus Go and/or Python and TypeScript/Node for platform, tooling, and operator development.
  - Technology Strategy: Tracking and adopting emerging technologies (LLMs, multi-agent RL, vector search, graph databases) and translating them into scalable, secure enterprise platforms.
- Enterprise Integration Mastery
  - Data Architecture: Graph databases, vector stores; data governance, lineage, and access controls
  - APIs and Protocols: REST, GraphQL, gRPC; OAuth2/OIDC; familiarity with OData and enterprise integration patterns
  - Eventing and Messaging: Kafka or equivalent streams; real-time processing and CQRS; WebSockets for collaborative experiences
- Platform Architecture
  - Cloud-Native at Scale: Kubernetes (EKS/GKE/AKS), service mesh, multi-tenant SaaS, global resiliency and performance
  - Kubernetes Operators: CRD design, reconciliation loops, intelligent scheduling, cluster autoscaling, cost-aware resource management
  - Developer Platforms: Internal developer platform design (e.g., Backstage), API management, golden paths, secure toolchains on BTP and major clouds
- AI and Modern Practices
  - Production LLM/Agent Systems: Retrieval-augmented generation, tool-use/Function Calling, prompt/retrieval engineering, evaluation, safety, observability
  - Reinforcement Learning: Multi-agent RL for classification/decision problems and continuous improvement loops
  - MLOps/LLMOps: Model hosting/gateways, feature stores, vector indexes, policy enforcement, red-teaming, telemetry for AI quality
  - DevOps Excellence: Trunk-based development, GitOps, progressive delivery, automated experimentation, and performance/scalability testing
- CI/CD and Test Intelligence
  - Change-Impact Analysis: Static code analysis, dependency graphs, and historical signal mining
  - Pipeline Tooling: Jenkins, GitHub Actions, GitLab CI, Tekton/Argo; caching, parallelization, flaky-test mitigation
  - Metrics and Outcomes: Demonstrated reductions of multi-hour pipelines to sub-30 minutes with maintained or improved coverage
- Observability, SRE, and Incident Automation
  - Full-Stack Telemetry: OpenTelemetry, Prometheus, Grafana, ELK/Splunk; SLOs, error budgets, auto-diagnostics
  - Automated Remediation: Alert deduplication/correlation, runbooks-as-code, safe remediation workflows
- Security, Compliance, and Responsible AI
  - Data Security: Secrets management, least-privilege design, governance in regulated contexts
  - Responsible AI: Bias testing, explainability, audit trails, human-in-the-loop, policy-compliant agent behavior
- Industry Influence: Ability to influence technology adoption across a large organization
Bring out your best. SAP innovations help more than four hundred thousand customers worldwide work together more efficiently and use business insight more effectively.