Software Engineer, Memory & Observability (Mid-Level), London
Client:
Open Code Mission
Location:
London, United Kingdom
Job Category:
Other
EU work permit required:
Yes
Posted:
31.05.2025
Expiry Date:
15.07.2025
Job Description:
Why Open Code Mission?
Open Code Mission builds ETERNALLY, a learning-augmented memory architecture that couples a durable JSON + FAISS Memory Core with surprise-aware Neural Memory and a Context Cascade Engine to let agents learn at test time. Our B2B dashboard exposes explainable diagnostics so security and product teams can trust what their AI is doing.
We’re a small, execution-driven team; you’ll ship code that lands in production within days, not quarters.
The Impact You’ll Have
In your first 6–12 months you will:
- Harden concurrency paths inside the Memory Core (e.g., finishing assembly_transaction locking and vector-index repair loops) so we can scale from single-tenant pilots to multi-capsule production clusters; see the lock-and-trace sketch after this list.
- Instrument end-to-end metrics (Prometheus + custom JSONL traces) across MC → NM → CCE so variant decisions and QuickRecal boosts surface in the dashboard with < 2 s latency.
- Extend our React/Express dashboard with new health, explainability, and live-log views, wiring them to the triple-nested API contract.
- Add test-time-learning features (e.g., MAG gate experiments) behind feature flags and run A/B evaluations with the research team.
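The concurrency and tracing work above is easiest to picture in code. The snippet below is a minimal, hypothetical sketch: assembly_transaction is named in this posting, but the wrapper, its arguments, and the trace fields are invented here purely to illustrate guarding a write path with an asyncio lock and emitting a JSONL trace line.

    import asyncio
    import json
    import time

    _assembly_lock = asyncio.Lock()  # hypothetical: serialises writes to one capsule

    async def assembly_transaction_safe(capsule_id: str, payload: dict) -> None:
        """Illustrative only: hold the lock for the write, then emit a JSONL trace."""
        start = time.perf_counter()
        async with _assembly_lock:
            await asyncio.sleep(0)  # placeholder for the real JSON + FAISS write
        trace = {
            "event": "assembly_transaction",
            "capsule_id": capsule_id,
            "lock_held_ms": round((time.perf_counter() - start) * 1000, 2),
        }
        print(json.dumps(trace))  # stand-in for a structured JSONL trace writer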
What You’ll Do Day-to-Day
- Design and implement Python micro-services (FastAPI / asyncio) that talk to FAISS, Redis, and TensorFlow (see the sketch after this list).
- Write clear, observable code—structured logging, Prometheus counters, Grafana alerts.
- Optimize async pipelines, back-pressure, and retry queues; profile and fix race conditions.
- Ship TypeScript/React features (tables, charts, WebSocket log streams) that consume our selectData() hooks.
- Review PRs with empathy; propose small RFCs for larger refactors.
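To give a feel for what an observable micro-service means here, the sketch below is a minimal FastAPI app with a Prometheus counter and a /metrics route. It is an assumption about the shape of the work (the route names, counter, and handler are invented), not Open Code Mission’s actual code.

    from fastapi import FastAPI, Response
    from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

    app = FastAPI()
    RECALL_REQUESTS = Counter("recall_requests_total", "Memory recall requests served")

    @app.get("/recall/{capsule_id}")
    async def recall(capsule_id: str) -> dict:
        # A real handler would query FAISS/Redis; here we only bump the counter.
        RECALL_REQUESTS.inc()
        return {"capsule_id": capsule_id, "results": []}

    @app.get("/metrics")
    async def metrics() -> Response:
        # Expose counters in the text format Prometheus scrapes.
        return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)

Running it under uvicorn and pointing a Prometheus scrape job at /metrics would surface the counter in Grafana.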
Must-Haves
- 3–6 years professional software experience; comfortable owning production services.
- Solid Python 3.10+: asyncio, typing, FastAPI (or Flask/Fastify-equivalent for JS).
- Working knowledge of machine-learning inference flows: embeddings, vector search, or LLM APIs.
- Concurrency literacy: async/await, task pools, locks; can explain when to pick threads vs processes vs async.
- Observability & scale: you’ve plumbed Prometheus/Grafana (or OpenTelemetry) into high-QPS APIs and know what RED/USE means.
- API routing & gateway patterns (reverse proxies, rate limiting, shrink-wrap error envelopes).
- Comfortable in *nix and Docker (and Compose); can add a health check and iterate locally.
Nice-to-Haves
- TensorFlow 2.x or PyTorch; have traced a gradient or two.
- FAISS, Milvus or other ANN libraries.
- Experience with React + TanStack Query + Zustand or similar state stacks.
- Basic familiarity with Kubernetes and GitHub Actions CI.
- Interest in Explainable AI, the overlap of AI with traditional cyber security, and LLM governance.
Working Style
- Remote-first (core hours 10:00–17:00 UTC).
- Weekly engineering demo; lightweight RFC process; “you build it, you own it” on-call rota (one week every ~6).
- Small, friendly code reviews focused on clarity and test coverage, not nit-picking variable names.
Compensation & Growth
- Salary band £70,000–£95,000 + meaningful equity (DOE & location).
- Annual learning budget (£1,000).
- GPU credits for side experiments.
- Clear growth track to Senior Engineer: own a capsule-scale roll-out, mentor junior devs, and architect a new service.
Hiring Process (≈ 4 weeks)
- For pre-qualified and vetted applicants, a 90-minute informal chat to assess culture & role fit.
- Technical discussion: walk through an async/metrics design you’re proud of (no LeetCode).
- Take-home task (build or instrument a tiny async API; ~3 hours, paid).
- Offer & reference call.
Ready to build memory systems that can actually learn in production? Then apply and include your GitHub or a project you’re proud of.