Job Description
Work across multiple projects to improve LLM performance on code. Sample projects:
- Lead and deliver end-to-end agent use cases such as home automation agents, coding copilots, or creative design assistants.
- Collaborate with the team to identify edge cases and ambiguities in model behavior.
- Review and compare 3–4 model-generated code responses per task using a structured ranking system.
- Evaluate code diffs for correctness, code quality, style, and efficiency. Provide clear, detailed rationales explaining the reasoning behind each ranking decision.
Required Skills and Experience
- Several years of software engineering experience, including 2+ continuous years at a top-tier product company (e.g., Google, Stripe, Amazon, Apple, Meta, Netflix, Microsoft, Datadog, Dropbox, Shopify, PayPal, IBM Research).
- Strong expertise in building full-stack applications and deploying scalable, production-grade software using modern languages and tools.
- Deep understanding of software architecture, design, development, debugging, and code quality / review assessment.
- Proven ability to review code diffs and evaluate correctness, maintainability, and efficiency.
- Excellent oral and written communication skills for clear, structured evaluation rationales.
Engagement Details
- Commitment: flexible engagement, minimum 10 hrs / week, up to 40 hrs / week (partial PST overlap required).
- Type: Contractor (no medical / paid leave).
- Duration: 1 month, with potential extensions based on performance and fit.