Full Stack Engineer

Highbrow Technology Inc

Remote

BRL 120,000 - 160,000

Full-time

Posted yesterday

Job summary

A technology firm in Brazil seeks a candidate for a coding-focused role supporting AI research. In this position, you will write and debug production-quality code, design evaluation tasks, and evaluate the outputs of large language models (LLMs). The role involves close collaboration with engineers and researchers and requires a strong coding background. If you enjoy solving technical problems and have experience with LLM coding tools, we encourage you to apply.

Availability: 40 hours per week, including 4 hours of overlap with Pacific Time (8am–12pm PT).

Role Overview

We’re a coding-focused team at Turing that serves as a research partner for a Frontier AI Lab. Our role is to build coding tasks, evaluations, datasets, and tooling that help train and improve large language models (LLMs).

You’ll write and debug production-quality code, design rigorous evaluations, and build reproducible workflows that generate clean, high-signal data for model training. Attention to detail matters deeply here—small mistakes can cascade into misleading results, so precision and thoroughness are essential. You’ll also collaborate closely with engineers, researchers, and quality owners to align on standards, review work, and continuously raise the quality bar.

If you enjoy solving unusual technical problems, investigating subtle model failures, and working in developer-like environments where correctness, reproducibility, and collaboration matter, this role will keep you very entertained.

What your day-to-day looks like

Write, review, and debug code across multiple languages
Design tasks and evaluation scenarios for coding, reasoning, and debugging
Investigate LLM outputs and identify hallucinations, regressions, and failure modes
Develop scripts, pipelines, and tools for data generation, scoring, and validation
Produce structured annotations, judgments, and high-quality datasets
Run systematic evaluations that help improve model reliability and reasoning

Required Skills

Experience using LLM coding tools (Cursor, Copilot, CodeWhisperer) and strong hands‑on coding experience (professional or research-based) in one or more of:
Solid experience with Linux + Bash, scripting, and automation
Strong with Docker, reproducible environments, and dev containers
Advanced Git skills (branching, diffs, patches, conflict resolution)
Solid understanding of testing and QA (unit, integration, negative, edge‑case focused)
Ability to reliably overlap with 8am–12pm PT
