Senior Back-End “Extraction” Engineer (for data extraction from complex legal and finance documents)

Nammu21

New York (NY)

Remote

USD 80,000 - 120,000

Full time

30+ days ago

Job summary

An innovative firm is seeking a talented developer to join their dynamic team in New York City. This role focuses on leveraging advanced NLP techniques to extract data from complex finance documents, utilizing a proprietary extraction framework. Collaborating with the CEO, CTO, and analytics team, you will develop algorithms and tools that enhance data extraction efficiency. The position offers the flexibility of remote work during the pandemic, with plans to return to the office in NYC. Join this exciting opportunity to make a significant impact in a fast-paced environment while enjoying competitive benefits and the potential for equity allocation.

Benefits

Paid Vacation
Sick Time
Medical Insurance
Dental Insurance
Vision Insurance
FSA Options
HSA Options
Commuter Benefits
Access to Discount Programs

Qualifications

  • 4+ years' experience in a start-up or financial institution.
  • Strong Python and NLP experience, plus a Bachelor's degree in a related field.

Responsibilities

  • Develop proprietary NLP algorithms and tools for data extraction.
  • Perform backend work including REST APIs and database queries.

Skills

Python
NLP
Regular Expressions
ETL Pipelines
Data Warehouse Understanding
Project Management Tools
Agile Methodologies
Clear Communication Skills
Organizational Skills
Creativity and Ingenuity

Education

Bachelor's Degree in a Related Field

Tools

Postgres
Git
Regex
GraphQL
PDFlib / TET
Lex / Yacc

Job description

Role Description

You will collaborate with our CEO, CTO, other developers and our analytics team to develop and implement the vision of the overall platform. The role will primarily focus on extracting structured and unstructured data from complex finance documents using our proprietary extraction framework and NLP techniques.

The role also includes the following responsibilities:

  • Development of proprietary NLP algorithms utilizing regex, spaCy, tries and LLMs
  • Development of additional tools and interfaces to create further efficiencies and precision in our data extraction methodology
  • Traditional backend work involving anything from REST APIs to database queries

Skills, Qualifications and Experience

The ideal candidate will have at least 4 years' experience in a start-up environment or a financial institution, strong computer science fundamentals, and a minimum of a Bachelor's degree in a related field. In particular, the candidate will have:

  • Strong Python experience
  • Experience with regular expressions and NLP
  • Experience with parsing PDF and DOCX files
  • Experience building, testing and maintaining ETL pipelines
  • Data warehouse understanding
  • Familiarity with Postgres or other relational databases
  • Familiarity with Git or other version control systems
  • Experience working with project management tools and Agile methodologies
  • Ingenuity, creativity, drive and determination
  • Clear communication skills
  • Strong organizational skills, including the ability to respond quickly in a fast-paced environment
  • Preferred but not required: experience with GraphQL, PDFlib / TET, or Lex / Yacc

The role is based in New York City. Due to the COVID-19 pandemic, all roles are currently remote. We anticipate returning to an office space in NYC at a date TBD. We are seeking candidates who will be able to work in-office in NYC once we resume in-person operations. Potential for equity allocation.

We Offer:

Paid vacation + sick time
Medical, dental, and vision insurance
FSA + HSA options
Additional perks such as commuter benefits and access to discount programs
