
LLM Data Engineer
Job Description
Sanctuary Computer is seeking a skilled and passionate LLM Data Engineer to join our growing team.
Responsibilities:
- Design, build, and maintain robust data pipelines for large language models (LLMs).
- Extract, transform, and load data from diverse sources, ensuring data quality and consistency.
- Develop and implement data validation processes to maintain data integrity.
- Collaborate with engineers and product teams to integrate LLM data with core application systems.
- Troubleshoot data pipeline issues and implement solutions to address format drift.
Qualifications:
- 5+ years of experience in backend development with frameworks such as Ruby on Rails, Elixir Phoenix, Python Django, or Node Express.
- Strong proficiency in Python and experience with data/workflow orchestration tools (e.g., Prefect, Dagster, Airflow).
- Deep understanding of ETL processes and data transformation techniques for LLMs from providers such as OpenAI and Anthropic (Claude).
- Familiarity with LLM APIs, prompt engineering, and structured output generation.
- Excellent problem-solving and analytical skills, with strong attention to detail.
Bonus Points:
- Experience with Google Cloud Platform (GCP), search technologies, and PySpark.
- Product management or engineering management experience.
- Client-facing experience and strong communication skills.
Qualifications
Required:
- Programming Proficiency: Demonstrated expertise in Python, including experience with data manipulation libraries and frameworks.
- Data Engineering Experience: Minimum 5 years of experience in backend development (Ruby on Rails, Elixir Phoenix, Python Django, or Node Express) and/or native app development.
- Data Orchestration: Hands-on experience with data/workflow orchestration tools such as Prefect, Dagster, or Airflow.
- LLM Familiarity: Thorough understanding of Large Language Models (LLMs) such as OpenAI's models and Claude, including API interaction and prompt engineering.
- ETL Expertise: Proven ability to extract, transform, and load data for industry-standard LLMs, addressing format drift and ensuring data quality.
Preferred:
- Experience with Google Cloud Platform (GCP), search technologies, and PySpark.
- Product management and/or engineering management experience.
- Client-facing experience and strong communication skills.
Data Management
- Design, build, and maintain robust data pipelines to ingest, process, and store LLM training data.
- Implement data quality checks and validation procedures to ensure data accuracy and consistency.
- Monitor data pipeline performance, identify bottlenecks, and optimize for efficiency.
Data Engineering
- Develop and maintain scripts and tools for data extraction, transformation, and loading (ETL) processes.
- Utilize data modeling techniques to structure and organize LLM training data effectively.
- Collaborate with data scientists and engineers to understand data requirements and ensure data integrity.
Model Training Support
- Prepare and format data for LLM training, including text cleaning, tokenization, and feature engineering.
- Assist in the evaluation and monitoring of LLM performance, analyzing training data and model outputs.
- Contribute to the development and documentation of best practices for LLM data engineering.
Selection Process
Candidates interested in the LLM Data Engineer position at Sanctuary Computer will first undergo an initial screening call with a member of the team. This call will focus on understanding the candidate's background, experience, and motivations for applying.
Successful candidates will then be invited to complete a technical exercise that evaluates their proficiency in Python and data orchestration tools, as well as their understanding of ETL processes for LLMs. This exercise provides a practical assessment of the candidate's skills and problem-solving abilities.
Following the technical exercise, shortlisted candidates will participate in a more in-depth interview with the team. This interview will delve deeper into the candidate's experience, technical expertise, and cultural fit within Sanctuary Computer.
The final stage of the selection process will involve a reference check to verify the candidate's previous work experience and professional qualifications.
How to Apply
To apply for this job, first read all the information on the job listing page carefully.
Then find the apply link on the job listing page.
Clicking on the apply link will take you to the company's application portal.
Enter your personal details and any other information requested by the company in the application portal.
Pay close attention to the instructions provided and fill out all necessary fields accurately and completely.
Double-check all the information provided before submitting the application.
Ensure that your contact information is correct and up to date, and that your application accurately reflects your qualifications and experience.
Important Note
Submitting an application with incorrect or incomplete information could harm your chances of being selected for an interview.
About Sanctuary Computer
Sanctuary Computer is a leading provider of cutting-edge AI solutions. Driven by a passion for innovation, Sanctuary Computer develops and implements advanced machine learning models across diverse industries. With a team of highly skilled engineers and data scientists, Sanctuary Computer is dedicated to pushing the boundaries of artificial intelligence, delivering impactful results for its clients. The company fosters a collaborative and intellectually stimulating environment, encouraging continuous learning and growth. Sanctuary Computer is committed to ethical AI development, ensuring responsible and transparent applications of its technology.
Ready to Apply?
Join Sanctuary Computer and take your career to the next level. We're looking for talented individuals like you!