I'm partnering with a specialist UK technology consultancy to support the hire of an experienced Contract Data Scientist/ML Engineer for a major Generative AI project within a secure public-sector environment.
This is an opportunity to work on high-impact AI initiatives, helping to redesign complex human-driven processes through LLMs and advanced retrieval systems. The work is fully remote with periodic UK travel, and active SC clearance is essential.
This role is inside IR35.
Key Responsibilities
Analyse, structure, and transform complex, messy datasets into machine-readable formats suitable for LLMs.
Design and optimise RAG datasets, embedding pipelines, and retrieval strategies.
Implement and evaluate embedding-based search using vector databases.
Conduct robust EDA, data quality assessment, and anomaly detection.
Translate manual human processes into clear, machine-interpretable logic for GenAI integration.
Deliver modular, production-ready Python code with minimal oversight.
Evaluate LLM and RAG system performance using modern metrics and techniques.
Technical Skills
Strong Python engineering skills (both exploratory and production-ready).
Comprehensive EDA and data analysis capability.
Expertise in LLM data preparation, including:
    Prompt engineering fundamentals.
    Embeddings & vector databases (FAISS, Weaviate, Chroma).
    RAG dataset design & retrieval optimisation (chunking strategies, hybrid search, re-ranking).
    Evaluation techniques for RAG (retrieval scoring, LLM-as-a-judge, hallucination checks).
Ability to convert unstructured, ambiguous data into structured, validated datasets.
Strong understanding of data quality, validation, and documenting assumptions.
Clear communication of technical findings to both technical and non-technical audiences.
Familiarity with AWS is beneficial.