Senior Data Scientist - AI Evaluation

Elsevier

5 months ago

Full-time

On-site

Do you have hands-on experience designing reliable evaluations for LLM/NLP features? Do you enjoy turning messy product questions into clear study designs, metrics, and production-ready code?

About our Team

Elsevier’s AI Evaluation team designs, builds, and operates NLP/LLM evaluation solutions used across multiple product lines. We partner with Product, Technology, Domain SMEs, and Governance to ensure our AI features are safe, effective, and continuously improving.

About the Role

As a Senior Data Scientist III, you will design and implement end-to-end evaluation studies and pipelines for AI products. You’ll translate product requirements into statistically sound test designs and metrics, build reproducible Python/SQL pipelines, run analyses and QC, and deliver concise readouts that drive roadmap decisions and risk mitigation. You’ll collaborate closely with SMEs, contribute to our shared evaluation libraries, and produce audit-ready documentation aligned with Responsible AI and governance expectations.

Responsibilities

· Study design & metrics: Translate product questions into hypotheses, tasks/rubrics, datasets, and success criteria; define metrics (accuracy/correctness, groundedness, reliability, safety/bias/toxicity) with acceptance thresholds.

· Pipelines & tooling: Build and maintain Python/SQL evaluation pipelines (data prep, prompt/rubric generation, LLM-as-judge with guardrails, scoring, QC, reporting); contribute to shared packages and CI.

· Statistical rigor: Plan for power, confidence intervals, inter-rater reliability (e.g., Cohen’s κ/ICC), calibration, and significance testing; document assumptions and limitations.

· SME integration: Partner with SME Ops and domain leads to create clear rater guidance, run calibration, monitor IRR, and incorporate feedback loops.

· Analytics & reporting: Create analyses that highlight regressions, safety risks, and improvement opportunities; deliver crisp write-ups and executive-level summaries.

· Governance & compliance: Produce audit-ready artifacts (evaluation plans, datasheets/model cards, risk logs); follow privacy/security guardrails and Responsible AI practices.

· Quality & reliability: Implement test hygiene (dataset/versioning, golden sets, seed control), observability, and failure analysis; help run post-release regression monitoring.

· Collaboration: Work closely with Product and Engineering to scope, estimate, and land evaluation work; participate in code reviews and design sessions alongside fellow Data Scientists.

Requirements

· Education/Experience: Master’s + 3 years, or Bachelor’s + 5 years, in CS, Data Science, Statistics, Computational Linguistics, or related field; strong track record shipping evaluation or ML analytics work.

· Technical: Strong Python and SQL; experience with LLM/NLP evaluation, data/versioning, testing/CI, and cloud-based workflows; familiarity with prompt/rubric design and LLM-as-judge patterns.

· Statistics: Comfortable with power analysis, CIs, hypothesis testing, inter-rater reliability, and error/slice analysis.

· Practices: Git, code reviews, reproducibility, documentation; ability to turn ambiguous product needs into executable study plans.

· Communication: Clear written/oral communication; ability to produce crisp dashboards and decision-ready summaries for non-technical stakeholders.

· Mindset: Ownership, curiosity, bias-for-action, and collaborative ways of working.

Nice to have

· Experience with evaluation of retrieval-augmented or agentic systems and/or with safety/bias/toxicity measurements.

· Familiarity with lightweight orchestration (e.g., Airflow/Prefect) and containerization basics.

· Exposure to healthcare or education content or working with clinician/academic SMEs.

We are committed to providing a fair and accessible hiring process. If you have a disability or other need that requires accommodation or adjustment, please let us know by completing our Applicant Request Support Form or please contact 1-855-833-5120.

Criminals may pose as recruiters asking for money or personal information. We never request money or banking details from job applicants. Learn more about spotting and avoiding scams here.

Please read our Candidate Privacy Policy.

We are an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law. USA Job Seekers: EEO Know Your Rights.

Elsevier is a global leader in advanced information and decision support for science and healthcare. We believe that by working together with the communities we serve, we can shape human progress to go further, happen faster, and benefit all. We support continuous discovery and uphold the highest standards of content integrity, reliability, and reproducibility so the communities we serve can advance their field of science, healthcare or innovation with confidence. By combining high-quality content with powerful analytics, we transform complexity into clarity and deliver mission-critical insights that help professionals make better decisions when it matters most.

We deliver insights that help research institutions, governments, and funders achieve their goals. We help researchers discover and share knowledge, collaborate, and accelerate innovation. We help librarians provide verified, quality information to universities. We help innovators turn knowledge into new products. We help health professionals improve patient care and educators train the next generation of doctors and nurses. Connecting quality content and innovative technologies, we make progress go further and happen faster. And by championing inclusion and sustainability, we ensure progress benefits all.

With 9,500 employees, over 2,300 technologists in 5 major tech hubs, and more than 60 locations across the globe, we are committed to supporting the scientific and healthcare communities around the world. We offer a diverse range of opportunities across technology, commercial, business, and early career jobs. If you are looking for a career that inspires progress in science, innovation and health, and allows you to grow every day, find your team at Elsevier. Elsevier is part of RELX Group.

Let’s shape progress together. Join us. elsevier.com/about/careers

