Alberto Purpura, Ph.D. — Applied AI Researcher and Team Lead at Capital One

Applied AI · NLP · Research

Alberto Purpura

Applied AI · Capital One

I lead a Data Science and AI team in the Card Intelligence group at Capital One. Over my career, I have worked on projects spanning clinical NLP, generative AI, information extraction, and information retrieval. I hold a Ph.D. in Deep Learning for Information Retrieval and have published at venues such as SIGIR, ACL, NAACL, EMNLP, ECIR, and AMIA. Outside my day-to-day work with the team, I enjoy building side projects that let me experiment with new technologies in a fun, low-stakes setting.

Previously
  • Apple
  • IBM Research
  • Tempus AI

Recent publications

2026 · Tempus AI
Systems and Methods of Using Multiple Modalities of Data with Machine-Learning Models

A method for combining multiple data modalities, such as clinical notes and molecular data, within machine-learning models to support precision-medicine decisions.

US Patent App. US 2026/0120868 A1 · WO 2026/089745 A1

EACL 2026
Deconstructing Instruction-Following: A New Benchmark for Granular Evaluation of Large Language Model Instruction Compliance Abilities

Existing benchmarks often conflate instruction compliance with overall task success. This paper introduces MOSAIC, a modular framework that dynamically generates prompts with up to 20 application-oriented constraints, enabling granular, per-constraint analysis. Across five LLM families, it shows that compliance is not monolithic: it varies with constraint type, quantity, and position, exposing model-specific weaknesses as well as primacy and recency biases.

EACL 2026 · Long Paper

Jan 2026
Enhancing LLM Instruction Following: An Evaluation-Driven Multi-Agentic Workflow for Prompt Instructions Optimization

LLMs often produce outputs that are conceptually correct but violate formal constraints such as word limits or formatting rules. This paper proposes a multi-agentic workflow that optimizes the core task separately from its specific output constraints, using quantitative compliance scores as iterative feedback signals. The method yields significantly higher instruction-following scores on Llama 3.1 8B and Mixtral-8x7B without any model fine-tuning.

arXiv · 2026

Dec 2025
A Multi-Stage Workflow for the Review of Marketing Content with Reasoning Large Language Models

This work proposes an automated multi-stage pipeline for checking marketing content against compliance requirements without relying on external knowledge bases. It compares two fine-tuning strategies, supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO), and evaluates how reasoning tokens improve smaller models' ability to detect violations. The study also systematically tests how different combinations of reward functions shape model behavior under GRPO training.

arXiv · Dec 2025

EMNLP 2025
GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection

GRAID tackles data scarcity in harmful text classification by generating training examples that are geometrically spread across the embedding space, then diversifying them stylistically through a multi-agent reflection loop. The pipeline is model-agnostic and domain-agnostic, and is designed to improve guardrail coverage without manual annotation. On two benchmark datasets, it achieves an average F1 gain of 12% over baselines.

EMNLP 2025