Beth Pearson

PhD Student at Interactive AI CDT • University of Bristol

Researching Compositional Generalisation in Vision-Language Models

šŸ™ GitHubšŸ’¼ LinkedInšŸ“§ Email

About Me

I'm a PhD student at the University of Bristol, working at the intersection of Computer Vision and Natural Language Processing. My research focuses on Compositional Generalisation in Vision-Language Models, specifically understanding how these models compose and interpret spatial relationships between objects in images.

I also completed a 3-month internship developing a tool that helps radiologists identify meaningful differences between versions of a report, supporting the training of junior doctors.

Before starting my PhD, I completed my MEng in Engineering Mathematics at the University of Bristol. After graduating, I worked as a Software Engineer at Wilxite, where I built internal web applications for data management, invoicing, and other business operations.

When I'm not coding or writing papers, I'm probably out running, rock climbing, or catching up with friends at the pub. I'm always interested in collaborating on interesting projects and discussing the future of AI!

Recent Work

Spatial Compositional Reasoning and Negation in Vision-Language Models

Status: In Progress • Expected: 2025

I'm investigating how vision-language models such as LLaVA handle negation by comparing how their visual attention shifts when processing positive and negated versions of the same sentence, and how this compares to human patterns.

Semantic Similarity in Radiology Reports via LLMs and NER

Status: Paper accepted at the AI Bio Workshop at ECAI 2025

We explore how large language models can help evaluate junior radiologists' reports by identifying meaningful differences from senior-edited versions. Our method, Llama-EntScore, combines LLaMA 3.1 with named-entity recognition to produce interpretable similarity scores, achieving 93% accuracy within ±1 of expert ratings and outperforming either LLMs or NER alone.
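The general recipe, an LLM rating blended with an entity-overlap signal, can be illustrated with a toy sketch. Everything below is an assumption for illustration only: the paper's actual prompting, entity extractor, weighting, and score scale are not described here, and `entity_overlap` / `combined_score` are hypothetical helpers, not the released implementation.

```python
def entity_overlap(junior_entities, senior_entities):
    """F1-style overlap between named entities extracted from the junior
    report and the senior-edited report (entities as strings)."""
    junior, senior = set(junior_entities), set(senior_entities)
    if not junior or not senior:
        return 0.0
    precision = len(junior & senior) / len(junior)
    recall = len(junior & senior) / len(senior)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def combined_score(llm_score, ner_overlap, weight=0.5, scale=5):
    """Blend an LLM similarity rating (0..scale) with the NER overlap
    (0..1), returning an integer score on the same 0..scale scale.
    The 50/50 weighting is an illustrative choice, not the paper's."""
    blended = weight * llm_score + (1 - weight) * ner_overlap * scale
    return round(blended)
```

For example, if the LLM rates a report pair 4/5 and the junior report mentions two findings of which one appears in the senior version, the entity overlap is 2/3 and the blended score stays at 4. The appeal of this kind of combination is interpretability: the entity sets show *which* findings were added or dropped, alongside the single headline score.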

Evaluating Compositional Generalisation in VLMs and Diffusion Models

Status: Paper accepted at IWCS 2025

We explore whether diffusion models handle compositional generalisation, understanding how objects, attributes, and relationships combine in images, better than contrastive models like CLIP. While diffusion models do well at matching objects with attributes, all of the models we tested, including CLIP, struggle with spatial relationships such as "left" and "right". This highlights ongoing challenges in teaching AI to understand how the parts of a scene relate.

News & Updates

šŸ† Best Poster at UK AI 2025

June 2025

Honoured to receive the Best Poster Award at the UK AI 2025 Conference, supported by HDR (National Institute for Health Data Research). I presented my work on using LLMs to evaluate radiology reports.

šŸŒ Attended CVPR 2024 in Seattle

June 2024

Fantastic experience at CVPR 2024! I attended inspiring talks on computer vision, networked with researchers, and presented a poster on our compositional generalisation project. The conference was incredibly valuable for staying up to date with the latest developments in the field.