Persona Dynamics is built on six years of field research and validated through independent performance evaluation. This page sets out the evidence behind the platform: the studies, methodology, findings, and the ongoing validation infrastructure we maintain.
The platform's architecture is grounded in longitudinal PhD research examining AI in professional creative practice: three field studies conducted over six years in live commercial environments. Three consistent findings emerged across all contexts.
AI outputs without visible provenance were not trusted in professional contexts. Practitioners required traceable sources to justify decisions to clients and colleagues. Source transparency is not a feature: it is a prerequisite for legitimate use in accountable environments.
AI was adopted for divergent ideation and exploration, but resisted where professional accountability applied. Practitioners would not let AI make decisions in areas where they were accountable to clients, reinforcing the need for explicit decision-support framing rather than autonomous recommendation.
Teams lost context at handoffs, across tools and between sessions. The ability to maintain a continuous, queryable record of decisions and evidence, from brief through delivery, was identified as a consistent structural gap in existing workflows. This is the direct origin of the Digital Thread.
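The Digital Thread described above is, at its core, an append-only, queryable record of decisions and their evidence. A minimal sketch of that idea follows; the class and field names (`DecisionLog`, `DecisionRecord`, `record`, `search`) are illustrative assumptions, not the platform's actual API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """One entry in a project's decision log (hypothetical schema)."""
    summary: str        # what was decided
    sources: list[str]  # the evidence the decision rests on
    author: str         # who made the call
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class DecisionLog:
    """Append-only, queryable record from brief through delivery."""

    def __init__(self) -> None:
        self._records: list[DecisionRecord] = []

    def record(self, summary: str, sources: list[str], author: str) -> None:
        self._records.append(DecisionRecord(summary, sources, author))

    def search(self, term: str) -> list[DecisionRecord]:
        """Return every decision whose summary mentions the term."""
        term = term.lower()
        return [r for r in self._records if term in r.summary.lower()]

log = DecisionLog()
log.record("Chose muted palette for rebrand",
           ["brief-v2.pdf", "client call notes"],
           "lead designer")
matches = log.search("palette")
```

Because every record carries its sources, a later handoff can recover not just what was decided but why, which is the structural gap the field studies identified.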
"Having the ability to be able to refer back to where the sources are coming from, how confident they are; that's helpful."
"I feel more confident that this is more accurate than just ChatGPT making it up, precisely because you can see how the responses are grounded in the research."
Fifty real-world question–answer pairs were assessed by nine independent subject-matter experts using a 10-item rubric under anonymised, blind conditions. System and human responses were presented together without attribution, yielding 252 individual rubric evaluations in total.
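The core of a blind protocol like this is that origin labels are attached only after scoring, then used to aggregate. The sketch below illustrates that aggregation step with invented scores; the data shape and `mean_score` helper are assumptions for illustration, not the study's actual analysis code.

```python
from statistics import mean

# Hypothetical rubric scores (1-5) from blind evaluators. The "origin"
# label is joined to each response only after scoring is complete,
# mirroring the anonymised, no-attribution protocol.
evaluations = [
    {"response_id": "q01-a", "origin": "system", "scores": [4, 5, 4]},
    {"response_id": "q01-b", "origin": "human",  "scores": [4, 4, 5]},
    {"response_id": "q02-a", "origin": "system", "scores": [5, 4, 4]},
    {"response_id": "q02-b", "origin": "human",  "scores": [3, 4, 4]},
]

def mean_score(origin: str) -> float:
    """Average the per-response rubric means for one origin."""
    per_response = [mean(e["scores"]) for e in evaluations
                    if e["origin"] == origin]
    return round(mean(per_response), 2)

system_mean = mean_score("system")
human_mean = mean_score("human")
```

Comparing `system_mean` against `human_mean` only after unblinding is what allows a matched-performance claim to be made without evaluator bias.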
The headline accuracy finding is encouraging: the system matched human domain expert performance on core claims under blind conditions. The more significant finding is the actionability gap: Persona Dynamics produced specific, actionable recommendations at a substantially higher rate than human experts.
In decision-support contexts, a technically accurate but non-actionable answer often has limited practical value. The platform's structured, evidence-linked output format consistently produced responses that practitioners could act on.
The critical failure rate finding (higher for AI than for humans) is equally important. It does not undermine the accuracy result: it reinforces the design principle that AI outputs require human review, and it was the evidence basis for the confidence scoring and source transparency features built into every response.
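The review principle above can be made concrete: attach a confidence signal to each claim and flag anything below a threshold for human attention. This is a deliberately toy sketch; the threshold, the support-fraction heuristic, and all field names are assumptions, not the platform's actual scoring method.

```python
REVIEW_THRESHOLD = 0.7  # hypothetical cut-off below which a claim is flagged

def confidence(supporting: int, retrieved: int) -> float:
    """Toy confidence: fraction of retrieved passages supporting a claim."""
    return supporting / retrieved if retrieved else 0.0

claims = [
    {"text": "Market grew 12% year on year", "supporting": 4, "retrieved": 5},
    {"text": "Competitor exited the segment", "supporting": 1, "retrieved": 4},
]

for claim in claims:
    claim["confidence"] = confidence(claim["supporting"], claim["retrieved"])
    claim["needs_review"] = claim["confidence"] < REVIEW_THRESHOLD

flagged = [c["text"] for c in claims if c["needs_review"]]
```

The design choice is that low confidence never suppresses a claim; it routes the claim to a human, keeping the system advisory rather than autonomous.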
A point-in-time evaluation tells you how a system performed on a particular day. What matters in production is ongoing quality assurance as knowledge bases evolve, models update and use cases expand.
We are collaborating with the National Innovation Centre for Data on evaluation protocols covering response accuracy, tone assessment, guardrail effectiveness and multi-turn coherence, extending the point-in-time evaluation into a repeatable, structured framework.
The golden questions framework being developed through this collaboration is the first structured validation protocol for RAG-based persona systems. Once validated, we intend to publish the methodology openly, positioning this as a contribution to the emerging field of AI persona evaluation standards.
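In outline, a golden-questions check pairs a fixed question set with facts any acceptable answer must contain, and reruns it after each knowledge-base or model update. The sketch below shows that loop under stated assumptions: the `answer` function is a canned stand-in for a deployed RAG pipeline, and the question set and fact-containment check are illustrative, not the protocol being developed with the National Innovation Centre for Data.

```python
# Hypothetical golden-question set: each entry pairs a fixed question
# with facts any acceptable answer must contain.
GOLDEN_SET = [
    {"question": "What is the refund window?",
     "must_contain": ["30 days"]},
    {"question": "Which regions are supported?",
     "must_contain": ["UK", "EU"]},
]

def answer(question: str) -> str:
    """Stand-in for the deployed RAG pipeline (canned for this sketch)."""
    canned = {
        "What is the refund window?":
            "Refunds are accepted within 30 days of purchase.",
        "Which regions are supported?":
            "We currently support the UK and EU.",
    }
    return canned.get(question, "")

def run_regression() -> list[str]:
    """Return the questions whose answers miss a required fact."""
    failures = []
    for item in GOLDEN_SET:
        text = answer(item["question"])
        if not all(fact in text for fact in item["must_contain"]):
            failures.append(item["question"])
    return failures

failures = run_regression()
```

Run after every update, an empty `failures` list turns the one-off evaluation into the repeatable quality gate the framework aims for.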
Understanding failure modes is as important as measuring accuracy. Persona Dynamics is subject to ongoing adversarial testing that directly shapes platform guardrails.
The underlying PhD research was supported by Innovate UK BridgeAI feasibility funding with Hartree Centre support, and recognised with a D&AD Award and a Deutsche Bank Design Award.
This research directly informed the three architectural requirements now validated through field deployment: source transparency, preserved human authority, and persistent project memory.
The platform has been assessed as minimal-risk under the EU AI Act, consistent with its advisory, human-in-the-loop design and the absence of autonomous decision-making in high-risk sectors.
An advisory board member with AI regulatory expertise monitors UK and EU regulatory development. We have been involved in feedback processes for BS ISO/IEC 42005 and BS EN ISO/IEC 23894.
We're happy to share more detail on our evaluation approach, the evidence behind specific platform features, or how validation works in your deployment context.