Safe and Trustworthy AI Workshop 2022

Capabilities of AI systems have seen tremendous developments in the last few years. However, guaranteeing that these systems are safe and trustworthy is still an open problem. This workshop aims to make progress on it by bringing researchers together for a poster session. The best posters will each win a prize of 100 GBP. The workshop will take place on 2 November 2022 in London, and there is no registration fee. It is organised by PhD students from London universities, and other early career researchers are especially encouraged to attend. The workshop is supported by the UKRI Centre for Doctoral Training in Safe and Trusted AI; the prize is supported by the Longterm Future Fund.

Call for Posters

Safe and trustworthy AI is a broad field and we accept submissions in, for example, the following subfields.

Submissions should be abstracts of no more than 200 words. The relationship of the work to safe and trustworthy AI should be clear. The deadline for abstract submissions is 5 October 2022 (extended from 28 September), and the organisers will notify you whether you have been accepted to the workshop no later than 12 October 2022. Abstracts should be submitted on EasyChair below.

You can register whether you are presenting a poster or only want to attend. We hope that everyone who wants to attend will be able to do so, but capacity is limited, so a place cannot be guaranteed. Registration is open below until 26 October 2022.

You are responsible for printing your poster before arriving at the venue. The organisers encourage you to ask the institutions you are affiliated with to reimburse you for the printing. Posters will not be evaluated before the workshop event itself.


The workshop event on 2 November 2022 will take place from 2pm to 8pm. The location is the South Kensington campus of Imperial College London. Specifically, it is in the Huxley building, and we suggest you enter through the reception at 180 Queen's Gate. From there, walk up the stairs, turn left, and turn left again, and you will find rooms 341 and 342, where the poster sessions will be held. The talks will be in the nearby lecture theatre 308.

Your institution should normally be able to reimburse you for the trip. (If that is not possible, and the cost of attending is a barrier, get in touch, and we will see what can be done.) There will be complimentary refreshments for all participants.

During the workshop event there will be a competition for the best posters. The winners will be decided democratically: all participants who take part in both poster sessions will be able to vote. The mechanism will be approval voting, so each voter may approve of as many posters as they like. The top 5 candidates will all be considered winners and will receive 100 GBP each. If there is a tie between the 5th, 6th, ..., nth candidates, each of the n candidates will receive (500/n) GBP.
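
The voting and prize rule above can be illustrated with a minimal Python sketch. It is not the organisers' actual tallying procedure. The function names, the ballot format (one set of approved poster titles per voter), and the handling of fewer than 5 voted-for posters are all assumptions made for illustration.

```python
from collections import Counter

PRIZE_POOL_GBP = 500  # 5 winners x 100 GBP, as stated in the text
NUM_WINNERS = 5


def tally_approvals(ballots):
    """Count how many ballots approve of each poster.

    Assumption: each ballot is a collection of approved poster titles.
    """
    counts = Counter()
    for ballot in ballots:
        counts.update(set(ballot))  # a voter approves a poster at most once
    return counts


def award_prizes(counts, num_winners=NUM_WINNERS, pool=PRIZE_POOL_GBP):
    """Return {poster: prize in GBP}, applying the tie rule from the text."""
    if not counts:
        return {}
    ranked = sorted(counts.values(), reverse=True)
    # Approval count at the last winning position (5th place). If fewer
    # than 5 posters received votes, all of them win (an assumption).
    cutoff = ranked[min(num_winners, len(ranked)) - 1]
    winners = [poster for poster, n in counts.items() if n >= cutoff]
    if len(winners) > num_winners:
        # Tie running from 5th to nth place: all n winners split the pool.
        prize = pool / len(winners)
    else:
        prize = pool / num_winners
    return {poster: prize for poster in winners}


# Hypothetical ballots: posters A and B get 3 approvals, C to F get 2 each.
example_ballots = [
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "D"},
    {"E", "F"},
    {"B", "C"},
    {"D", "E", "F"},
]
prizes = award_prizes(tally_approvals(example_ballots))
```

In this made-up example the tie for 5th place extends to the 6th poster, so n = 6 and each of the six posters would receive 500/6, roughly 83.33 GBP. Without such a tie, the top 5 posters receive 100 GBP each.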

There will also be two speakers with numerous publications in the field of safe and trustworthy AI.

The schedule is shown below. Posters with titles starting with letters A to K are scheduled for the first poster session and the rest (L to Z) for the second.

Start time | Activity | Room
13:00 | Welcome and setup for poster session 1 (A-K) | 341, 342
13:30 | Poster session 1 (A-K) | 341, 342
14:30 | Talk 1: Tom Everitt. Causality in AI Risk. | 308
15:30 | Break and setup for poster session 2 (L-Z) | 341, 342
15:45 | Poster session 2 (L-Z) | 341, 342
17:00 | Talk 2: Jesse Clifton. Differential progress in Cooperative AI. | 308
18:00 | Reception | 341, 342


Causality in AI Risk

Speaker: Tom Everitt

Abstract: With great power comes great responsibility. Human-level+ artificial general intelligence (AGI) may become humanity's best friend or worst enemy, depending on whether we manage to align its behavior with human interests or not. To overcome this challenge, we must identify the potential pitfalls and develop effective mitigation strategies. In this talk, I'll argue that (Pearlian) causality offers a useful formal framework for reasoning about AI risk, and describe some of our recent work on this topic. In particular, I'll cover causal definitions of incentives, agents, side effects, generalization, and preference manipulation, and discuss how techniques like recursion, interpretability, impact measures, incentive design, and path-specific effects can combine to address AGI risks.

Differential progress in Cooperative AI: motivation and measurement

Speaker: Jesse Clifton

Abstract: Skills that lead to strong performance in multi-agent systems can be detrimental to social welfare. This is true even of skills that play a central role in cooperation: the ability to understand other agents can make it easier to deceive and manipulate them, and the ability to commit to peaceful agreements can also facilitate coercive commitments. I’ll argue that the field of Cooperative AI should focus on improving skills that robustly lead to improvements in social welfare, rather than those that are dangerously dual-use. That is, we need differential progress on cooperative intelligence. To know whether we are making differential progress, we need to be able to rigorously define and measure it. Towards these ends, I’ll review early-stage work on the definition and measurement of cooperative intelligence and other cooperative capabilities.


The organisers, in alphabetical order, are

If you have any questions, you can get in touch with the organisers below.