Inside the British Lab Investigating A.I. Risks

May 24, 2026

In London, the A.I. Security Institute stands along Parliament Square. This government facility employs a diverse team, including weapons inspectors, epidemiologists, and code breakers, to assess the risks of artificial intelligence technology.

On a recent Tuesday, four A.I. experts at the institute worked to trick a chatbot into revealing instructions for synthesizing the deadly bioweapon anthrax. They asked for ingredient lists and methods, but the chatbot refused to comply, stating, “I’m sorry I can’t help with that.” Undeterred, the experts used a customized algorithm to repeatedly prompt the A.I. tool with thousands of questions.

Eventually, the chatbot provided detailed ingredients, equipment, and a step-by-step recipe for creating the lethal mixture. The name of the A.I. system was withheld for safety reasons.

“There are some questions that you definitely don’t want the model to give the answer to,” said Xander Davies, who leads a team known as the red team at the A.I. Security Institute. The red team specializes in simulating attacks on A.I. systems to expose vulnerabilities.

Recently, Davies and his team breached the safeguards of OpenAI’s latest ChatGPT version, extracting hacking tips within six hours. Once issues are identified, the team shares its findings with the companies, which aim to fix the problems and report back.

“They actually strengthen their system with us,” said Davies, a Harvard-educated computer scientist who opted to work at the institute rather than pursue a tech role in San Francisco. His team’s efforts highlight the ongoing work to enhance the integrity and security of artificial intelligence systems.