
AI evidence pipelines could offer reliable support for conservation decisions

by Kate Grimwood, Alec Christie

Man holding lightbulb

© stock.adobe.com

AI can match human experts in retrieving conservation evidence, researchers from Imperial College London and the University of Cambridge find.

The global biodiversity crisis is further complicated by a gap between scientific evidence and conservation practice. Despite the wealth of research available, scientific findings are not consistently applied to improve the effectiveness of conservation efforts.

For example, bat gantries, large structures installed over major roads to help bats avoid traffic, have been used across the UK for many years. These installations have cost at least £1 million, yet evidence indicates they are largely ineffective in achieving their intended purpose [1].

Similarly, agri-environment schemes under the European Union’s Common Agricultural Policy have sometimes implemented measures that failed to deliver meaningful biodiversity benefits, even though scientific research suggested alternative approaches would have been more successful [1].

A key problem is that crucial scientific knowledge on conservation actions is scattered across numerous studies and is therefore hard to access. Although databases such as Conservation Evidence compile this information, accessing the right information quickly remains a challenge for those working on conservation-related problems. The question arises: can Artificial Intelligence (AI) assist in navigating this complex evidence landscape?

"Simply plugging a question into a general chatbot is not the way to get reliable evidence-based answers. The setup, particularly how the system retrieves information, is crucial to avoid poor performance and misinformation."

Dr Alec Christie, Research Fellow

AI tools such as Large Language Models (LLMs), the technology behind ChatGPT, are adept at processing text and generating answers. With these capabilities, AI seems well suited to help conservationists by rapidly accessing and summarising conservation evidence. However, these models must be used with care: they can make errors, invent facts, or reflect biases, posing risks when applied in sensitive conservation contexts.

This challenge was addressed by a collaborative team that included Alec Christie from the Centre for Environmental Policy at Imperial College London and was supported by the AI@Cam initiative and the UROP summer research scheme. Together, they explored whether LLMs could accurately retrieve information from the Conservation Evidence database. The research, published in PLOS ONE on 15th May 2025 [2], compared AI performance with that of human experts in conservation.

Woman on computer
Testing AI models' ability to accelerate the transfer of evidence into conservation action

Retrieving relevant evidence

The study simulated an exam for ten large language models (LLMs), including GPT-4o and Claude 3.5 Sonnet, using thousands of multiple-choice questions generated from the Conservation Evidence database. Each question's answer was drawn from a specific page within the database summarising the evidence for a particular conservation action. The questions were designed to evaluate the AI models' ability to locate and interpret relevant information from the database.
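
To make the exam protocol concrete, here is a minimal sketch that poses a multiple-choice question to a model and scores the answers. It assumes the openai Python client and GPT-4o as the model under test; the prompt wording and scoring loop are illustrative, not the authors' exact code.

```python
# Sketch of the exam protocol: pose a multiple-choice question to an
# LLM and check its answer. Model name and prompt are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_mcq(question: str, options: dict[str, str]) -> str:
    """Return the single letter (A-D) the model picks."""
    option_text = "\n".join(f"{k}) {v}" for k, v in options.items())
    prompt = (
        "Answer the following multiple-choice question about "
        "conservation evidence with a single letter.\n\n"
        f"{question}\n{option_text}\nAnswer:"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic answers for a fair exam
    )
    return response.choices[0].message.content.strip()[0]

def score(items) -> float:
    """Accuracy over (question, options, correct_letter) triples."""
    correct = sum(ask_mcq(q, opts) == answer for q, opts, answer in items)
    return correct / len(items)
```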

The performance of the different AI models was then compared with one another, as well as with responses from human experts. These experts, members of the Conservation Evidence team responsible for curating the database, answered a smaller selection of the questions, providing a high-performance benchmark for the AI models.

The team also tested various AI setups, including:

  • Closed Book: AI used only pre-existing knowledge.
  • Open Book (Oracle): AI was given direct access to the relevant database page.
  • Open Book (Confused): AI received both relevant text and additional, unrelated information from other pages in the database.
  • Open Book (Retrieval): The AI models searched the database using three different retrieval strategies: sparse, dense, and hybrid. Sparse retrieval relies on keywords to locate the relevant pages, while dense retrieval uses the semantic meaning of the question to find the best matching page. The hybrid approach combines both methods, selecting the strongest matches from each to identify the most relevant page (a minimal code sketch of these strategies follows this list).
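
The sketch below illustrates the three retrieval strategies, assuming the rank_bm25 and sentence-transformers Python packages and a toy set of pages; the study's actual pipeline will differ in its indexing, embedding model, and score-combination details.

```python
# Illustrative sparse, dense, and hybrid retrieval over toy pages.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

pages = [  # stand-ins for Conservation Evidence action pages
    "Install bat gantries over roads: evidence suggests this is ineffective.",
    "Plant wildflower strips on farmland: benefits pollinator abundance.",
    "Create agri-environment field margins: mixed effects on biodiversity.",
]

bm25 = BM25Okapi([p.lower().split() for p in pages])  # sparse (keyword) index
encoder = SentenceTransformer("all-MiniLM-L6-v2")     # dense (semantic) encoder
page_vecs = encoder.encode(pages, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[int]:
    # Sparse score: BM25 keyword match between question and each page.
    sparse = bm25.get_scores(question.lower().split())
    # Dense score: cosine similarity of question and page embeddings.
    q_vec = encoder.encode(question, normalize_embeddings=True)
    dense = page_vecs @ q_vec
    # Hybrid: normalise both score lists to [0, 1] and combine, so the
    # strongest matches from either method rise to the top.
    def norm(s):
        return (s - s.min()) / (s.max() - s.min() + 1e-9)
    hybrid = norm(sparse) + norm(dense)
    return list(np.argsort(hybrid)[::-1][:k])  # indices of top-k pages

print(pages[retrieve("Do gantries help bats cross roads safely?")[0]])
```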

The study revealed that, with the right configuration, AI systems can perform at or even above expert-level accuracy. Using the hybrid retrieval method, models such as GPT-4o, Llama 3.1 70B, and Gemma 2 27B achieved mean accuracy scores between 95.6% and 97.8%, slightly surpassing the human experts’ average of 94.8%, although this difference was not statistically significant. Overall, there was no significant difference between AI models employing the hybrid approach and human experts in both the accuracy of their answers and their ability to retrieve the correct database page. The exception was Llama 3.1 8B Instruct Turbo, which performed significantly below expert level, with 86.7% accuracy.

The hybrid retrieval method (88.9% mean retrieval accuracy) consistently outperformed both sparse and dense methods, which achieved 71.1% and 80.0% respectively, both significantly lower than the 87.8% retrieval accuracy of human experts. AI performance also declined noticeably when tested without access to the database documents (the “Closed Book” scenario), with mean accuracy dropping to between 62.6% and 69.8%.

The research highlighted a major advantage of AI: response speed. While AI provided answers almost instantaneously, human experts took an average of over two minutes per question.

Implications for conservation decision-making

The findings underscore the importance of carefully selecting and evaluating both the large language models (LLMs) used and the methods for retrieving the information they rely on. The study clearly shows that using LLMs "out of the box," without tailored access to relevant databases, can result in inaccurate or misleading answers, particularly in specialised fields like conservation.
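
As a hypothetical contrast, the snippet below shows the difference between asking a model directly ("Closed Book") and grounding the same question in retrieved evidence; the instruction wording is an assumption, not the study's prompts.

```python
# Hypothetical closed-book versus retrieval-grounded prompts.

def closed_book_prompt(question: str) -> str:
    # The model must rely on whatever it memorised during training.
    return f"{question}\nAnswer concisely."

def open_book_prompt(question: str, evidence: str) -> str:
    # `evidence` would come from a retriever such as the hybrid sketch
    # above; the model is told to answer only from that text.
    return (
        "Using ONLY the evidence below, answer the question. "
        "If the evidence is insufficient, say so.\n\n"
        f"Evidence:\n{evidence}\n\nQuestion: {question}"
    )
```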

The results also point to the potential of well-designed AI systems to support conservation efforts. When responsibly configured, AI tools could help practitioners rapidly access relevant, evidence-based information from resources such as the Conservation Evidence database, improving the speed and quality of decision-making.

Looking ahead, further research is needed to tackle more complex conservation questions that demand nuanced reasoning. It will also be essential to explore the ethical dimensions of AI in conservation, including equitable access to tools, minimising environmental impacts, and ensuring that human judgement and critical thinking remain central to interpreting evidence and shaping conservation decisions.

Acknowledgements

This research was conducted by Radhika Iyer, Sam Reynolds, William Sutherland (University of Cambridge), Alec Christie (Centre for Environmental Policy, Imperial College London), Sadiq Jaffer, and Anil Madhavapeddy (University of Cambridge). The project was supported by the AI@Cam initiative, the UROP scheme, and funding from various donors, including an unrestricted donation from Tarides and John Bernstein.

Sources

[1] Sutherland, W.J. & Wordley, C.F.R. (2017) Evidence complacency hampers conservation. Nature Ecology & Evolution 1, 1215–1216.

[2] Iyer, R., Christie, A.P., Madhavapeddy, A., Reynolds, S., Sutherland, W. et al. (2025) Careful design of Large Language Model pipelines enables expert-level retrieval of evidence-based information from syntheses and databases. PLOS ONE 20(5): e0323563.



Article text (excluding photos or graphics) © Imperial College London.

Photos and graphics subject to third party copyright used with permission or © Imperial College London.

Reporter

Kate Grimwood

Centre for Environmental Policy

Alec Christie

Centre for Environmental Policy