Hyderabad: Artificial intelligence is proving increasingly capable in almost every field, including scientific research. According to a new study led by researchers at University College London (UCL), large language models (LLMs) trained on vast datasets of text can predict the results of proposed neuroscience studies more accurately than human experts.
The findings, published in Nature Human Behaviour, show that LLMs can analyse patterns in scientific literature and predict scientific outcomes with impressive accuracy. The results underscore the models' potential to do more than retrieve knowledge and to speed up research significantly.
Lead author Dr Ken Luo of UCL Psychology & Language Sciences noted that, while much research has highlighted LLMs' ability to answer questions and summarise past information, his team investigated whether these models could also synthesise knowledge to predict future outcomes.
Noting the time and resources that scientific progress demands, the international research team explored whether LLMs can identify patterns in scientific literature and forecast the outcomes of experiments. They developed BrainBench, a tool to evaluate how well LLMs predict neuroscience results. It comprises pairs of neuroscience study abstracts: in each pair, one abstract is authentic and the other has been modified so that its results are incorrect.
The researchers tested 15 general-purpose LLMs and 171 human neuroscience experts to see whether they could correctly identify the real abstract in each pair. The study found that the LLMs outperformed the neuroscientists, with 81 per cent accuracy compared to the humans' 63 per cent. Even the neuroscientists with the most expertise achieved only 66 per cent accuracy.
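The accuracy figures above come down to counting how often a model or a person picks the authentic abstract in each pair. The following is a minimal sketch of that scoring, not the researchers' code: the names `AbstractPair`, `picks_original`, `accuracy` and `toy_score` are hypothetical, the placeholder texts are not real abstracts, and the scoring rule is a stand-in, since the article does not say how each model made its choice.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AbstractPair:
    """One benchmark item: the authentic abstract and its altered twin."""
    original: str
    altered: str


def picks_original(pair: AbstractPair, score: Callable[[str], float]) -> bool:
    """Return True if the scorer rates the authentic abstract as more plausible.

    `score` is any function where a higher value means "more plausible".
    """
    return score(pair.original) > score(pair.altered)


def accuracy(pairs: Sequence[AbstractPair], score: Callable[[str], float]) -> float:
    """Fraction of pairs in which the authentic abstract was chosen."""
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(picks_original(pair, score) for pair in pairs)
    return correct / len(pairs)


def toy_score(text: str) -> float:
    """Stand-in scorer (assumption): shorter text counts as more plausible."""
    return -float(len(text.split()))


# Placeholder items for illustration only; these are not study data.
pairs = [
    AbstractPair(original="result one", altered="result one changed"),
    AbstractPair(original="result two kept long", altered="result two"),
    AbstractPair(original="result three", altered="result three changed"),
]

print(f"accuracy: {accuracy(pairs, toy_score):.2f}")
```

Swapping `toy_score` for a real model's plausibility score would leave the rest of the calculation unchanged, and the same function scores human answers once they are recorded as choices.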
When LLMs were more confident in their decisions, they were more often correct. This suggests a future where human experts collaborate with well-calibrated models, UCL said in a blog post.
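One common way to check whether confidence tracks correctness is to sort answers by confidence, split them into groups, and compare accuracy across the groups. The sketch below illustrates that idea under stated assumptions: `accuracy_by_confidence` is a hypothetical helper, the numbers are invented for illustration, and the article does not describe how the study measured confidence.

```python
from typing import List, Sequence


def accuracy_by_confidence(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 3,
) -> List[float]:
    """Accuracy per confidence group, ordered from least to most confident.

    Answers are sorted by confidence and split into `n_bins` groups of
    near-equal size; empty groups are skipped.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    # Indices of the answers, ordered from lowest to highest confidence.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    n = len(order)

    results: List[float] = []
    for b in range(n_bins):
        group = order[b * n // n_bins:(b + 1) * n // n_bins]
        if group:
            results.append(sum(correct[i] for i in group) / len(group))
    return results


# Invented example values, not study data.
confidences = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
correct = [False, True, False, True, True, True]

print(accuracy_by_confidence(confidences, correct))
```

If the model's confidence is informative, accuracy should rise from the first group to the last, which is the pattern the researchers reported for LLMs.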
BrainGPT performs even better
The researchers then adapted an open-source LLM, a version of Mistral, by training it on neuroscience literature to create BrainGPT. The resulting model predicted study results even more accurately, achieving 86 per cent accuracy compared with 83 per cent for the general-purpose version of Mistral.