Are AI tools already better than lawyers at legal research?

“All tested AI tools scored higher than the lawyer.”

That is the striking conclusion of the new VLAIR Legal Research Benchmark study, conducted by Vals AI. This benchmark focuses on one of the most critical tasks within legal practice: legal research. The performance of four AI products was compared with that of lawyers. And the result? In 75% of the questions examined, the AI tools outperformed the legal professional.

But does this mean you can now blindly trust ChatGPT for your case law research? Certainly not. The results provide a nuanced picture: AI can perform impressively, but primarily under specific circumstances. And in some cases, legal professionals simply remain indispensable. The power of AI lies mainly in speed, structure, and access to vast amounts of information, but interpretation, strategic thinking, and nuance remain human tasks.

In this blog post, we take a closer look at the study: what exactly was tested, how the AI tools performed, and what legal professionals can learn from this.

The study’s design

The benchmark is a follow-up to the previous VLAIR study on legal AI tools. This time, the focus was entirely on legal research: answering legal questions based on US laws and regulations. Legal research is often seen as a fundamental part of legal practice, where accuracy, reliability, and the use of sources are essential.

The researchers compared four AI products:

  • Alexi
  • Counsel Stack
  • Midpage
  • ChatGPT (generalist AI)

Their answers were tested against a Lawyer Baseline: a control group of lawyers who received the same questions without the help of AI. In total, it involved 200 legal questions that frequently occur in practice. These questions came from a dataset compiled in collaboration with leading US law firms.

Assessment took place based on three criteria:

  • Correctness (50%) – Is the answer substantively correct?
  • Use of sources (40%) – Have the correct legal sources been cited?
  • Readability (10%) – Is the answer understandable and directly usable?

All answers were assessed by a team of legal professionals and librarians, who worked anonymously and had no knowledge of which tool had generated which answer.
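The weighting above can be illustrated with a small sketch. The function and the sample sub-scores below are hypothetical, invented for illustration; they are not taken from the study:

```python
def weighted_score(correctness, sources, readability):
    """Combine the three sub-scores (each on a 0-100 scale) using the
    study's weighting: 50% correctness, 40% sources, 10% readability."""
    return 0.5 * correctness + 0.4 * sources + 0.1 * readability

# Hypothetical example: a substantively strong answer with weak sourcing
# is dragged down, because source use carries 40% of the weight.
print(weighted_score(80, 50, 90))  # 0.5*80 + 0.4*50 + 0.1*90 = 69.0
```

The design choice matters: with sources weighted at 40%, a tool that answers correctly but cites poorly can still end up below a tool with slightly weaker answers and solid references.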

Five lessons from the study

  1. AI scores better than the average lawyer

All four AI tools performed better than the control group of lawyers. Counsel Stack was the highest-scoring tool, and the other AIs were closely clustered, with scores between 74% and 78%. By comparison, the lawyers averaged 69%, so every AI tool landed above the human benchmark.

In 75% of cases, AI beat the human. When AI scored better, the difference was an average of 31 percentage points. This means that AI not only works faster but is also often more complete and consistent in answering legal questions. At the same time, there remained specific scenarios in which the legal professional performed better.
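Since "percentage points" and "percent" are easy to confuse, here is a small illustrative sketch; the function names are ours, and the numbers simply echo the headline figures (top tools around 78%, lawyer baseline at 69%):

```python
def point_gap(ai_score, lawyer_score):
    """Absolute gap in percentage points between two scores (0-100)."""
    return ai_score - lawyer_score

def relative_gain(ai_score, lawyer_score):
    """Relative improvement of the AI score over the lawyer baseline, in %."""
    return (ai_score - lawyer_score) / lawyer_score * 100

# A 78% vs 69% result is a 9-point gap, but roughly a 13% relative gain.
print(point_gap(78, 69))                # 9 percentage points
print(round(relative_gain(78, 69), 1))  # ~13.0% relative improvement
```

So the 31-point average margin reported for the cases AI won is an absolute gap on the 0-100 scoring scale, not a relative improvement.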

  2. Accuracy is strong, but the use of sources makes the difference

AI scored high on correctness: between 78% and 81%. Notably, ChatGPT, as a generalist tool, was hardly inferior to the specialized legal AIs in terms of accuracy. Performance among the AI tools was closely matched, suggesting that the baseline intelligence of generative models has now reached a high level.

The difference lay primarily in the use of sources. Legal AI tools referred more often to primary, valid sources such as legislation or court rulings. ChatGPT sometimes used only public summaries or forgot to provide references. Especially for complex questions with many legal details, correctly citing sources proved decisive.

  3. AI struggles with complex jurisdictions

As soon as a question touched multiple areas of law, the scores of all participants dropped, including the legal AI tools. Only ChatGPT held its ground with a consistent score, possibly because of the model’s broader training data.

In such cases, human contextual knowledge proved important. Legal professionals knew better how to interpret incomplete or unclear questions and were less likely to receive a zero score. AI tools, on the other hand, sometimes provided no answer at all or referred to irrelevant legislation.

  4. AI is lightning fast, but not always complete

AI provided answers quickly and often at length. But that length was not always an advantage: sometimes the core was missed, or irrelevant details were added. This is because AI often tries to cover every possibility without ranking by relevance. Lawyers, by contrast, provided shorter, more concrete answers. These were sometimes incomplete by the formal assessment criteria, but substantively correct and usable in practice. In a commercial context, where speed and clarity matter, that human sharpness remains valuable.

  5. The combination of human + AI remains the strongest

The study shows: AI alone can already do a lot, but it is not infallible. Legal professionals can catch errors, add missing context, and assess whether the source found is truly persuasive. The combination of human and machine even yielded the highest scores in previous benchmarks.

What does this mean for your practice?

This benchmark shows that legal research, traditionally a time-consuming and specialized task, is ideally suited for (partial) automation using AI. But you have to know what you are doing. AI tools have the potential to speed up work, but without supervision, they can also cause errors.

Three tips for legal teams:

  • Use specialized AI tools when accuracy is paramount. Only then will you get reliable source references that meet the legal standards of your firm or client.
  • Combine AI with human supervision. Have AI create the first draft and check the citations, interpretation, and relevance yourself.
  • Choose the right workflow. AI scores lower on unclear questions or when it does not know which jurisdiction is relevant. With the right context and guidance, AI performs much better.

Additionally, it is advisable to establish clear guidelines for the use of AI within the team, including agreements on source verification, liability, and confidentiality.

Conclusion: legal research with AI works, but not on autopilot

The latest VLAIR study convincingly demonstrates that AI is now a serious player in legal research. Not perfect, not autonomous, but faster, more consistent, and often more accurate than the average legal professional, especially for standard questions.

Yet it is not a replacement for human insight. Especially with complex questions, unclear context, or multiple jurisdictions, the legal professional remains indispensable. Those who use AI smartly as a tool and not as a replacement will get the most out of it.

The future of legal research? A hybrid collaboration in which AI does the searching and the legal professional provides the interpretation and final check. Those who organize this smartly will gain time, quality, and trust—both with clients and within their own organization. Let AI do the groundwork. But you remain the legal professional who adds the finishing touches.

Or as the researchers themselves summarize it: “AI is not a replacement for the legal professional, but a doubling of his or her impact.”

LegalMike in Action

Every two weeks on Friday afternoons, we organize a digital knowledge session. During these sessions, we demonstrate how to optimally utilize LegalMike in your legal practice, from real-world examples to practical tips.

The next knowledge session will take place on April 10.

You can also join directly via Google Meet.