AI versus the legal professional: who writes the better first draft?
“The future of contract drafting is not determined by man or machine alone, but by the collaboration between them.”
The first major benchmark study into AI and contract drafting shows that artificial intelligence increasingly performs as well as, or even better than, legal professionals when drafting contracts. The study by LegalBenchmarks.ai compares the performance of 13 AI tools (including Wordsmith, August, and major generative language models such as ChatGPT) with that of experienced legal professionals. The result: the top AIs score higher than experienced legal professionals on reliability. On usability the difference is smaller, but even there, AI is not inferior to experienced legal professionals. In short: AI as a full-fledged assistant in creating contracts? According to this research, yes.
But what does this research really say? And can we simply trust these results?
The first true AI benchmark for legal professionals
The study ‘Benchmarking Humans & AI in Contract Drafting’ compares 13 AI tools with legal professionals based on 30 specific contract tasks.
Each output was evaluated on three components:
- Reliability of the output: is the text legally and factually correct?
- Usability of the text: does the draft help to arrive at a usable version more quickly?
- Workflow support: how well does the tool fit into the daily practice of legal professionals?
The main conclusion: AI matches, and in some cases exceeds, the level of experienced legal professionals. The top-performing tool, Gemini 2.5 Pro, delivered a legally reliable first version in 73.3% of cases. This is higher than the top-scoring legal professional (70%) and the average of the legal team (56.7%).
AI identifies risks that humans sometimes overlook
It is striking that in high-risk scenarios, legal AI tools provide explicit warnings more often than legal professionals. In 83% of high-risk tasks, specialized AI tools gave a warning about potential invalidity or conflict with legislation.
The legal professionals issued no such warnings in the cases studied, but that does not mean they fail to recognize these risks; the difference may also stem from interpretation, prioritization, or the limited time available per task.
In a task concerning a penalty clause under New York law, several AI tools signaled that the percentage mentioned (10%) might be seen as a penalty rather than liquidated damages, which can have legal consequences. This warning was absent from the answers provided by the legal professionals.
For legal professionals using AI as a tool, this is a valuable addition. Especially with complex or routine clauses, AI can serve as an extra layer of verification, provided it is deployed correctly.
How do AI tools compare to each other?
The benchmark distinguishes between two types of AI tools:
- General AI tools: perform surprisingly well in terms of output reliability.
- Legal AI tools: score better on usability and align better with the daily work practice of legal professionals.
In practice, this means:
- General tools can write well but lack legal context and integration into legal workflows.
- Legal tools are better tailored to how legal professionals actually work, with features such as integration into word processors, templates, risk signaling, and support for drafting and editing contracts.
This workflow support makes legally oriented AI tools ideally suited for contract work. They are not a standalone solution but connect directly to existing processes, thereby enhancing the productivity of legal professionals.
Humans remain stronger in context and nuance
Nevertheless, there remain tasks where AI falls short. In assignments requiring significant context, such as combining templates, emails, and term sheets, legal professionals perform better.
The legal professional also remains the standard for commercial assessments and strategic concessions. AI sometimes writes clauses that are too favorable to the counterparty or omits crucial nuances. Here, human judgment remains essential.
The real gain: collaboration between humans and AI
The most interesting result? The combination of human and AI scored higher on reliability (61.5%) than either the average AI tool or the average legal professional working alone. AI is therefore not a replacement, but an addition. When used correctly, it makes legal professionals faster, more precise, and better prepared for risks.
Example: in tasks where AI created a first draft and the legal professional edited it, the turnaround time decreased from an average of 13 minutes to less than 3 minutes. That is a gain in both time and quality.
Critical remarks
Nevertheless, caution is advised when interpreting this benchmark.
For instance, the following criticisms of the methodology can be identified:
- Not all tools were tested in their full capacity. For example, ChatGPT did not use ‘Thinking mode’, and Gemini did not use ‘DeepThink’, even though these can significantly improve the quality of the output.
- There is a lack of transparency regarding the exact prompts, settings, and methodology per tool, making it difficult to replicate or verify the results.
- The comparison remains fairly generic, while AI tools are often developed for specific use cases and perform at their best in those contexts.
Furthermore, it is unclear how strict the evaluators were, how much time the legal professionals were given, and whether they had access to their own models or knowledge bases, for example.
Without context, the difference between a score of 73% and 57% is easily misinterpreted. The highest score, 73%, comes from the best-performing AI tool in the benchmark (Gemini 2.5 Pro). The 57% is the average of all tested AI tools. By comparison, human legal professionals averaged 56.7%, with an outlier of 70% for the top-scoring professional.
This criticism touches on an important point: benchmarks like this are valuable as a signal but must be transparent, repeatable, and realistic. Especially in the legal domain, where diligence is paramount, nuance remains essential.
What does this mean for legal teams?
Despite these caveats, the benchmark confirms what many legal professionals already experience: AI is fast, consistent, and often surprisingly sharp. But it requires critical use, clear instructions, and human oversight.
For legal departments and law firms, these are the core questions:
- Which tasks can we already delegate to AI today without risk?
- Where do we want to use AI as a support line and where does human nuance remain essential?
- How do we design a hybrid workflow in which both the legal professional and AI collaborate on quality?
Teams that take these questions seriously now are taking an important step. Not only because the work can be done more efficiently, but primarily because they learn when to deploy AI and when human insight remains indispensable.
Conclusion: AI writes well, but not without supervision
The benchmark from LegalBenchmarks.ai underscores that AI tools are increasingly capable of drafting legally usable contracts. In some cases, even better than human legal professionals. But that does not mean we can sit back now. Without insight into exactly how the tools were tested, how they handle context, and which settings were used, there remains room for doubt and nuance.
For legal professionals, this is primarily an invitation to take AI seriously, while maintaining a keen eye on the limits of the technology. AI can be a valuable assistant, but it is not a replacement for human insight and experience, certainly not in complex negotiations or legally sensitive clauses.
An important point from practice is that speed is not everything. One of the respondents indicated that he would no longer hire his 25-year-old self because AI now works faster and more consistently. At the same time, he emphasized that you must know when to intervene. That skill remains irreplaceable.
The lesson? Do not be intimidated by the speed of AI, but learn to work with it. Use it as a tool to become better at what makes you unique as a legal professional: sharpness, context, nuances, and common sense.