DeepSeek R1 has shown strong reasoning capability since its release. In this blog post, we share what we learned from using DeepSeek R1 to build a Retrieval-Augmented Generation (RAG) system tailored for legal documents.
We choose legal documents because legal professionals often face a daunting task: navigating libraries of cases, statutes, and informal legal commentary. Even the best-intentioned research can get bogged down in retrieving the right documents, let alone summarizing them accurately. This is a great domain for RAG (see also Harvey AI)!
We built the RAG on top of the pile-of-law dataset, with DeepSeek R1 and Qwen2 embeddings. We open-sourced the entire pipeline for building the RAG (powered by vLLM and SkyPilot) and share some hard-won experiences: what worked and what didn’t.
Want to see this in action 🔥? Try our hosted demo here.
We open-sourced the pipeline for building this RAG here. Come explore how we combine vLLM + SkyPilot to achieve fast, resilient, and scalable performance.
Why build a RAG on legal documents?
Retrieval-Augmented Generation is crucial for domains where accuracy and traceability matter—like law. By connecting a specialized retriever with a robust LLM generator, we get the best of both worlds:
- High-relevance document retrieval
- Reasoning-rich text generation
- Reduced hallucinations through direct referencing
To build this RAG, we use:
- A subset of the pile-of-law/pile-of-law dataset (focusing on legal advice),
- ChromaDB as the vector store for storing and querying the embeddings,
- Qwen2 embeddings for retrieval, and
- DeepSeek R1 for final answer generation.
The entire setup runs on vLLM and deploys seamlessly to any cloud via SkyPilot for best GPU availability and cost. Please find our open-source pipeline here.
Hard-won lessons: Dos and Don’ts
1. Don’t use DeepSeek R1 for retrieval
Despite DeepSeek R1’s superb reasoning abilities, it’s not good for generating embeddings—at least not yet.
We compared the embeddings generated by DeepSeek R1 against those from a specialized embedding model, Alibaba-NLP/gte-Qwen2-7B-instruct, the best embedding model on the MTEB leaderboard at the time of writing. The examples below show how much worse the DeepSeek R1 embeddings are for retrieval.
We generated embeddings for the dataset with each model and built two vector databases. We then ran the same queries against both: each query is embedded by one model, and we retrieve the top 5 most similar entries from the vector database built by that same model.
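To make the comparison procedure concrete, here is a minimal sketch of a top-k lookup. It assumes cosine similarity as the metric (the metric actually used depends on how the ChromaDB collection is configured), and the three-dimensional vectors and titles are made-up stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=5):
    """Return the k (title, score) pairs most similar to query_vec.

    `index` is a list of (title, embedding) pairs produced by ONE model;
    the query must be embedded by that same model.
    """
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (illustrative values only).
toy_index = [
    ("lease question", [0.9, 0.1, 0.0]),
    ("car accident", [0.0, 0.2, 0.9]),
    ("landlord notice", [0.8, 0.3, 0.1]),
]
results = top_k([1.0, 0.2, 0.0], toy_index, k=2)
```

Running the same `top_k` call against two indexes, one per embedding model, is what produces side-by-side result lists like the ones in this section.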
Query: I want to break my lease. My landlord doesn't allow me to do that.
Results by Qwen | Results by DeepSeek-R1 |
---|---|
Leasing Agent saying i have to stay another month because of 30 days of notice? | 100% At Fault for Car Accident, now Insurance Company has my case moving to "litigation department" |
Moving into new apartments and one of my room mates can't be on the lease because she's on another lease with her ex boyfriend for a couple more months. | Landlord telling tenants we must pay for her chimney to be swept, must use a sweep of her choice |
Landlord Asking to Sign New Lease | Ex Girlfriend stole my car |
AZ Landlord requiring us to vacate for house showings. | 1/5 roommates did not pay rent |
[MO] both of our names are on the lease - what's the best course of action if I want to kick my boyfriend out? | I got into a car crash, and after I was kind of assaulted. |
Query: I received a summons for small claims court regarding a disputed debt. How should I prepare?
Results by Qwen | Results by DeepSeek-R1 |
---|---|
[CA] A question about small claims | Can I sue the company who hires telemarketers that break the law by calling me non-stop? |
[CO] Just received court summons for a hospital bill that my insurance already paid...I have the proof, but do I still need to hire a lawyer to get this fixed? | Ex Girlfriends Daughter is being arrested for shoplifting, What do we do? |
I'm being taken to court over a $3,500 loan I defaulted on | It's late at night and you're ending your shift. You go to the parking lot which is mostly empty, except you see two guys breaking into your car. You tell them to stop but they ignore you. You have a concealed handgun. What can you do? |
Roomates cat ate some trail mix and had to go to the vet. The bill was over 1500$ and she's threatening to take me to small claims court to pay it. There's no evidence the trail mix is mine. Should I be worried? | Missouri Possession of cannabis charges with no physical evidence |
I have recently been served a civil summons for a debt I thought I was already paying. | Would "releasing the hounds" be legal? |
Why the difference?
In the tables above, DeepSeek R1’s retrieval results are clearly worse. Why?
We think the fundamental issue lies in how DeepSeek-R1 is trained. DeepSeek-R1 is built primarily as a reasoning engine, focusing on sequential thinking and logical connections. It was not trained to map documents into a semantic space where similar texts land close together.
In contrast, the Qwen2 model variant (gte-Qwen2-7B-instruct) was specifically trained for semantic similarity tasks, creating a high-dimensional space where conceptually similar documents cluster tightly together regardless of specific wording.
This difference in training means that Qwen excels at capturing the intent behind queries, while DeepSeek-R1 sometimes follows reasoning paths that lead to thematically related but practically irrelevant results.
Unless DeepSeek-R1 is finetuned for embeddings, it should not be used as a retrieval embedding model for RAG.
2. Do Use R1 for Generation: The Reasoning Is Impressive
While R1 struggles with embeddings, we found its generation capabilities are phenomenal. By leveraging R1’s chain-of-thought approach, we saw:
- Stronger coherence: The model synthesizes insights from multiple documents, referencing relevant passages with clarity.
- Reduced hallucinations: R1 “thinks aloud” internally, verifying each conclusion through the lens of your data.
Let’s see a few examples:
Example 1: Breaking a Lease
Question: I want to break my lease. My landlord doesn't allow me to do that.
View DeepSeek R1's Thinking Process
Okay, so I want to break my lease, but my landlord isn’t allowing me to do that. I’m not sure what my options are. Let me try to figure this out based on the documents provided.
First, I remember reading about early termination penalties. In Document 1, someone was asking if they could get out of an 18-month lease after 15 months. The answer was yes, but only if the landlord agrees or finds a new tenant. So, if I want to break my lease early, the landlord might hold me responsible unless they agree or can rerent quickly. That seems like a possible route, but I’m not sure if my landlord would be willing to negotiate.
Then, looking at Document 2, there’s a situation where someone wants to break their lease because they’re getting married. The answer suggests that the lease might allow breaking it for a fee or they might have to wait until the lease ends. So maybe I can check if my lease has any conditions that allow breaking it, like a sublet or assignment clause. If not, I might have to pay a penalty or wait it out.
Document 3 talks about breaking a lease due to a service dog. The options are to pay a penalty, sublet, find another service animal, or wait until the lease is up. This doesn’t directly apply to me, but it shows that breaking a lease usually involves some consequences unless there’s a specific reason.
In Document 4, the landlord is increasing the rent, and the user is questioning the short notice period. The answer mentions that technically there should be a 30-day notice, but the landlord might offer a way out by agreeing to pay the old rent until the next period. If my landlord is refusing to let me break the lease, maybe offering to pay the increased rent could be an option, but I don’t know if that’s feasible for me financially.
Document 5 mentions a landlord refusing to renew a lease because of a cat. The answer says there’s no legal obligation to renew, so the landlord can just not renew. That makes me think that my landlord might have similar discretion, but I don’t know if that’s the reason they’re refusing.
Document 6 is about a lease renewal with new terms. The answer says the landlord must honor the current lease until it’s up. So if my landlord is refusing to let me break the lease, maybe they have to wait until the end of the lease term unless there’s a specific clause allowing them not to renew.
Document 7 and 8 are about the leasing office making it difficult to renew. They mention that the landlord might not want to renew because of bad reviews or other reasons, but legally, the landlord can choose not to renew. This shows that landlords have some power, but they might not be legally required to let me break the lease unless there’s a specific reason.
Document 9 talks about a 30-day notice requirement to move out. Even if I didn’t sign anything when moving in, the landlord might still require notice. So if I want to move out without giving notice, I might face penalties or have to pay rent for the full term.
Finally, Document 10 is about a lease where a roommate is leaving, and the tenant has to cover the rent. The answer suggests subletting to help cover costs. Maybe I could try subletting my part of the lease if I’m leaving, but that depends on the lease terms.
Putting this all together, I think I need to check my lease for any clauses about early termination. If there’s nothing specific, I might have to pay a penalty unless the landlord agrees to let me out. If I have a good reason, like a move, I could try negotiating with the landlord. Subletting might help if possible, but I need to see what the lease allows. I should also consider waiting until the lease ends if I can’t find another arrangement.
View DeepSeek R1's Response
Based on the analysis of the provided documents, here is a structured approach to breaking your lease when your landlord is refusing to allow it:
1. Review Your Lease Agreement
- Early Termination Clause: Check your lease for any clauses regarding early termination. If such a clause exists, you may be legally obligated to pay a penalty or provide notice if you wish to break the lease [1].
- Subletting or Assignment: Determine if your lease allows for subletting or assigning the lease to someone else. This could help you avoid penalties or legal action [2].
2. Understand Landlord’s Discretion
- Discretionary Renewal: Landlords may have the discretion to refuse lease renewal, especially if there are no specific clauses in your lease allowing for early termination. In such cases, you may be required to honor the lease until its term ends [6][8].
3. Consider Financial Consequences
- Penalties: Be prepared for potential financial penalties if you break the lease without the landlord’s consent. This could include additional rent payments or damage deposits [1][9].
4. Explore Negotiation
- Negotiate Terms: If you have a valid reason for breaking the lease, such as a move, you could negotiate with your landlord to either reduce the penalty or allow an early termination [2].
5. Subletting as a Solution
- Subletting Your Space: If possible, consider subletting your portion of the lease to someone else to help cover your financial obligations [10].
6. Plan for the Future
- Wait Until Lease Ends: If breaking the lease isn’t feasible, plan to stay until the lease term concludes. This avoids any penalties and legal issues [6][8].
Conclusion:
Before taking any action, thoroughly review your lease terms. If early termination isn’t allowed, consider subletting or waiting until the lease ends. If you can negotiate with your landlord, that might be the best course of action. Always consult a legal professional for specific advice tailored to your situation.
Example 2: Small Claims Court
Question: I received a summons for small claims court regarding a disputed debt. How should I prepare?
View DeepSeek R1's Thinking Process
Okay, so I just got a small claims court summons for a disputed debt. I’m not really sure where to start, but I need to figure out how to prepare for this. From the search results, I see that several people have gone through similar situations, so maybe I can learn from their experiences.
First, I should probably understand the nature of the debt. In Document 5, someone was served a summons for a debt they thought they were already paying. They were confused about why they were being summoned again. So, I need to verify if this debt is valid. Maybe I should check my records or ask the creditor for proof of the debt. According to Document 10, it’s a good idea to deny the complaint and ask for proof. That way, I can challenge the credibility of the debt.
Next, I should prepare my case. In Document 1, someone was dealing with a similar issue and the advice was to make sure the case can be presented in 10 minutes or less. That means I need to be concise and have all my evidence organized. I should gather any documents that prove I don’t owe this debt, like payment receipts, letters from the creditor indicating settlement, or insurance documents as in Document 2.
I also need to think about witnesses. In Document 4, a roommate was threatening to take the person to court over a vet bill, and the advice was to ignore the threat unless sued. But if I’m sued, I should prepare with witnesses who can back me up. Maybe a friend who was with me when I tried to pay or a previous tenant who can verify the condition of the property, as in Document 7.
Understanding the court process is crucial. From Document 7, after winning a small claims case, the landlord appealing meant another court appearance. The person had to go through the same process again. So, I should know that even if I win, there might be an appeal process. I should be prepared to present my case again if needed.
I also need to consider the Summons and Complaint. In Document 5, the person had to file an answer to the complaint. If I don’t do this, I might get a default judgment. So, I must respond on time and clearly state my defense.
Filing the Answer is the next step. Document 10 mentioned that even if there’s an offer, I should always deny and ask for proof. This gives me leverage to negotiate a better settlement later. I should write down all my points and evidence in my Answer.
Considering my financial situation is important too. In Document 6, someone was worried about being jailed for inability to pay, but it’s clarified that debtors’ prisons aren’t a thing. Still, I should communicate my financial status. If I can’t pay, I need to show that I’ve made reasonable efforts and offer a payment plan, as in Document 5.
If I’m unsure about the legal stuff, maybe I should consult a lawyer. In Document 3, the person was moving and had a court date. Consulting a debt settlement lawyer might help, especially if I have other debts or complex situations.
Lastly, I should prepare for the court hearing. Being punctual, dressed appropriately, and having all my documents ready is essential. In Document 7, the person won by being well-prepared with evidence, so I should follow that example.
So, putting it all together, I need to: verify the debt, prepare evidence, understand the court process, respond to the summons properly, possibly consult a lawyer, and get organized for the hearing. Citing the relevant documents will help me back up my decisions.
View DeepSeek R1's Response
Based on the analysis of the provided legal documents, here is a structured and organized plan to prepare for the small claims court summons regarding the disputed debt:
- Verify the Debt
  - Check Creditor’s Proof: Start by reviewing any documentation from the creditor to confirm the validity and details of the debt. Use this information to cross-reference with your own records or any previous communications.
  - Request Documentation: Contact the creditor to request formal proof of the debt, such as court judgment documents or settlement agreements. This step is crucial to challenge the credibility of the debt, as advised in Document 10.
- Prepare Your Case
  - Gather Evidence: Collect all relevant evidence, such as payment receipts, letters from the creditor indicating settlement, or insurance documents, as mentioned in Document 2 and Document 8.
  - Organize and Present Evidence: Ensure all evidence is organized and ready for presentation in court, adhering to the advice in Document 1. Consider including witness statements if needed, such as from friends who were present during payment attempts or previous tenants regarding property condition.
- Understand the Court Process
  - Be Prepared for Multiple Hearings: If you win a case, be aware that there may be an appeal process, as detailed in Document 7. Prepare to present your case again if necessary.
  - Familiarize Yourself with Court Procedures: Understand the basics of small claims court, including the structure of the hearing and the role of the judge.
- Respond to the Summons and Complaint
  - File an Answer: Ensure you respond to the summons on time to avoid a default judgment. Clearly state your defense and request proof of the debt in your Answer, as advised in Document 10.
  - Deny the Complaint: Always deny the complaint to leverage negotiation opportunities, as mentioned in Document 10.
- Consider Financial Situation
  - Communicate Finances: If you cannot pay the debt, demonstrate your financial status and offer a payment plan, as outlined in Document 5. Show that you have made reasonable efforts to resolve the debt.
- Consult a Lawyer if Necessary
  - Seek Legal Advice: If the situation is complex or you are unsure about your legal standing, consult a debt settlement lawyer, especially if there are other debts or complicated factors involved, as suggested in Document 3.
- Prepare for the Court Hearing
  - Be Punctual and Prepared: Arrive early, dressed appropriately, and bring all necessary documents. Use the example from Document 7, where a well-prepared case led to a successful outcome.
- Consider Additional Steps
  - Inform Roommates or Witnesses: If the debt relates to a shared property, consider informing any witnesses or roommates who can provide support, as mentioned in Document 4.
By following these steps, you can approach the small claims court hearing with confidence, ensuring your case is presented effectively and your rights are protected.
From these examples, we observe DeepSeek R1’s exceptional reasoning ability. Its thinking process clearly shows how the conclusion is drawn from the source legal documents:
- R1 first constructs a coherent model of the legal question, as evidenced in its detailed thinking process: “First, I remember reading about early termination penalties... Document 1 mentions...” This reasoning-first approach allows the model to methodically connect concepts across multiple sources before it commits to an answer.
- When handling complex scenarios like lease terminations or small claims court questions, we observed that R1 explicitly worked through its understanding of each document (“Putting this all together...”) without hallucinating.
- Finally, the generator explains its reasoning with precise citations that link conclusions to sources. This establishes a clear connection from question to reasoning to answer, ensuring both rigor and accessibility.
We tried the model with diverse legal queries, and the model consistently demonstrated the ability to not only extract information from the source documents but also learn from and reason about them.
Takeaway: For question answering and summarization, R1 is a gold mine of step-by-step legal logic. Keep it in the generator stage, and you’ll rarely regret it.
3. Engineering remains important
Prompting
High-level reasoning doesn’t eliminate the need for well-crafted prompts. We found explicit guidelines in the prompt crucial for:
- Encouraging document citations throughout the generated answer.
- Preventing hallucinations with a “cite or say you don’t know” approach.
- Structuring final answers in a user-friendly way.
We used the following prompt throughout our experiments:
You are a helpful AI assistant analyzing legal documents and related content. When responding, please follow these guidelines:
- In the search results provided, each document is formatted as [Document X begin]...[Document X end], where X represents the numerical index of each document.
- Cite your documents using [citation:X] format where X is the document number, placing citations immediately after the relevant information.
- Include citations throughout your response, not just at the end.
- If information comes from multiple documents, use multiple citations like [citation:1][citation:2].
- Not all search results may be relevant - evaluate and use only pertinent information.
- Structure longer responses into clear paragraphs or sections for readability.
- If you cannot find the answer in the provided documents, say so - do not make up information.
- Some documents may be informal discussions or reddit posts - adjust your interpretation accordingly.
- Put citation as much as possible in your response.
First, explain your thinking process between <think> tags.
Then provide your final answer after the thinking process.
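The prompt relies on two conventions: retrieved documents wrapped in `[Document X begin]...[Document X end]` markers, and answers that cite with `[citation:X]` after a `<think>` block. The sketch below shows one way to produce and check those formats. The helper names are hypothetical and not taken from the open-source pipeline, and it assumes the raw model output contains both the opening and closing `<think>` tags.

```python
import re


def format_documents(docs):
    """Wrap each retrieved text as [Document X begin]...[Document X end], X starting at 1."""
    return "\n".join(
        f"[Document {i} begin]{text}[Document {i} end]"
        for i, text in enumerate(docs, start=1)
    )


def split_thinking(output):
    """Separate the <think>...</think> block from the final answer."""
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # No thinking block found: treat the whole output as the answer.
        return "", output.strip()
    return match.group(1).strip(), output[match.end():].strip()


def invalid_citations(answer, num_docs):
    """Return cited document numbers that do not exist in the search results."""
    cited = {int(n) for n in re.findall(r"\[citation:(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= num_docs)


# Made-up inputs for illustration.
docs = ["Tenant asks about early termination.", "Landlord raises rent."]
context = format_documents(docs)
raw = "<think>Document 1 is relevant.</think>Check the lease [citation:1][citation:7]."
thinking, answer = split_thinking(raw)
bad = invalid_citations(answer, num_docs=len(docs))
```

A non-empty `bad` list flags citations that point at documents the model was never given, which is a cheap signal that an answer needs a second look.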
Document chunking
In addition, we found that effective document chunking is important for accurate retrieval. Chunking makes each embedding represent a specific topic more concisely and reduces the number of tokens processed per embedding.
We used sentence-aware splitting (via NLTK) to chunk the documents. Each chunk also begins and ends with content that overlaps its neighboring chunks, which helps the model interpret partial references without losing the bigger picture.
Code for chunking documents:
    import nltk  # nltk.sent_tokenize needs the NLTK "punkt" tokenizer data

    def chunk_document(document, chunk_size=2048, overlap=512):
        """Split document into overlapping chunks using sentence-aware splitting."""
        text = document['text']
        chunks = []
        # Split into sentences first
        sentences = nltk.sent_tokenize(text)
        current_chunk = []
        current_length = 0

        def save_chunk():
            # Record the sentences gathered so far as one chunk
            chunk_text = ' '.join(current_chunk)
            chunks.append({
                'id': document['id'],
                'name': document['name'],
                'content': document['text'],  # full source document
                # Assumed field name: the snippet builds chunk_text but elides where it is stored
                'chunk_text': chunk_text,
                'chunk_start': len(' '.join(current_chunk[:-(2 if overlap > 0 else 0)])) if overlap > 0 else 0,
                # Additional metadata fields...
            })

        for sentence in sentences:
            sentence_len = len(sentence)
            # If adding this sentence would exceed chunk size, save current chunk
            if current_length + sentence_len > chunk_size and current_chunk:
                save_chunk()
                # Keep last few sentences for overlap (overlap acts only as an on/off switch)
                overlap_text = ' '.join(current_chunk[-2:])  # Keep last 2 sentences
                current_chunk = [overlap_text] if overlap > 0 else []
                current_length = len(overlap_text) if overlap > 0 else 0
            current_chunk.append(sentence)
            current_length += sentence_len + 1  # +1 for space
        # Save the trailing sentences as the final chunk
        if current_chunk:
            save_chunk()
        return chunks
Takeaways:
- Use NLTK for sentence-aware tokenization instead of character-based chunking.
- Preserve document context with overlapping sentences between chunks.
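In the function above, the `overlap` argument only switches the overlap on or off; two sentences are always carried over. As an illustration of the alternative, here is a minimal sketch that treats `overlap` as a character budget. It is a variant, not the pipeline's implementation, and to stay dependency-free it swaps NLTK for a naive regex splitter, which is an assumption that would mis-split abbreviations such as "U.S.".

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_sentences(sentences, chunk_size=2048, overlap=512):
    """Greedily pack sentences into chunks of roughly chunk_size characters.

    Each new chunk starts with the trailing sentences of the previous one,
    as many as fit within `overlap` characters. A single sentence longer
    than chunk_size still becomes part of an oversized chunk.
    """
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > chunk_size:
            chunks.append(" ".join(current))
            # Walk backwards, carrying over sentences while they fit the budget.
            carried, used = [], 0
            for prev in reversed(current):
                if used + len(prev) > overlap:
                    break
                carried.insert(0, prev)
                used += len(prev) + 1  # +1 for the joining space
            current = carried
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


# Tiny sizes so the overlap is visible; real runs would use 2048 / 512.
demo = chunk_sentences(
    split_sentences("A one. B two. C three. D four."), chunk_size=15, overlap=8
)
```

With these toy sizes, each chunk repeats the last sentence of the previous one, which is the behavior the overlap is meant to provide.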
4. vLLM is efficient and fast
Since the legal documents contain a large amount of data, generating the embeddings for RAG can take a lot of time.
Initially, we used the Hugging Face sentence-transformers library. We first ran it on an Nvidia L4 GPU, but hit our “favorite” error: CUDA out of memory. After switching to an Nvidia A100, we found that sentence-transformers needs 57 GB of GPU memory to load the full Alibaba-NLP/gte-Qwen2-7B-instruct model.
We switched to vLLM, a high-throughput and memory-efficient inference and serving engine for LLMs.
With vLLM, we can run the model on a standard Nvidia L4 GPU, since vLLM needs only around 24 GB of GPU memory. The L4 is also dramatically cheaper than the A100: on GCP, an Nvidia L4 costs a little more than $0.7/hour, while an Nvidia A100 costs at least $2.9/hour.
Comparing vLLM and sentence-transformers on an Nvidia A100 with 80 GB of GPU memory, we saw up to 5.5x faster embedding generation for the Qwen2 model with vLLM.
For a corpus of 10,000 legal documents with 15,000 chunks, the processing times were:
- Standard sentence transformers: ~5.5 hours
- vLLM implementation: ~1 hour
The following is our SkyPilot YAML for running Alibaba-NLP/gte-Qwen2-7B-instruct with vLLM for embedding generation:
    name: compute-legal-embeddings
    resources:
      # this needs ~24 GB of GPU memory with vLLM, 57 GB with SentenceTransformer
      accelerators: {L4:1, A100:1}
    # The two envs sections of the original listing are merged here,
    # since a YAML mapping cannot repeat a key
    envs:
      START_IDX: ${START_IDX}
      END_IDX: ${END_IDX}
      MODEL_NAME: "Alibaba-NLP/gte-Qwen2-7B-instruct"
    file_mounts:
      /output:
        name: my-bucket-for-embedding-output
        mode: MOUNT
    setup: |
      pip install torch==2.5.1 vllm==0.6.6.post
      ...
    run: |
      python -m vllm.entrypoints.openai.api_server \
        --host 0.0.0.0 \
        --model $MODEL_NAME \
        --max-model-len 3072 \
        --task embed &
      python scripts/compute_embeddings.py \
        --start-idx $START_IDX \
        --end-idx $END_IDX \
        --chunk-size 2048 \
        --chunk-overlap 512 \
        --vllm-endpoint http://localhost:8000
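The YAML starts vLLM's OpenAI-compatible server with `--task embed`, so a client can request embeddings over HTTP. The sketch below shows what such a client might look like using only the standard library. It assumes the standard OpenAI embeddings request and response shape on the `/v1/embeddings` route; the function names are hypothetical, and this is not the contents of `scripts/compute_embeddings.py`.

```python
import json
import urllib.request


def build_embedding_request(texts, model, endpoint="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible /v1/embeddings route."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(response_json):
    """Return the embeddings ordered by the `index` field of each item."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, model, endpoint="http://localhost:8000"):
    """Send the request; this needs the vLLM server to be up and reachable."""
    request = build_embedding_request(texts, model, endpoint)
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(json.load(response))
```

Because the server is launched in the background, a real client would also want to wait until the endpoint responds before sending the first batch.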
5. Don’t let embedding generation take 100+ hours — parallelize with SkyPilot
Generating embeddings for the legal documents with a single L4 GPU took far too long, up to 100 hours. Spreading the work across spot instances can cut both the time and the cost of embedding generation significantly:
Name | Original | 100 Spot Workers |
---|---|---|
Time | 100 Hours | 1 Hour (100x faster) |
Cost | $70.1 | $28.2 (2.4x saving) |
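The table's figures can be unpacked with a little arithmetic. The sketch below derives the hourly rates implied by the table, assuming each of the 100 spot workers is billed for exactly one hour; the rates are derived values, not quoted prices.

```python
# Figures taken from the table above.
single_hours, single_cost = 100, 70.1          # one on-demand L4
workers, spot_hours, spot_cost = 100, 1, 28.2  # 100 spot workers

# Implied price per GPU-hour under the stated assumption.
on_demand_rate = single_cost / single_hours
spot_rate = spot_cost / (workers * spot_hours)

speedup = single_hours / spot_hours
saving = single_cost / spot_cost

print(f"on-demand: ${on_demand_rate:.3f}/h, spot: ${spot_rate:.3f}/h")
print(f"speedup: {speedup:.0f}x, cost ratio: {saving:.2f}x")
```

The total GPU-hours are the same in both columns, so the saving comes entirely from the lower spot price, while the speedup comes entirely from parallelism. The cost ratio works out to just under 2.5, in line with the roughly 2.4x saving listed.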
We therefore scaled up embedding generation with SkyPilot Managed Jobs, which gave us:
- Massively parallel jobs: We split the data into partitions and launched multiple embedding jobs across multiple clouds.
- Spot instances with automated recovery: If a spot instance or node fails, SkyPilot auto-retries, picking up right where it left off.
- Unified storage: All embeddings stream directly into a cloud bucket (in Apache Parquet format). No more losing work or dealing with partial outputs.
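To illustrate the partitioning step, here is a minimal sketch that splits a document range into near-equal slices and renders one launch command per slice. The YAML filename, the job naming scheme, and the treatment of `END_IDX` as exclusive are assumptions; the `sky jobs launch` flags shown are worth checking against the SkyPilot CLI reference for your version.

```python
def partition_ranges(total, num_jobs):
    """Split [0, total) into num_jobs contiguous (start, end) ranges of near-equal size."""
    base, extra = divmod(total, num_jobs)
    ranges, start = [], 0
    for job in range(num_jobs):
        # The first `extra` jobs take one additional item each.
        end = start + base + (1 if job < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def launch_commands(ranges, yaml_path="compute_embeddings.yaml"):
    """Render one `sky jobs launch` command per partition (nothing is executed here)."""
    return [
        f"sky jobs launch -y -n embed-{start}-{end} "
        f"--env START_IDX={start} --env END_IDX={end} {yaml_path}"
        for start, end in ranges
    ]


# Example: 10 documents across 3 jobs.
for command in launch_commands(partition_ranges(10, 3)):
    print(command)
```

Each command sets the `START_IDX` and `END_IDX` variables that the embedding task reads, so every job works on its own slice and writes its own output to the shared bucket.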
The following figure shows 10+ spot jobs running in parallel for embedding generation, with SkyPilot searching across regions for the best GPU availability:
Result: We no longer wait days for embeddings. Our jobs scale dynamically, letting us move from prototype to production quickly—even with huge datasets.
Conclusion
Building a DeepSeek R1 RAG for legal documents taught us a few critical lessons:
- Leverage specialized embedding models (like Qwen2) for robust retrieval.
- Use R1’s reasoning power in the generation stage to tackle complex legal queries.
- (Prompt) engineering is still key to controlling citations and structuring content.
- Accelerate inference with vLLM for major efficiency and speed boosts.
- Scale AI on the cloud and your own infra with SkyPilot to keep compute times short and budgets manageable.
Want to learn more?
- Pipeline and instructions for building the RAG: GitHub Repo link
- Scale up AI workloads with SkyPilot: SkyPilot Docs
Note: This RAG system should be viewed as a powerful tool for legal research and analysis, not as a replacement for qualified legal professionals. The outputs should not be considered legal advice.