Building a GraphRAG for Legal Contracts with MongoDB Atlas

The analysis and interpretation of complex legal documents, such as financing contracts, represent a significant challenge even for the most experienced professionals. These documents contain numerous entities, clauses, regulatory references, and intricate relationships that require a deep understanding to correctly interpret.

The recent introduction of GraphRAG (Graph-based Retrieval Augmented Generation) by MongoDB Atlas offers an innovative approach to address this challenge. Combining the power of knowledge graphs with large language models (LLMs), GraphRAG enables the extraction, representation, and querying of complex relationships present in legal documents more effectively than traditional vector-based approaches.

In this article, we will explore how to implement a serverless system on AWS using CDK and TypeScript to extract and build a legal semantic map from a financing contract. We will see how GraphRAG can significantly improve the understanding and querying of these complex documents, providing more accurate and contextualized answers.

Understanding GraphRAG: Beyond Traditional RAG

Before implementing GraphRAG, it's essential to understand what distinguishes it from traditional RAG (Retrieval Augmented Generation) and why it's particularly well-suited for legal document analysis.

Limitations of Traditional RAG

Traditional RAG, based on vector embeddings, presents several significant limitations when it comes to analyzing legal documents:

Loss of Relationships: When documents are split into chunks for vectorization, relationships between different document parts are lost.
Difficulty with Multi-hop Reasoning: Vector-based RAG struggles to answer questions requiring connecting information in different parts of the document.
Limited Explainability: Vector embeddings make it challenging to understand why specific chunks were selected as relevant.

Advantages of GraphRAG for Legal Documents

GraphRAG addresses these limitations by structuring data as a knowledge graph instead of vector embeddings:

Relationship Preservation: Entities (such as contracting parties, clauses, obligations) and their relationships are explicitly modeled.
Multi-hop Reasoning: GraphRAG can navigate through the graph to answer complex questions that require connecting different entities.
Greater Explainability: The path through the graph provides a visual and understandable explanation of how the answer was reached.
Structural Understanding: GraphRAG excels in understanding the document's structure, hierarchy, and connections.

These characteristics make GraphRAG particularly suitable for analyzing financing contracts, where understanding the relationships between clauses, obligations, parties, and regulatory references is fundamental.
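
To make the multi-hop idea concrete, the following is a minimal in-memory sketch, not the MongoDB implementation used later. The edge list is hand-written for illustration (the names `sampleEdges` and `findPath` are hypothetical); a breadth-first search returns the chain of relationships that links two entities, which is also the "explanation" a graph-based answer can show.

```typescript
// Hypothetical, hand-written edges; a real graph would come from the extraction step.
type Edge = { source: string; type: string; target: string };

const sampleEdges: Edge[] = [
  { source: 'Borrower', type: 'Obligates', target: 'Financial Covenants' },
  { source: 'Financial Covenants', type: 'Causes', target: 'Event of Default' },
  { source: 'Event of Default', type: 'Causes', target: 'Acceleration' },
];

// Breadth-first search that returns the chain of edges linking start to goal,
// or null when no directed path exists.
function findPath(edges: Edge[], start: string, goal: string): Edge[] | null {
  const queue: { node: string; path: Edge[] }[] = [{ node: start, path: [] }];
  const visited = new Set<string>([start]);

  while (queue.length > 0) {
    const { node, path } = queue.shift()!;
    if (node === goal) {
      return path;
    }
    for (const edge of edges) {
      if (edge.source === node && !visited.has(edge.target)) {
        visited.add(edge.target);
        queue.push({ node: edge.target, path: [...path, edge] });
      }
    }
  }
  return null;
}

// Each hop is printed as "source -[type]-> target".
const hops = findPath(sampleEdges, 'Borrower', 'Acceleration') ?? [];
console.log(hops.map(e => `${e.source} -[${e.type}]-> ${e.target}`).join('\n'));
```

A chunk-based retriever would need all three facts to land in the retrieved chunks by similarity alone; here the path is followed explicitly, one relationship at a time.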

Analysis of the Financing Contract

The financing contract we will analyze is a € 100 million term loan credit facility agreement. This type of contract is particularly complex, containing numerous entities and relationships that lend themselves well to being represented as a knowledge graph.

You can download a facsimile of the contract here: LIDIA - test contract.pdf (3 MB)

Main Entities in the Contract

From the analysis of the contract, we can identify several categories of entities:

Contract Parties:
1. Lender: [bank 1], [bank 2], [bank 3] (Mandated Lead Arrangers and Original Lenders)
2. Borrower: [company]
3. Agent: [agent] (also Security Agent)
Financial Elements:
1. Total Amount: EURO 100,000,000
2. Facilities: Refinancing Facility, Capex Facility, Incremental Facility
3. Interest Rates: Margin 4.25% per annum (with possible increase)
Contractual Clauses:
1. Definitions and Interpretations
2. Conditions of Use
3. Repayment and Cancellation
4. Interest and Fees
5. Guarantees and Indemnities
6. Financial Covenants
7. Events of Default
Regulatory References:
1. Italian Civil Code
2. Business Crisis and Insolvency Code (CIC)
3. Legislative Decree 231/2001 (Administrative Liability)
4. Legislative Decree 231/2007 (Anti-Money Laundering)

Key Relationships in the Contract

The relationships between these entities are equally important:

Relationships between Parties:
1. Lender → Borrower: Grants the financing
2. Agent → Lender: Represents the Lenders
3. Agent → Borrower: Monitors compliance with obligations
Financial Relationships:
1. Borrower → Facilities: Beneficiary of credit lines
2. Leverage → Margin: Influences the applicable interest rate
Obligation Relationships:
1. Borrower → Financial Covenants: Obligated to respect the covenants
2. Borrower → Information Undertakings: Obligated to provide information
Risk Relationships:
1. Event of Default → Facilities: Can cause acceleration of repayment
2. Change of Control → Event of Default: Can constitute an event of default

These entities and relationships form the basis of our legal semantic map.
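
As a rough sketch of how this map could be typed, the snippet below models a few of the entities and relationships listed above. The type names, the ID scheme, and the `findDanglingRelations` helper are illustrative assumptions, not part of the original system; the helper simply checks that every relationship points at entities that exist.

```typescript
// Illustrative categories mirroring the lists above (assumed names).
type EntityType = 'Party' | 'Clause' | 'FinancialElement' | 'RegulatoryReference';
type RelationType = 'Grants' | 'Represents' | 'Monitors' | 'Obligates' | 'Causes';

interface GraphEntity {
  id: string;
  type: EntityType;
  name: string;
}

interface GraphRelation {
  id: string;
  sourceEntityId: string;
  targetEntityId: string;
  type: RelationType;
  description: string;
}

// A hand-written subset of the contract's entities and relationships.
const entitiesSample: GraphEntity[] = [
  { id: 'party-lender', type: 'Party', name: 'Lender' },
  { id: 'party-borrower', type: 'Party', name: 'Borrower' },
  { id: 'party-agent', type: 'Party', name: 'Agent' },
  { id: 'clause-financial-covenants', type: 'Clause', name: 'Financial Covenants' },
];

const relationsSample: GraphRelation[] = [
  { id: 'r1', sourceEntityId: 'party-lender', targetEntityId: 'party-borrower', type: 'Grants', description: 'Grants the financing' },
  { id: 'r2', sourceEntityId: 'party-agent', targetEntityId: 'party-lender', type: 'Represents', description: 'Represents the Lenders' },
  { id: 'r3', sourceEntityId: 'party-borrower', targetEntityId: 'clause-financial-covenants', type: 'Obligates', description: 'Obligated to respect the covenants' },
];

// Returns the relationships whose source or target is not a known entity.
function findDanglingRelations(entities: GraphEntity[], relations: GraphRelation[]): GraphRelation[] {
  const knownIds = new Set(entities.map(e => e.id));
  return relations.filter(r => !knownIds.has(r.sourceEntityId) || !knownIds.has(r.targetEntityId));
}

console.log(findDanglingRelations(entitiesSample, relationsSample).length);
```

A check like this is useful later on, because relationships produced by an LLM can reference entity IDs that were never extracted.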

System Architecture

To implement our solution, we will utilize a serverless architecture on AWS, with MongoDB Atlas serving as the database for the knowledge graph. The main components and the processing flow are outlined below.

Main Components

API Gateway: Exposes RESTful endpoints for interaction with the system
Lambda Functions: Serverless functions to process requests and manage business logic
MongoDB Atlas: Database to store the knowledge graph and documents
Amazon S3: Storage for the original legal documents
OpenAI GPT-4.1 Mini: LLM for entity and relationship extraction

Processing Flow

Document Upload: The legal contract is uploaded to S3
Text Extraction: Text is extracted from the document
Semantic Analysis: GPT-4.1 Mini analyzes the text to identify entities and relationships
Graph Construction: Entities and relationships are stored in MongoDB as a graph
Querying: Queries are processed using GraphRAG to answer questions about the contract
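
The ingestion part of this flow (upload through graph construction) can be sketched as a small orchestration function. This is only an outline under stated assumptions: the `PipelineSteps` interface and its method names are hypothetical placeholders for the S3 read, the text extractor, the LLM call, and the MongoDB writes, which are injected so the flow itself stays independent of any SDK.

```typescript
type ContractGraph = { entities: unknown[]; relations: unknown[] };

// Hypothetical step signatures; real implementations would wrap S3,
// a PDF text extractor, the LLM call, and MongoDB writes.
interface PipelineSteps {
  loadDocument(key: string): Promise<Uint8Array>;
  extractText(file: Uint8Array): Promise<string>;
  analyze(text: string): Promise<ContractGraph>;
  storeGraph(documentId: string, graph: ContractGraph): Promise<void>;
}

interface ProcessingSummary {
  documentId: string;
  entityCount: number;
  relationCount: number;
}

// Runs the steps in order and stops early when no text could be extracted.
async function processDocument(key: string, steps: PipelineSteps): Promise<ProcessingSummary> {
  const file = await steps.loadDocument(key);
  const text = await steps.extractText(file);
  if (text.trim().length === 0) {
    throw new Error(`No text extracted from ${key}`);
  }
  const graph = await steps.analyze(text);
  await steps.storeGraph(key, graph);
  return {
    documentId: key,
    entityCount: graph.entities.length,
    relationCount: graph.relations.length,
  };
}
```

Injecting the steps also makes the flow easy to exercise with stubs before any AWS resource exists.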

Implementation with AWS CDK and TypeScript

Let's see how to implement this architecture using AWS CDK and TypeScript.

Infrastructure Configuration with AWS CDK

The first step is to define the AWS infrastructure using CDK:

// lib/legal-graph-rag-stack.ts
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as s3 from 'aws-cdk-lib/aws-s3';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';

export class LegalGraphRagStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // S3 bucket for legal documents
    const documentsBucket = new s3.Bucket(this, 'LegalDocumentsBucket', {
      removalPolicy: cdk.RemovalPolicy.RETAIN,
      cors: [
        {
          allowedMethods: [
            s3.HttpMethods.GET,
            s3.HttpMethods.POST,
            s3.HttpMethods.PUT,
          ],
          allowedOrigins: ['*'],
          allowedHeaders: ['*'],
        },
      ],
    });

    // Secret for MongoDB Atlas
    const mongoDbSecret = new secretsmanager.Secret(this, 'MongoDBAtlasSecret', {
      secretName: 'mongodb-atlas-credentials',
      description: 'Credentials for MongoDB Atlas',
    });

    // Secret for OpenAI API
    const openAiSecret = new secretsmanager.Secret(this, 'OpenAISecret', {
      secretName: 'openai-api-key',
      description: 'API Key for OpenAI',
    });

    // Lambda for document upload and processing
    const documentProcessorLambda = new lambda.Function(this, 'DocumentProcessorFunction', {
      runtime: lambda.Runtime.NODEJS_22_X, // Node.js 18 is a deprecated Lambda runtime
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/document-processor'),
      timeout: cdk.Duration.minutes(5),
      memorySize: 1024,
      environment: {
        DOCUMENTS_BUCKET: documentsBucket.bucketName,
        MONGODB_SECRET_ARN: mongoDbSecret.secretArn,
        OPENAI_SECRET_ARN: openAiSecret.secretArn,
      },
    });

    // Lambda for graph querying
    const graphQueryLambda = new lambda.Function(this, 'GraphQueryFunction', {
      runtime: lambda.Runtime.NODEJS_22_X, // Node.js 18 is a deprecated Lambda runtime
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda/graph-query'),
      timeout: cdk.Duration.minutes(1),
      memorySize: 512,
      environment: {
        MONGODB_SECRET_ARN: mongoDbSecret.secretArn,
        OPENAI_SECRET_ARN: openAiSecret.secretArn,
      },
    });

    // Permissions for Lambdas
    documentsBucket.grantReadWrite(documentProcessorLambda);
    mongoDbSecret.grantRead(documentProcessorLambda);
    mongoDbSecret.grantRead(graphQueryLambda);
    openAiSecret.grantRead(documentProcessorLambda);
    openAiSecret.grantRead(graphQueryLambda);

    // API Gateway
    const api = new apigateway.RestApi(this, 'LegalGraphRagApi', {
      restApiName: 'Legal Graph RAG Service',
      description: 'API for legal contract analysis with GraphRAG',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
      },
    });

    // Endpoint for document upload
    const documentsResource = api.root.addResource('documents');
    documentsResource.addMethod('POST', new apigateway.LambdaIntegration(documentProcessorLambda));

    // Endpoint for graph queries
    const queriesResource = api.root.addResource('queries');
    queriesResource.addMethod('POST', new apigateway.LambdaIntegration(graphQueryLambda));

    // Output
    new cdk.CfnOutput(this, 'ApiEndpoint', {
      value: api.url,
      description: 'URL of the API Gateway',
    });

    new cdk.CfnOutput(this, 'DocumentsBucketName', {
      value: documentsBucket.bucketName,
      description: 'Name of the S3 bucket for documents',
    });
  }
}

This code defines the AWS infrastructure needed for our system, including an S3 bucket for documents, secrets for credentials, Lambda functions, and an API Gateway.

Entity and Relationship Extraction with GPT-4.1 Mini

The heart of our system is the Lambda function that processes documents and uses GPT-4.1 Mini to extract entities and relationships:

// Function to extract entities and relationships with OpenAI
import OpenAI from 'openai';

// We limit the contract text to avoid exceeding the maximum context
const MAX_CONTRACT_CHARS = 8000;

// Sends a prompt to the model and parses the JSON object it returns
async function completeJson(openai: OpenAI, prompt: string): Promise<any> {
  const response = await openai.chat.completions.create({
    model: "gpt-4.1-mini",
    messages: [{ role: "user", content: prompt }],
    response_format: { type: "json_object" }, // request a JSON object instead of free text
  });

  return JSON.parse(response.choices[0].message.content || '{}');
}

async function extractEntitiesAndRelations(openai: OpenAI, documentContent: string): Promise<{ entities: any[], relations: any[] }> {
  const contractText = documentContent.substring(0, MAX_CONTRACT_CHARS);

  // Prompt for entity extraction
  const entityExtractionPrompt = `
  Analyze the following financing contract and extract all relevant entities.
  For each entity, provide:
  1. A unique ID
  2. The entity type (Party, Clause, FinancialElement, RegulatoryReference)
  3. The name or identifier of the entity
  4. A brief description
  5. The original text from which it was extracted

  Contract:
  ${contractText}

  Output format (JSON):
  {
    "entities": [
      {
        "id": "string",
        "type": "string",
        "name": "string",
        "description": "string",
        "originalText": "string"
      }
    ]
  }
  `;

  // Entity extraction runs first: the relationship prompt depends on its result
  const entityData = await completeJson(openai, entityExtractionPrompt);
  const entities: any[] = entityData.entities ?? [];

  // Prompt for relationship extraction
  const relationExtractionPrompt = `
  Analyze the following financing contract and extract all relationships between entities.
  For each relationship, provide:
  1. A unique ID
  2. The source entity ID
  3. The target entity ID
  4. The relationship type (Grants, Represents, Monitors, Defines, Obligates, ReferencesTo, Causes)
  5. A brief description of the relationship
  6. The original text from which it was extracted

  Use only the IDs of the following entities previously extracted:
  ${JSON.stringify(entities)}

  Contract:
  ${contractText}

  Output format (JSON):
  {
    "relations": [
      {
        "id": "string",
        "sourceEntityId": "string",
        "targetEntityId": "string",
        "type": "string",
        "description": "string",
        "originalText": "string"
      }
    ]
  }
  `;

  // Relationship extraction
  const relationData = await completeJson(openai, relationExtractionPrompt);

  return {
    entities,
    relations: relationData.relations ?? [],
  };
}

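This code uses GPT-4.1 Mini to extract entities and relationships from the contract text. The prompts are designed specifically for the financing contract domain and list the relevant entity and relationship types. Note that only the first 8,000 characters of the contract are sent to the model, so a long contract is only partially analyzed.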
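
Because the extraction only sees the first 8,000 characters, a full-length contract would need to be processed in pieces. The following is one possible sketch, not the article's implementation: `chunkText` splits the text into overlapping windows, and `mergeEntities` deduplicates the entities returned for each window. Both helper names, the overlap size, and the deduplication key (type plus lowercased name) are assumptions.

```typescript
// Splits text into windows of chunkSize characters; consecutive windows
// share `overlap` characters so clauses cut at a boundary appear whole once.
function chunkText(text: string, chunkSize: number = 8000, overlap: number = 500): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('chunkSize must be positive and overlap must be smaller than chunkSize');
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) {
      break; // the last window already reached the end of the text
    }
  }
  return chunks;
}

interface ExtractedEntity {
  id: string;
  type: string;
  name: string;
}

// Keeps the first occurrence of each (type, name) pair across all chunks.
function mergeEntities(perChunk: ExtractedEntity[][]): ExtractedEntity[] {
  const byKey = new Map<string, ExtractedEntity>();
  for (const entities of perChunk) {
    for (const entity of entities) {
      const key = `${entity.type}:${entity.name.trim().toLowerCase()}`;
      if (!byKey.has(key)) {
        byKey.set(key, entity);
      }
    }
  }
  return Array.from(byKey.values());
}
```

When duplicates are dropped, any relationship that referenced a discarded entity ID has to be remapped to the surviving one; that step is left out of this sketch.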

Integration with MongoDB Atlas for GraphRAG

To store and query the knowledge graph, we use MongoDB Atlas:

// lib/mongodb-graph-store.ts
import { MongoClient, Collection, Db } from 'mongodb';

export interface Entity {
  _id?: string;
  id: string;
  type: string;
  name: string;
  description: string;
  originalText: string;
  documentId: string;
  createdAt: Date;
}

export interface Relation {
  _id?: string;
  id: string;
  sourceEntityId: string;
  targetEntityId: string;
  type: string;
  description: string;
  originalText: string;
  createdAt: Date;
}

export class MongoDBGraphStore {
  private client: MongoClient;
  private db: Db;
  private entitiesCollection: Collection<Entity>;
  private relationsCollection: Collection<Relation>;
  
  constructor(uri: string, dbName: string) {
    this.client = new MongoClient(uri);
    this.db = this.client.db(dbName);
    this.entitiesCollection = this.db.collection<Entity>('entities');
    this.relationsCollection = this.db.collection<Relation>('relations');
  }
  
  async connect(): Promise<void> {
    await this.client.connect();
  }
  
  async close(): Promise<void> {
    await this.client.close();
  }
  
  async traverseGraph(startEntityId: string, maxDepth: number = 3): Promise<any> {
    // We use $graphLookup to traverse the graph
    const result = await this.db.collection('relations').aggregate([
      {
        $match: {
          $or: [
            { sourceEntityId: startEntityId },
            { targetEntityId: startEntityId },
          ]
        }
      },
      {
        $graphLookup: {
          from: 'relations',
          startWith: '$sourceEntityId',
          connectFromField: 'targetEntityId',
          connectToField: 'sourceEntityId',
          as: 'connections',
          maxDepth,
        }
      },
      {
        $lookup: {
          from: 'entities',
          localField: 'sourceEntityId',
          foreignField: '_id',
          as: 'sourceEntity'
        }
      },
      {
        $lookup: {
          from: 'entities',
          localField: 'targetEntityId',
          foreignField: '_id',
          as: 'targetEntity'
        }
      },
      {
        $unwind: '$sourceEntity'
      },
      {
        $unwind: '$targetEntity'
      }
    ]).toArray();
    
    return result;
  }
}

The MongoDBGraphStore class wraps the connection to Atlas and traverses the graph using MongoDB's $graphLookup operator, which is fundamental for multi-hop reasoning. The methods for adding entities and relationships to the graph are omitted from the listing above.
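
The listing does not show how entities are written to the collection. One possible approach, sketched below under assumptions, is to build upsert operations keyed on the document and the extraction ID, so that re-processing the same contract updates existing entities instead of duplicating them. The helper `buildEntityUpserts` and the choice of key are hypothetical; the operation shape follows the `updateOne` form accepted by the Node.js driver's `bulkWrite`.

```typescript
interface EntityInput {
  id: string;
  type: string;
  name: string;
  description: string;
  originalText: string;
}

// Shape of one updateOne operation as passed to collection.bulkWrite().
interface EntityUpsertOp {
  updateOne: {
    filter: { documentId: string; id: string };
    update: {
      $set: EntityInput & { documentId: string };
      $setOnInsert: { createdAt: Date };
    };
    upsert: boolean;
  };
}

// Builds one upsert per entity; createdAt is only written on first insert.
function buildEntityUpserts(documentId: string, entities: EntityInput[], now: Date = new Date()): EntityUpsertOp[] {
  return entities.map(entity => ({
    updateOne: {
      filter: { documentId, id: entity.id },
      update: {
        $set: { ...entity, documentId },
        $setOnInsert: { createdAt: now },
      },
      upsert: true,
    },
  }));
}
```

Inside the store class, the result could then be passed to `this.entitiesCollection.bulkWrite(ops)`. Scoping the filter by `documentId` matters because IDs generated by the model are only unique within a single extraction run.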

Querying the Graph with GraphRAG

The Lambda function for graph querying uses GraphRAG to answer questions about the contract:

// Function to search in the graph
import { ObjectId } from 'mongodb';

// Escapes regex metacharacters so that search terms are matched literally
function escapeRegex(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

async function searchGraph(db: any, entities: string[], documentId?: string): Promise<{ entities: any[]; relations: any[] }> {
  const entitiesCollection = db.collection('entities');
  const relationsCollection = db.collection('relations');
  
  // MongoDB rejects an empty $or array, so return early when there is nothing to search
  if (entities.length === 0) {
    return { entities: [], relations: [] };
  }
  
  // Search for entities mentioned in the query
  const query: any = {
    $or: entities.map(entity => {
      const pattern = escapeRegex(entity);
      return {
        $or: [
          { name: { $regex: pattern, $options: 'i' } },
          { description: { $regex: pattern, $options: 'i' } },
        ]
      };
    })
  };
  
  if (documentId) {
    query.documentId = documentId;
  }
  
  const foundEntities: any[] = await entitiesCollection.find(query).toArray();
  
  if (foundEntities.length === 0) {
    return { entities: [], relations: [] };
  }
  
  // Search for relationships for the found entities
  // (this assumes relations store the entity _id as a string)
  const entityIds = foundEntities.map((entity: any) => entity._id.toString());
  
  const relations: any[] = await relationsCollection.find({
    $or: [
      { sourceEntityId: { $in: entityIds } },
      { targetEntityId: { $in: entityIds } },
    ]
  }).toArray();
  
  // Search for entities connected through relationships
  const relatedEntityIds = new Set<string>();
  
  relations.forEach((relation: any) => {
    relatedEntityIds.add(relation.sourceEntityId.toString());
    relatedEntityIds.add(relation.targetEntityId.toString());
  });
  
  const allEntityIds = Array.from(relatedEntityIds);
  
  const allEntities: any[] = await entitiesCollection.find({
    _id: { $in: allEntityIds.map(id => new ObjectId(id)) }
  }).toArray();
  
  // Graph construction
  return {
    entities: allEntities,
    relations: relations,
  };
}

// Function to generate the answer
async function generateAnswer(openai: OpenAI, query: string, graphResults: any): Promise<string> {
  const prompt = `
  Answer the following question about a financing contract using only the information provided by the graph.
  If you cannot answer based on the available information, clearly indicate that you do not have sufficient information.

  Question: ${query}
  
  Information from the graph:
  ${JSON.stringify(graphResults, null, 2)}
  
  Provide a detailed and precise answer, citing specific parts of the contract when possible.
  `;

  const response = await openai.chat.completions.create({
    model: "gpt-4.1-mini",
    messages: [{ role: "user", content: prompt }],
  });
  
  return response.choices[0].message.content || 'It was not possible to generate an answer.';
}

This code searches for entities mentioned in the query, finds associated relationships, and uses GPT-4.1 Mini to generate an answer based on the graph.
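
Passing the whole graph to the model as pretty-printed JSON works, but it spends many tokens on field names and long `originalText` values. As an optional variation, the sketch below turns the graph into one line per relationship. The function `formatGraphForPrompt`, its input types, and the line format are assumptions made for illustration; the `id` field stands for whatever identifier the relationships use to reference entities.

```typescript
interface PromptEntity {
  id: string;
  type: string;
  name: string;
}

interface PromptRelation {
  sourceEntityId: string;
  targetEntityId: string;
  type: string;
  description: string;
}

// Renders each relationship as "Source (Type) -[Relation]-> Target (Type): description".
// Unknown entity IDs are printed as-is so that no relationship is silently dropped.
function formatGraphForPrompt(entities: PromptEntity[], relations: PromptRelation[]): string {
  const labelById = new Map<string, string>();
  for (const entity of entities) {
    labelById.set(entity.id, `${entity.name} (${entity.type})`);
  }

  const lines = relations.map(relation => {
    const source = labelById.get(relation.sourceEntityId) ?? relation.sourceEntityId;
    const target = labelById.get(relation.targetEntityId) ?? relation.targetEntityId;
    return `${source} -[${relation.type}]-> ${target}: ${relation.description}`;
  });

  return lines.join('\n');
}
```

The resulting text could replace the `JSON.stringify(graphResults, null, 2)` section of the prompt. The trade-off is that the quoted contract text is no longer available to the model, so citations become less precise.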

Query and Answer Examples

Let's look at some examples of how our system can answer questions about the financing contract:

Example 1: Simple Query

Query: "What are the borrower's obligations in the contract?"

Answer:

Based on the information extracted from the graph, the borrower (Borrower) has the following main obligations:

1. Financial Obligations (Financial Covenants): The borrower is obligated to respect certain financial covenants, including maintaining a Leverage below 4.75 to avoid increasing the Margin by 50 basis points.

2. Information Obligations (Information Undertakings): The borrower must regularly provide financial information, including annual financial statements (Annual Financial Statements) and Compliance Certificates.

3. General Obligations (General Undertakings): These include maintaining necessary authorizations, compliance with applicable laws, and preservation of assets.

4. Repayment: The borrower is obligated to repay the financing according to the deadlines established in the Capex Facility Repayment Dates.

5. Interest Payment: The borrower must pay interest calculated based on the defined Margin (4.25% per annum, with possible increase).

These obligations are defined in the "Financial Covenants", "Information Undertakings", "General Undertakings", "Repayment", and "Interest" sections of the contract.

Example 2: Complex Query with Relationships

Query: "What are the consequences if the borrower does not respect the financial covenants?"

Answer:

Based on the information extracted from the graph, if the borrower does not respect the financial covenants, the following consequences occur:

1. Event of Default: Failure to comply with Financial Covenants constitutes an Event of Default as specified in clause 24.2 (Financial covenants and information undertakings).

2. Margin Increase: If the Leverage exceeds 4.75, the Margin for each Loan will increase by 50 basis points (Increased Margin), going from 4.25% to 4.75% per annum.

3. Repayment Acceleration: As an Event of Default, it can lead to acceleration of the financing repayment, as provided in the Acceleration clause.

4. Event of Termination: Failure to comply with financial covenants is classified as an Event of Termination, which can lead to termination of the contract.

5. Equity Cure Possibility: The contract provides an Equity Cure mechanism (clause 22.3) that allows the borrower to remedy the failure to comply with financial covenants through the injection of new capital.

These consequences are interconnected in the graph through relationships that link Financial Covenants to Event of Default, Event of Termination, and Acceleration clauses.

Example 3: Multi-hop Query

Query: "Which clauses of the contract reference the Business Crisis and Insolvency Code, and what are the legal consequences?"

Answer:

Based on the information extracted from the graph, several clauses of the contract reference the Business Crisis and Insolvency Code (CIC):

1. Definition of "CIC": The contract defines "CIC" as the "Business Crisis and Insolvency Code", as established by Legislative Decree no. 14 of January 12, 2019.

2. Clause 24.6 (Insolvency): This clause references the state of crisis under the CIC as an Event of Default.

3. Clause 24.7 (Insolvency proceedings): This clause considers the opening of insolvency proceedings under the CIC as an Event of Default.

4. Definition of "Insolvency Event": Includes references to various instruments for the composition of crisis or insolvency provided by the CIC.

The legal consequences of these references are:

1. Event of Acceleration: Insolvency events under the CIC are classified as Event of Acceleration, which cause the loss of the benefit of the term pursuant to Article 1186 of the Civil Code.

2. Event of Withdrawal: Insolvency events are also classified as Event of Withdrawal, which allow withdrawal from the contract.

3. Early Termination: The occurrence of these events can lead to early termination of the contract and the obligation of immediate repayment of the financing.

These consequences are connected through a chain of relationships in the graph that connects the CIC to insolvency clauses, Event of Default, Event of Acceleration, and Event of Withdrawal.

These examples demonstrate how GraphRAG can answer questions of increasing complexity, utilizing the graph structure to navigate between entities and their relationships.

Advantages of the GraphRAG Approach for Legal Contracts

Implementing GraphRAG for financing contract analysis offers numerous advantages over traditional methods:

Contextual Understanding: GraphRAG understands the relationships between different parts of the contract, providing more contextualized answers.
Multi-hop Reasoning: It can answer complex questions that require connecting information present in different parts of the contract.
Explainability: The path through the graph provides a visual explanation of how the answer was reached, increasing transparency.
Precision: Answers are more precise because they are based on explicit relationships between entities, rather than relying solely on semantic similarity.
Adaptability to the Legal Domain: The graph structure can be adapted to the specific ontology of the legal and financial domain.

Implementation Considerations

When implementing a GraphRAG system for legal contracts, it's essential to consider several aspects:

Extraction Quality: The quality of the graph heavily depends on the LLM's ability to correctly extract entities and relationships. It's advisable to use powerful models, such as GPT-4.1, and well-designed prompts.
Scalability: As the number of documents and the size of the graph increase, it's essential to optimize queries to maintain good performance.
Graph Updates: Contracts can be modified over time. It's necessary to implement mechanisms to update the graph when documents change.
Security and Privacy: Legal contracts contain sensitive information. It's essential to implement adequate security measures and access control.
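
On the extraction quality point, one inexpensive safeguard is to validate the model's output before it reaches the graph. The sketch below is an assumption about how that could look, not part of the original system: `parseRelations` accepts the raw response text, keeps only relationships that have the expected fields and a known type, and returns the rest for logging or review.

```typescript
interface ExtractedRelation {
  id: string;
  sourceEntityId: string;
  targetEntityId: string;
  type: string;
  description: string;
  originalText: string;
}

// Relationship types named in the extraction prompt.
const RELATION_TYPES = ['Grants', 'Represents', 'Monitors', 'Defines', 'Obligates', 'ReferencesTo', 'Causes'];

function isNonEmptyString(value: unknown): value is string {
  return typeof value === 'string' && value.trim().length > 0;
}

// Type guard: true only when every expected field is present and well-formed.
function isRelation(value: unknown): value is ExtractedRelation {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const r = value as Record<string, unknown>;
  return isNonEmptyString(r.id)
    && isNonEmptyString(r.sourceEntityId)
    && isNonEmptyString(r.targetEntityId)
    && typeof r.type === 'string' && RELATION_TYPES.indexOf(r.type) !== -1
    && typeof r.description === 'string'
    && typeof r.originalText === 'string';
}

// Parses the raw model output and separates usable relationships from rejects.
function parseRelations(raw: string): { valid: ExtractedRelation[]; rejected: unknown[] } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    return { valid: [], rejected: [raw] };
  }
  const list = typeof parsed === 'object' && parsed !== null
    ? (parsed as { relations?: unknown }).relations
    : undefined;
  if (!Array.isArray(list)) {
    return { valid: [], rejected: [parsed] };
  }
  const valid: ExtractedRelation[] = [];
  const rejected: unknown[] = [];
  for (const item of list) {
    if (isRelation(item)) {
      valid.push(item);
    } else {
      rejected.push(item);
    }
  }
  return { valid, rejected };
}
```

Tracking the share of rejected items per document also gives a rough signal of when a prompt or model change has degraded extraction.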

Conclusions

In this article, we have explored how to utilize GraphRAG with MongoDB Atlas to create a semantic map of a financing contract. We have seen how this technology can significantly enhance the understanding and querying of complex legal documents, providing more accurate and contextually relevant answers.

The serverless implementation on AWS, utilizing CDK and TypeScript, provides a scalable and flexible solution suitable for law firms and corporate legal departments that require managing and analyzing large volumes of legal documents.

GraphRAG represents a significant step forward compared to traditional RAG for legal document analysis, thanks to its ability to preserve and navigate the complex relationships present in contracts. With the continuous evolution of this technology, we can expect further improvements in the automatic understanding of legal documents, with potential applications in various areas of law.
