Ask Document Tool - Interactive chat with document content using LLMs.

This tool allows users to have a conversational interface with various document types (PDF, TXT, TEX, etc.) by leveraging large language models. It maintains conversation context and provides streaming responses for a smooth experience.

class ml_research_tools.doc.ask_document_tool.DocumentParser[source]#

Bases: object

Base class for document content extractors.

classmethod can_handle(file_path)[source]#

Check if this parser can handle the given file type.

Return type:

bool

Parameters:

file_path (str)

classmethod extract_content(file_path)[source]#

Extract text content from the document.

Return type:

str

Parameters:

file_path (str)

classmethod should_cache()[source]#

Determine if this parser’s results should be cached.

By default, local document parsers don’t need caching as they’re typically fast to access. URL parsers should override this.

Return type:

bool

class ml_research_tools.doc.ask_document_tool.TextDocumentParser[source]#

Bases: DocumentParser

Parser for plain text files (txt, md, etc.).

classmethod can_handle(file_path)[source]#

Check if this parser can handle the given file type.

Return type:

bool

Parameters:

file_path (str)

classmethod extract_content(file_path)[source]#

Extract text content from text files.

Return type:

str

Parameters:

file_path (str)

class ml_research_tools.doc.ask_document_tool.CodeDocumentParser[source]#

Bases: TextDocumentParser

Parser for code.

classmethod can_handle(file_path)[source]#

Check if this parser can handle the given file type.

Return type:

bool

Parameters:

file_path (str)

classmethod extract_content(file_path)[source]#

Extract text content from text files.

Return type:

str

Parameters:

file_path (str)

class ml_research_tools.doc.ask_document_tool.LatexDocumentParser[source]#

Bases: TextDocumentParser

Parser for LaTeX files.

classmethod can_handle(file_path)[source]#

Check if this parser can handle the given file type.

Return type:

bool

Parameters:

file_path (str)

class ml_research_tools.doc.ask_document_tool.PDFDocumentParser[source]#

Bases: DocumentParser

Parser for PDF files.

classmethod can_handle(file_path)[source]#

Check if this parser can handle the given file type.

Return type:

bool

Parameters:

file_path (str)

classmethod extract_content(file_path)[source]#

Extract text content from PDF files.

Return type:

str

Parameters:

file_path (str)

class ml_research_tools.doc.ask_document_tool.URLParser[source]#

Bases: DocumentParser

Parser for URLs (web pages and downloadable files).

classmethod can_handle(file_path)[source]#

Check if this parser can handle the given URL.

Return type:

bool

Parameters:

file_path (str)

classmethod extract_content(url)[source]#

Extract content from a URL (webpage or downloadable file).

Return type:

str

Parameters:

url (str)

classmethod should_cache()[source]#

URLs should be cached as they’re slow to fetch and might change over time.

Return type:

bool

ml_research_tools.doc.ask_document_tool.get_parser_for_document(file_path)[source]#

Get the appropriate parser for the given document.

Return type:

Optional[DocumentParser]

Parameters:

file_path (str)

ml_research_tools.doc.ask_document_tool.generate_document_cache_key(document_path, prefix='document')[source]#

Generate a cache key for a document.

Parameters:
  • document_path (str) – Path or URL to the document

  • prefix (str) – Cache key prefix

Return type:

str

Returns:

A unique cache key for the document

ml_research_tools.doc.ask_document_tool.load_document_with_cache(document_path, parser, redis_cache)[source]#

Load document content with caching support.

Parameters:
Return type:

str

Returns:

The document content

ml_research_tools.doc.ask_document_tool.estimate_token_count_with_cache(text, redis_cache, model='gpt-3.5-turbo')[source]#

Estimate the number of tokens in the given text with caching support.

Parameters:
  • text (str) – The text to estimate tokens for

  • model (str) – The model name to use for token counting

  • redis_cache (RedisCache)

Return type:

int

Returns:

Estimated token count

class ml_research_tools.doc.ask_document_tool.DocumentChat(document_path, llm_client, verbose=False, max_context_messages=20, redis_cache=None)[source]#

Bases: object

Interactive chat with document content.

Initialize the document chat.

Parameters:
  • document_path (str) – Path to the document file

  • config – Application configuration

  • verbose (bool) – Enable verbose output

  • max_context_messages (int) – Maximum number of messages to keep in context

  • redis_cache (Optional[RedisCache]) – Optional Redis cache instance

  • llm_preset – Optional LLM preset to use

  • llm_tier – Optional LLM tier to use

  • llm_client (LLMClient)

__init__(document_path, llm_client, verbose=False, max_context_messages=20, redis_cache=None)[source]#

Initialize the document chat.

Parameters:
  • document_path (str) – Path to the document file

  • config – Application configuration

  • verbose (bool) – Enable verbose output

  • max_context_messages (int) – Maximum number of messages to keep in context

  • redis_cache (Optional[RedisCache]) – Optional Redis cache instance

  • llm_preset – Optional LLM preset to use

  • llm_tier – Optional LLM tier to use

  • llm_client (LLMClient)

add_user_message(content)[source]#

Add a user message to the conversation.

Return type:

None

Parameters:

content (str)

add_assistant_message(content)[source]#

Add an assistant message to the conversation.

Return type:

None

Parameters:

content (str)

stream_llm_response()[source]#

Stream the LLM response and return the complete response.

Return type:

str

run_interactive_chat()[source]#

Run the interactive chat session.

Return type:

None

class ml_research_tools.doc.ask_document_tool.AskDocumentTool(services)[source]#

Bases: BaseTool

Tool for interactive chat with document content.

Initialize the tool with default values.

Parameters:

services (ServiceProvider)

name: str = 'ask-document'#
description: str = 'Interactive chat with document content using LLMs'#
classmethod add_arguments(parser)[source]#

Add tool-specific arguments to the argument parser.

Return type:

None

Parameters:

parser (ArgumentParser)

execute(config, args)[source]#

Execute the tool with the provided arguments.

Return type:

int

Parameters: