Ask Document Tool - Interactive chat with document content using LLMs.
This tool provides a conversational interface to various document types (PDF, TXT, TEX, etc.) by leveraging large language models. It maintains conversation context and streams responses for a smooth experience.
- class ml_research_tools.doc.ask_document_tool.DocumentParser[source]#
Bases:
object
Base class for document content extractors.
- class ml_research_tools.doc.ask_document_tool.TextDocumentParser[source]#
Bases:
DocumentParser
Parser for plain text files (txt, md, etc.).
- class ml_research_tools.doc.ask_document_tool.CodeDocumentParser[source]#
Bases:
TextDocumentParser
Parser for code.
- class ml_research_tools.doc.ask_document_tool.LatexDocumentParser[source]#
Bases:
TextDocumentParser
Parser for LaTeX files.
- class ml_research_tools.doc.ask_document_tool.PDFDocumentParser[source]#
Bases:
DocumentParser
Parser for PDF files.
- class ml_research_tools.doc.ask_document_tool.URLParser[source]#
Bases:
DocumentParser
Parser for URLs (web pages and downloadable files).
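The parser classes above form a small hierarchy: subclasses extend `DocumentParser` (or `TextDocumentParser`) for each content type. A minimal sketch of the pattern, assuming a simple `parse` method and an `extensions` attribute (both assumptions, not the library's documented API):

```python
from pathlib import Path


class DocumentParser:
    """Base class for document content extractors."""

    extensions: tuple = ()  # file extensions this parser handles (assumed attribute)

    def parse(self, file_path: str) -> str:
        raise NotImplementedError


class TextDocumentParser(DocumentParser):
    """Parser for plain text files (txt, md, etc.)."""

    extensions = (".txt", ".md")

    def parse(self, file_path: str) -> str:
        # Plain text needs no extraction step beyond reading the file.
        return Path(file_path).read_text(encoding="utf-8")


class LatexDocumentParser(TextDocumentParser):
    """Parser for LaTeX files -- inherits plain-text reading."""

    extensions = (".tex",)
```

Subclassing `TextDocumentParser` (as `CodeDocumentParser` and `LatexDocumentParser` do) reuses the plain-text reading logic while allowing type-specific extensions or post-processing.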
- ml_research_tools.doc.ask_document_tool.get_parser_for_document(file_path)[source]#
Get the appropriate parser for the given document.
- Return type:
DocumentParser
- Parameters:
file_path (str)
- ml_research_tools.doc.ask_document_tool.generate_document_cache_key(document_path, prefix='document')[source]#
Generate a cache key for a document.
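A cache key usually combines the prefix with a stable digest of the document identity. A sketch assuming the key is derived by hashing the path (the library may also fold in file size or modification time):

```python
import hashlib


def generate_document_cache_key(document_path: str, prefix: str = "document") -> str:
    """Build a stable, collision-resistant cache key from a prefix and the path.

    Hashing only the path is an assumption of this sketch.
    """
    digest = hashlib.sha256(document_path.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}:{digest}"
```

Hashing keeps keys short and Redis-safe even for long paths or URLs, while identical inputs always map to the same key.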
- ml_research_tools.doc.ask_document_tool.load_document_with_cache(document_path, parser, redis_cache)[source]#
Load document content with caching support.
- Parameters:
document_path (str) – Path or URL to the document
parser (DocumentParser) – Document parser instance
redis_cache (RedisCache)
- Return type:
str
- Returns:
The document content
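This is the classic cache-aside pattern: return the cached text when present, otherwise parse and store. A sketch using an in-memory stand-in for `RedisCache` (only `get`/`set` are assumed from its interface):

```python
class DictCache:
    """In-memory stand-in for RedisCache; get/set is the only assumed API."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def load_document_with_cache(document_path, parser, redis_cache):
    """Cache-aside loading: serve cached content, else parse once and store."""
    key = f"document:{document_path}"  # key scheme assumed for this sketch
    cached = redis_cache.get(key)
    if cached is not None:
        return cached
    content = parser.parse(document_path)
    redis_cache.set(key, content)
    return content
```

The payoff is that expensive extraction (notably PDF parsing) runs at most once per document across chat sessions.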
- ml_research_tools.doc.ask_document_tool.estimate_token_count_with_cache(text, redis_cache, model='gpt-3.5-turbo')[source]#
Estimate the number of tokens in the given text with caching support.
- Parameters:
text (str) – The text to estimate tokens for
model (str) – The model name to use for token counting
redis_cache (RedisCache)
- Return type:
int
- Returns:
Estimated token count
- class ml_research_tools.doc.ask_document_tool.DocumentChat(document_path, llm_client, verbose=False, max_context_messages=20, redis_cache=None)[source]#
Bases:
object
Interactive chat with document content.
Initialize the document chat.
- Parameters:
document_path (str) – Path to the document file
config – Application configuration
verbose (bool) – Enable verbose output
max_context_messages (int) – Maximum number of messages to keep in context
redis_cache (Optional[RedisCache]) – Optional Redis cache instance
llm_preset – Optional LLM preset to use
llm_tier – Optional LLM tier to use
llm_client (LLMClient)
- __init__(document_path, llm_client, verbose=False, max_context_messages=20, redis_cache=None)[source]#
Initialize the document chat.
- Parameters:
document_path (str) – Path to the document file
config – Application configuration
verbose (bool) – Enable verbose output
max_context_messages (int) – Maximum number of messages to keep in context
redis_cache (Optional[RedisCache]) – Optional Redis cache instance
llm_preset – Optional LLM preset to use
llm_tier – Optional LLM tier to use
llm_client (LLMClient)
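The `max_context_messages` parameter bounds how much conversation history is sent to the model on each turn. A sketch of how such trimming commonly works (the deque-based approach and the class/method names here are assumptions, not `DocumentChat`'s internals):

```python
from collections import deque


class BoundedHistory:
    """Keeps at most ``max_context_messages`` recent chat messages."""

    def __init__(self, max_context_messages: int = 20):
        # deque with maxlen silently drops the oldest message on overflow.
        self._messages = deque(maxlen=max_context_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def context(self) -> list:
        """Messages in order, oldest first, ready to send to the LLM."""
        return list(self._messages)
```

Bounding the window keeps prompts under the model's context limit, at the cost of the model forgetting the earliest exchanges in long sessions.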
- class ml_research_tools.doc.ask_document_tool.AskDocumentTool(services)[source]#
Bases:
BaseTool
Tool for interactive chat with document content.
Initialize the tool with default values.
- Parameters:
services (ServiceProvider)
- classmethod add_arguments(parser)[source]#
Add tool-specific arguments to the argument parser.
- Return type:
None
- Parameters:
parser (ArgumentParser)
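A classmethod like `add_arguments` lets each tool register its own flags on a shared CLI parser without being instantiated first. A sketch of the pattern (the specific option names below are illustrative assumptions, not the tool's documented flags):

```python
import argparse


class AskDocumentTool:
    @classmethod
    def add_arguments(cls, parser: argparse.ArgumentParser) -> None:
        """Register tool-specific options on the shared argument parser."""
        parser.add_argument("document", help="Path or URL of the document to chat with")
        parser.add_argument("--verbose", action="store_true", help="Enable verbose output")
        parser.add_argument(
            "--max-context-messages",
            type=int,
            default=20,
            help="Maximum number of messages to keep in context",
        )
```

Making it a classmethod means the CLI framework can build the full parser for every registered tool up front, then construct only the tool the user actually invoked.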