Data Models¶

Core Models¶

bookwyrm.Text ¶

Bases: BaseModel

Base text model containing just text content.

text `class-attribute` `instance-attribute` ¶

text: str = Field(..., description='The text content')


bookwyrm.Span ¶

Bases: BaseModel

Base span model with position information.

start_char `class-attribute` `instance-attribute` ¶

start_char: int = Field(
    ..., description="Starting character position"
)

end_char `class-attribute` `instance-attribute` ¶

end_char: int = Field(
    ..., description="Ending character position"
)


bookwyrm.TextSpan ¶

Bases: Text, Span

Text content with character position information.
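
As a rough illustration of how the offsets on a `TextSpan` might be used, the sketch below checks spans against their source text. It uses a stand-in dataclass (`TextSpanSketch`, not the real `bookwyrm.TextSpan`) and assumes `end_char` is exclusive, as in Python slicing; the reference above does not state this, so verify it against real output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextSpanSketch:
    """Hypothetical stand-in mirroring the fields of bookwyrm.TextSpan."""
    text: str
    start_char: int
    end_char: int


def spans_match_source(source: str, spans: List[TextSpanSketch]) -> bool:
    """Return True if every span's text equals the slice it points at.

    Assumes end_char is exclusive (an assumption, not documented above).
    """
    return all(source[s.start_char:s.end_char] == s.text for s in spans)


source = "Call me Ishmael. Some years ago."
spans = [
    TextSpanSketch(text="Call me Ishmael.", start_char=0, end_char=16),
    TextSpanSketch(text="Some years ago.", start_char=17, end_char=32),
]
ok = spans_match_source(source, spans)
```

A check like this can be handy before sending chunks to a citation request, since mismatched offsets are easy to introduce when text is re-encoded or trimmed.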


bookwyrm.Citation ¶

Bases: BaseModel

A citation found in response to a question.

Citations include the relevant text, reasoning for why it's relevant, and a quality score indicating how well it answers the question.

start_chunk `class-attribute` `instance-attribute` ¶

start_chunk: int = Field(
    ..., description="Starting chunk index (inclusive)"
)

end_chunk `class-attribute` `instance-attribute` ¶

end_chunk: int = Field(
    ..., description="Ending chunk index (inclusive)"
)

text `class-attribute` `instance-attribute` ¶

text: str = Field(
    ..., description="The citation text content"
)

reasoning `class-attribute` `instance-attribute` ¶

reasoning: str = Field(
    ...,
    description="Explanation of why this citation is relevant",
)

quality `class-attribute` `instance-attribute` ¶

quality: int = Field(
    ...,
    description="Quality score (0-4): 0=unrelated, 4=perfectly answers",
)

question_index `class-attribute` `instance-attribute` ¶

question_index: Optional[int] = Field(
    None,
    description="1-based index of the question this citation answers (only present for multi-question requests)",
)
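
The following sketch shows one way a caller might work with these fields: keep only higher-quality citations and recover the chunks a citation covers. `CitationSketch` is a stand-in dataclass rather than the real `bookwyrm.Citation`, the threshold of 3 is arbitrary, and the sketch assumes chunk indices refer to positions in the caller's own chunk list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CitationSketch:
    """Hypothetical stand-in mirroring the fields of bookwyrm.Citation."""
    start_chunk: int
    end_chunk: int
    text: str
    reasoning: str
    quality: int
    question_index: Optional[int] = None


def strong_citations(
    citations: List[CitationSketch], min_quality: int = 3
) -> List[CitationSketch]:
    """Keep citations at or above min_quality, highest quality first."""
    kept = [c for c in citations if c.quality >= min_quality]
    return sorted(kept, key=lambda c: c.quality, reverse=True)


def cited_chunks(chunks: List[str], citation: CitationSketch) -> List[str]:
    """Return the chunks a citation covers; both indices are inclusive."""
    return chunks[citation.start_chunk:citation.end_chunk + 1]


chunks = ["a", "b", "c", "d"]
citations = [
    CitationSketch(0, 0, "a", "only loosely related", quality=1),
    CitationSketch(1, 2, "b c", "directly answers the question", quality=4),
    CitationSketch(3, 3, "d", "partially answers the question", quality=3),
]
best = strong_citations(citations)
covered = cited_chunks(chunks, best[0])
```

Because `end_chunk` is inclusive, the slice needs `end_chunk + 1`; forgetting this drops the last cited chunk.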


bookwyrm.UsageInfo ¶

Bases: BaseModel

Usage and billing information for API requests.

Tracks token usage, processing statistics, and cost estimates.

tokens_processed `class-attribute` `instance-attribute` ¶

tokens_processed: int = Field(
    ..., description="Total tokens processed in the request"
)

chunks_processed `class-attribute` `instance-attribute` ¶

chunks_processed: int = Field(
    ..., description="Number of text chunks processed"
)

estimated_cost `class-attribute` `instance-attribute` ¶

estimated_cost: Optional[float] = Field(
    None, description="Estimated cost in USD"
)

remaining_credits `class-attribute` `instance-attribute` ¶

remaining_credits: Optional[float] = Field(
    None, description="Remaining account credits"
)
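
Since `estimated_cost` and `remaining_credits` are optional, code that totals usage across several requests has to tolerate `None`. A minimal sketch, using a stand-in dataclass (`UsageInfoSketch`) in place of the real model:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UsageInfoSketch:
    """Hypothetical stand-in mirroring the fields of bookwyrm.UsageInfo."""
    tokens_processed: int
    chunks_processed: int
    estimated_cost: Optional[float] = None
    remaining_credits: Optional[float] = None


def total_usage(
    usages: List[UsageInfoSketch],
) -> Tuple[int, int, Optional[float]]:
    """Sum tokens, chunks, and known costs across several usage records.

    The cost is None when no record reported an estimated cost.
    """
    tokens = sum(u.tokens_processed for u in usages)
    chunks = sum(u.chunks_processed for u in usages)
    costs = [u.estimated_cost for u in usages if u.estimated_cost is not None]
    return tokens, chunks, (sum(costs) if costs else None)


usages = [
    UsageInfoSketch(tokens_processed=1200, chunks_processed=3, estimated_cost=0.25),
    UsageInfoSketch(tokens_processed=800, chunks_processed=2),
    UsageInfoSketch(tokens_processed=500, chunks_processed=1, estimated_cost=0.5),
]
totals = total_usage(usages)
```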


bookwyrm.FileClassification ¶

Bases: BaseModel

Classification results for a file.

Contains detailed information about the file's format, content type, and confidence in the classification.

format_type `class-attribute` `instance-attribute` ¶

format_type: str = Field(
    ...,
    description="General file format (e.g., 'text', 'image', 'binary', 'archive')",
)

content_type `class-attribute` `instance-attribute` ¶

content_type: str = Field(
    ...,
    description="Specific content type (e.g., 'python_code', 'json_data', 'jpeg_image')",
)

mime_type `class-attribute` `instance-attribute` ¶

mime_type: str = Field(
    ..., description="Detected MIME type"
)

confidence `class-attribute` `instance-attribute` ¶

confidence: float = Field(
    ...,
    description="Classification confidence score (0.0-1.0)",
)

details `class-attribute` `instance-attribute` ¶

details: dict = Field(
    ...,
    description="Additional classification details (encoding, language, etc.)",
)

classification_methods `class-attribute` `instance-attribute` ¶

classification_methods: Optional[List[str]] = Field(
    None, description="Methods used for classification"
)


Request Models¶

bookwyrm.CitationRequest ¶

Bases: BaseModel

Request model for citation processing.

Use this model to request citations for a question from text chunks. Provide exactly one of: chunks, jsonl_content, or jsonl_url.

chunks `class-attribute` `instance-attribute` ¶

chunks: Optional[List[TextSpan]] = Field(
    None, description="List of text chunks to search"
)

jsonl_content `class-attribute` `instance-attribute` ¶

jsonl_content: Optional[str] = Field(
    None, description="Raw JSONL content as string"
)

jsonl_url `class-attribute` `instance-attribute` ¶

jsonl_url: Optional[str] = Field(
    None, description="URL to fetch JSONL content from"
)

question `class-attribute` `instance-attribute` ¶

question: Union[str, List[str]] = Field(
    ..., description="The question(s) to find citations for"
)

start `class-attribute` `instance-attribute` ¶

start: Optional[int] = Field(
    0, description="Starting chunk index (0-based)"
)

limit `class-attribute` `instance-attribute` ¶

limit: Optional[int] = Field(
    None, description="Maximum number of chunks to process"
)

max_tokens_per_chunk `class-attribute` `instance-attribute` ¶

max_tokens_per_chunk: Optional[int] = Field(
    1000, description="Maximum tokens per chunk"
)

model_strength `class-attribute` `instance-attribute` ¶

model_strength: ModelStrength = Field(
    SWIFT,
    description="Model strength level for processing quality vs speed trade-offs",
)

validate_input_source ¶

validate_input_source()

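Validate that exactly one input source is provided and that the question is not empty. A list of questions must contain between 1 and 20 non-empty strings. Also checks that start is >= 0 and limit is > 0 when set.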

Source code in bookwyrm/models.py

@model_validator(mode="after")
def validate_input_source(self):
    """Validate that exactly one input source is provided and question is not empty."""
    sources = [self.chunks, self.jsonl_content, self.jsonl_url]
    provided_sources = [s for s in sources if s is not None]

    if len(provided_sources) != 1:
        raise ValueError(
            "Exactly one of 'chunks', 'jsonl_content', or 'jsonl_url' must be provided"
        )

    # Validate question(s)
    if isinstance(self.question, str):
        if not self.question or not self.question.strip():
            raise ValueError("question cannot be empty")
    elif isinstance(self.question, list):
        if not self.question:
            raise ValueError("question list cannot be empty")
        if len(self.question) > 20:
            raise ValueError("question list cannot contain more than 20 questions")
        for i, q in enumerate(self.question):
            if not q or not q.strip():
                raise ValueError(f"question at index {i} cannot be empty")
    else:
        raise ValueError("question must be a string or list of strings")

    if self.start is not None and self.start < 0:
        raise ValueError("start must be >= 0")

    if self.limit is not None and self.limit <= 0:
        raise ValueError("limit must be > 0")

    return self
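
Because the validator rejects lists of more than 20 questions, a caller with a longer list has to split it across several requests. The helper below is a hypothetical sketch (it is not part of `bookwyrm`) that drops blank entries and yields batches that would pass the checks above.

```python
from typing import Iterator, List

MAX_QUESTIONS = 20  # upper bound enforced by validate_input_source


def question_batches(
    questions: List[str], size: int = MAX_QUESTIONS
) -> Iterator[List[str]]:
    """Yield lists of at most `size` non-blank, stripped questions."""
    if size < 1 or size > MAX_QUESTIONS:
        raise ValueError("size must be between 1 and 20")
    cleaned = [q.strip() for q in questions if q and q.strip()]
    for i in range(0, len(cleaned), size):
        yield cleaned[i:i + size]


questions = [f"Question {n}?" for n in range(1, 46)] + ["   "]
batches = list(question_batches(questions))
batch_sizes = [len(b) for b in batches]
```

Note that `Citation.question_index` is 1-based within a single request, so after batching, an offset of `batch_number * size` would be needed to map an index back to the original list.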


bookwyrm.SummarizeRequest ¶

Bases: BaseModel

Request model for summarization processing.

content `class-attribute` `instance-attribute` ¶

content: Optional[str] = None

url `class-attribute` `instance-attribute` ¶

url: Optional[str] = None

phrases `class-attribute` `instance-attribute` ¶

phrases: Optional[List[TextSpan]] = None

max_tokens `class-attribute` `instance-attribute` ¶

max_tokens: int = 10000

debug `class-attribute` `instance-attribute` ¶

debug: bool = False

model_strength `class-attribute` `instance-attribute` ¶

model_strength: ModelStrength = SWIFT

model_name `class-attribute` `instance-attribute` ¶

model_name: Optional[str] = None

model_schema_json `class-attribute` `instance-attribute` ¶

model_schema_json: Optional[str] = None

summary_class `class-attribute` `instance-attribute` ¶

summary_class: Optional[Type[BaseModel]] = Field(
    None, exclude=True
)

chunk_prompt `class-attribute` `instance-attribute` ¶

chunk_prompt: Optional[str] = None

summary_of_summaries_prompt `class-attribute` `instance-attribute` ¶

summary_of_summaries_prompt: Optional[str] = None

validate_input_source ¶

validate_input_source()

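Validate that exactly one input source is provided. Also checks that max_tokens is between 1 and 131,072, converts summary_class into a model_name/model_schema_json pair, and rejects requests that combine structured-output options with custom prompts or that supply only half of either pair.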

Source code in bookwyrm/models.py

@model_validator(mode="after")
def validate_input_source(self):
    """Validate that exactly one input source is provided."""
    sources = [self.content, self.url, self.phrases]
    provided_sources = [s for s in sources if s is not None]

    if len(provided_sources) != 1:
        raise ValueError(
            "Exactly one of 'content', 'url', or 'phrases' must be provided"
        )

    if self.max_tokens > 131072:
        raise ValueError(
            f"max_tokens cannot exceed 131,072 (got {self.max_tokens})"
        )
    if self.max_tokens < 1:
        raise ValueError(f"max_tokens must be at least 1 (got {self.max_tokens})")

    # Handle direct Pydantic model conversion
    if self.summary_class is not None:
        if self.model_name or self.model_schema_json:
            raise ValueError(
                "Cannot specify both 'summary_class' and 'model_name'/'model_schema_json'. Use either the direct class or the name/schema pair."
            )

        # Convert Pydantic class to name and schema
        self.model_name = self.summary_class.__name__
        self.model_schema_json = json.dumps(self.summary_class.model_json_schema())
        # Clear the summary_class since it's now converted and excluded from serialization
        self.summary_class = None

    # Structured output validation
    # Check if both pydantic model and custom prompts are specified
    has_pydantic_model = bool(
        self.model_name or self.model_schema_json or self.summary_class
    )
    has_custom_prompts = bool(self.chunk_prompt or self.summary_of_summaries_prompt)

    if has_pydantic_model and has_custom_prompts:
        raise ValueError(
            "Cannot specify both pydantic model options (summary_class/model_name/model_schema_json) and custom prompt options (chunk_prompt/summary_of_summaries_prompt). These are mutually exclusive."
        )

    # Validate pydantic model fields are complete
    if self.model_name and not self.model_schema_json:
        raise ValueError(
            "model_schema_json is required when model_name is provided"
        )
    if self.model_schema_json and not self.model_name:
        raise ValueError(
            "model_name is required when model_schema_json is provided"
        )

    # Validate custom prompts are complete
    if self.chunk_prompt and not self.summary_of_summaries_prompt:
        raise ValueError(
            "summary_of_summaries_prompt is required when chunk_prompt is provided"
        )
    if self.summary_of_summaries_prompt and not self.chunk_prompt:
        raise ValueError(
            "chunk_prompt is required when summary_of_summaries_prompt is provided"
        )

    return self
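
When `summary_class` is not used, the validator expects `model_name` and `model_schema_json` together. The sketch below builds that pair by hand from a JSON Schema dict. The schema contents and the `structured_kwargs` name are illustrative assumptions; with a Pydantic v2 model, the dict would normally come from `model_json_schema()`, as the validator source shows.

```python
import json

# Hypothetical schema for illustration only.
book_summary_schema = {
    "title": "BookSummary",
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "themes": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "themes"],
}

# Keyword arguments shaped to satisfy the validator: exactly one input
# source, both halves of the name/schema pair, and no custom prompts.
structured_kwargs = {
    "content": "Full text to summarize goes here.",
    "model_name": book_summary_schema["title"],
    "model_schema_json": json.dumps(book_summary_schema),
    "max_tokens": 10000,
}
```

These keyword arguments could then be passed as `SummarizeRequest(**structured_kwargs)`. Adding `chunk_prompt` or `summary_of_summaries_prompt` to the same request would trigger the mutual-exclusion error.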


bookwyrm.ProcessTextRequest ¶

Bases: BaseModel

Request model for phrasal text processing.

Example usage with URL:

request = ProcessTextRequest(
    text_url="https://www.gutenberg.org/cache/epub/32706/pg32706.txt",
    chunk_size=1000,
    response_format=ResponseFormat.WITH_OFFSETS,
)

text `class-attribute` `instance-attribute` ¶

text: Optional[str] = None

text_url `class-attribute` `instance-attribute` ¶

text_url: Optional[str] = None

chunk_size `class-attribute` `instance-attribute` ¶

chunk_size: Optional[int] = None

response_format `class-attribute` `instance-attribute` ¶

response_format: ResponseFormat = WITH_OFFSETS

validate_input_source ¶

validate_input_source()

Validate that exactly one of text or text_url is provided.

Source code in bookwyrm/models.py

@model_validator(mode="after")
def validate_input_source(self):
    """Validate that exactly one of text or text_url is provided."""
    if not self.text and not self.text_url:
        raise ValueError("Either 'text' or 'text_url' must be provided")
    if self.text and self.text_url:
        raise ValueError("Only one of 'text' or 'text_url' should be provided")
    return self


bookwyrm.ClassifyRequest ¶

Bases: BaseModel

Request model for file classification.

content `class-attribute` `instance-attribute` ¶

content: Optional[str] = None

content_bytes `class-attribute` `instance-attribute` ¶

content_bytes: Optional[bytes] = None

filename `class-attribute` `instance-attribute` ¶

filename: Optional[str] = None

content_encoding `class-attribute` `instance-attribute` ¶

content_encoding: ContentEncoding = RAW

validate_input_source ¶

validate_input_source()

Validate that exactly one of content or content_bytes is provided.

Source code in bookwyrm/models.py

@model_validator(mode="after")
def validate_input_source(self):
    """Validate that exactly one of content or content_bytes is provided."""
    sources = [self.content, self.content_bytes]
    provided_sources = [s for s in sources if s is not None]

    if len(provided_sources) != 1:
        raise ValueError(
            "Exactly one of 'content' or 'content_bytes' must be provided"
        )

    return self


bookwyrm.PDFExtractRequest ¶

Bases: BaseModel

Request model for PDF structure extraction.

pdf_url `class-attribute` `instance-attribute` ¶

pdf_url: Optional[str] = None

pdf_content `class-attribute` `instance-attribute` ¶

pdf_content: Optional[str] = None

pdf_bytes `class-attribute` `instance-attribute` ¶

pdf_bytes: Optional[bytes] = None

filename `class-attribute` `instance-attribute` ¶

filename: Optional[str] = None

start_page `class-attribute` `instance-attribute` ¶

start_page: Optional[int] = None

num_pages `class-attribute` `instance-attribute` ¶

num_pages: Optional[int] = None

lang `class-attribute` `instance-attribute` ¶

lang: str = 'en'

enable_layout_detection `class-attribute` `instance-attribute` ¶

enable_layout_detection: bool = False

force_ocr `class-attribute` `instance-attribute` ¶

force_ocr: bool = False

validate_input_source ¶

validate_input_source() -> PDFExtractRequest

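Validate that exactly one of pdf_url, pdf_content, or pdf_bytes is provided. Also requires start_page and num_pages to be >= 1 when set, and sets force_ocr to True whenever enable_layout_detection is enabled.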

Source code in bookwyrm/models.py

@model_validator(mode="after")
def validate_input_source(self) -> "PDFExtractRequest":
    """Validate that exactly one of pdf_url, pdf_content, or pdf_bytes is provided."""
    sources = [self.pdf_url, self.pdf_content, self.pdf_bytes]
    provided_sources = [s for s in sources if s is not None]

    if len(provided_sources) != 1:
        raise ValueError(
            "Exactly one of 'pdf_url', 'pdf_content', or 'pdf_bytes' must be provided"
        )

    if self.start_page is not None and self.start_page < 1:
        raise ValueError("start_page must be >= 1")

    if self.num_pages is not None and self.num_pages < 1:
        raise ValueError("num_pages must be >= 1")

    # Auto-enable force_ocr when layout detection is enabled
    if self.enable_layout_detection and not self.force_ocr:
        self.force_ocr = True

    return self
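
For a long PDF, `start_page` and `num_pages` suggest processing the document in windows. The helper below is a hypothetical sketch (not part of `bookwyrm`) that produces `(start_page, num_pages)` pairs satisfying the `>= 1` checks above. It assumes `start_page` is 1-based and that `num_pages` counts pages beginning at `start_page`.

```python
from typing import Iterator, Tuple


def page_windows(total_pages: int, window: int) -> Iterator[Tuple[int, int]]:
    """Yield (start_page, num_pages) pairs covering pages 1..total_pages."""
    if total_pages < 1 or window < 1:
        raise ValueError("total_pages and window must be >= 1")
    for start in range(1, total_pages + 1, window):
        # The last window may be shorter than `window`.
        yield start, min(window, total_pages - start + 1)


windows = list(page_windows(total_pages=10, window=4))
```

Each pair could then be supplied as the `start_page` and `num_pages` arguments of a separate `PDFExtractRequest`.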


Response Models¶

bookwyrm.CitationResponse ¶

Bases: BaseModel

Response containing citation results and usage information.

This is the response from non-streaming citation requests.

citations `class-attribute` `instance-attribute` ¶

citations: List[Citation] = Field(
    ..., description="List of found citations"
)

total_citations `class-attribute` `instance-attribute` ¶

total_citations: int = Field(
    ..., description="Total number of citations found"
)

usage `class-attribute` `instance-attribute` ¶

usage: Optional[UsageInfo] = Field(
    None, description="Usage and billing information"
)


bookwyrm.SummaryResponse ¶

Bases: BaseModel

Response model for summarization results.

Contains the final summary and metadata about the summarization process.

type `class-attribute` `instance-attribute` ¶

type: Literal["summary"] = Field(
    "summary", description="Message type identifier"
)

summary `class-attribute` `instance-attribute` ¶

summary: str = Field(
    ...,
    description="The final summary text or structured JSON",
)

subsummary_count `class-attribute` `instance-attribute` ¶

subsummary_count: int = Field(
    ...,
    description="Number of intermediate summaries created",
)

levels_used `class-attribute` `instance-attribute` ¶

levels_used: int = Field(
    ..., description="Number of hierarchical levels used"
)

total_tokens `class-attribute` `instance-attribute` ¶

total_tokens: int = Field(
    ..., description="Total tokens processed"
)

intermediate_summaries `class-attribute` `instance-attribute` ¶

intermediate_summaries: Optional[List[List[str]]] = Field(
    None,
    description="Debug information with summaries by level",
)
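
Since `summary` is a string that may hold either plain text or structured JSON, a caller usually has to decide how to interpret it. A minimal sketch, assuming structured output is serialized as a JSON object (`parse_summary` is a hypothetical helper, not a library function):

```python
import json
from typing import Union


def parse_summary(summary: str) -> Union[str, dict]:
    """Return a dict when the summary is a JSON object, else the raw text."""
    try:
        parsed = json.loads(summary)
    except json.JSONDecodeError:
        return summary
    # Bare JSON scalars such as "42" are treated as ordinary text.
    return parsed if isinstance(parsed, dict) else summary


structured = parse_summary('{"title": "Moby-Dick", "themes": ["obsession"]}')
plain = parse_summary("A sailor recounts a doomed whaling voyage.")
```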


bookwyrm.ClassifyResponse ¶

Bases: BaseModel

Response model for classification results.

Contains the classification results along with file metadata.

classification `class-attribute` `instance-attribute` ¶

classification: FileClassification = Field(
    ..., description="The file classification results"
)

file_size `class-attribute` `instance-attribute` ¶

file_size: int = Field(
    ..., description="Size of the file in bytes"
)

sample_preview `class-attribute` `instance-attribute` ¶

sample_preview: Optional[str] = Field(
    None,
    description="First few characters if text-based file",
)


bookwyrm.PDFExtractResponse ¶

Bases: BaseModel

Response model for PDF extraction results.

Contains the extracted PDF data and processing metadata.

pages `class-attribute` `instance-attribute` ¶

pages: List[PDFPage] = Field(
    ..., description="List of extracted page data"
)

total_pages `class-attribute` `instance-attribute` ¶

total_pages: int = Field(
    ..., description="Total number of pages processed"
)

processing_time `class-attribute` `instance-attribute` ¶

processing_time: Optional[float] = Field(
    None, description="Time taken for processing (seconds)"
)


PDF Models¶

bookwyrm.PDFTextElement ¶

Bases: BaseModel

Legacy text element model for backward compatibility.

text `class-attribute` `instance-attribute` ¶

text: str = Field(
    ..., description="The extracted text content"
)

confidence `class-attribute` `instance-attribute` ¶

confidence: float = Field(
    ..., description="OCR confidence score (0.0-1.0)"
)

bbox `class-attribute` `instance-attribute` ¶

bbox: List[List[float]] = Field(
    ..., description="Raw bounding box polygon coordinates"
)

coordinates `class-attribute` `instance-attribute` ¶

coordinates: PDFBoundingBox = Field(
    ..., description="Simplified rectangular bounding box"
)
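
To illustrate the relationship between the raw polygon in `bbox` and a simplified rectangle, the sketch below reduces a polygon to its axis-aligned bounds. It assumes each inner list is an `[x, y]` point and returns a plain tuple, because the fields of `PDFBoundingBox` are not shown in this reference. The library may compute `coordinates` differently.

```python
from typing import List, Tuple


def polygon_to_rect(bbox: List[List[float]]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) for a polygon of [x, y] points."""
    if not bbox:
        raise ValueError("bbox must contain at least one point")
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    return min(xs), min(ys), max(xs), max(ys)


# A slightly skewed quadrilateral, as OCR output often is.
rect = polygon_to_rect([[10.0, 20.0], [110.0, 22.0], [109.0, 40.0], [9.0, 38.0]])
```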


bookwyrm.PDFPage ¶

Bases: BaseModel

Data for a single PDF page with unified layout regions.

page_number `class-attribute` `instance-attribute` ¶

page_number: int = Field(
    ..., description="The page number (1-based)"
)

layout_regions `class-attribute` `instance-attribute` ¶

layout_regions: List[UnifiedLayoutRegion] = Field(
    default_factory=list,
    description="Unified list of all detected layout regions with typed content",
)

reading_order `class-attribute` `instance-attribute` ¶

reading_order: Optional[List[int]] = Field(
    default=None,
    description="Global reading order indices for all content elements",
)

from_runpod_page_data `classmethod` ¶

from_runpod_page_data(page_data: dict) -> PDFPage

Create PDFPage from runpod-pdf PageData format.

Source code in bookwyrm/models.py

@classmethod
def from_runpod_page_data(cls, page_data: dict) -> "PDFPage":
    """Create PDFPage from runpod-pdf PageData format."""
    return cls(**page_data)

get_text_content ¶

get_text_content() -> List[TextContent]

Extract all text content from layout regions.

Source code in bookwyrm/models.py

def get_text_content(self) -> List[TextContent]:
    """Extract all text content from layout regions."""
    return [
        region.content
        for region in self.layout_regions
        if region.content.content_type == ContentType.TEXT
    ]

get_table_content ¶

get_table_content() -> List[TableContent]

Extract all table content from layout regions.

Source code in bookwyrm/models.py

def get_table_content(self) -> List[TableContent]:
    """Extract all table content from layout regions."""
    return [
        region.content
        for region in self.layout_regions
        if region.content.content_type == ContentType.TABLE
    ]

get_image_content ¶

get_image_content() -> List[ImageContent]

Extract all image content from layout regions.

Source code in bookwyrm/models.py

def get_image_content(self) -> List[ImageContent]:
    """Extract all image content from layout regions."""
    return [
        region.content
        for region in self.layout_regions
        if region.content.content_type == ContentType.IMAGE
    ]

get_formula_content ¶

get_formula_content() -> List[FormulaContent]

Extract all formula content from layout regions.

Source code in bookwyrm/models.py

def get_formula_content(self) -> List[FormulaContent]:
    """Extract all formula content from layout regions."""
    return [
        region.content
        for region in self.layout_regions
        if region.content.content_type == ContentType.FORMULA
    ]

get_seal_content ¶

get_seal_content() -> List[SealContent]

Extract all seal content from layout regions.

Source code in bookwyrm/models.py

def get_seal_content(self) -> List[SealContent]:
    """Extract all seal content from layout regions."""
    return [
        region.content
        for region in self.layout_regions
        if region.content.content_type == ContentType.SEAL
    ]

to_legacy_text_blocks ¶

to_legacy_text_blocks() -> List[PDFTextElement]

Convert layout regions to legacy text blocks format for backward compatibility.

Source code in bookwyrm/models.py

def to_legacy_text_blocks(self) -> List[PDFTextElement]:
    """Convert layout regions to legacy text blocks format for backward compatibility."""
    legacy_blocks = []
    for region in self.layout_regions:
        if region.content.content_type == ContentType.TEXT:
            text_content = region.content
            legacy_block = PDFTextElement(
                text=text_content.text or "",
                confidence=text_content.confidence or 1.0,
                bbox=region.bbox,
                coordinates=region.coordinates,
            )
            legacy_blocks.append(legacy_block)
    return legacy_blocks


bookwyrm.PDFStructuredData ¶

Bases: BaseModel

Complete structured data from PDF extraction.

pages `class-attribute` `instance-attribute` ¶

pages: List[PDFPage] = Field(
    ..., description="List of extracted page data"
)

total_pages `class-attribute` `instance-attribute` ¶

total_pages: int = Field(
    ..., description="Total number of pages processed"
)

get_all_text_content ¶

get_all_text_content() -> List[TextContent]

Get all text content from all pages.

Source code in bookwyrm/models.py

def get_all_text_content(self) -> List[TextContent]:
    """Get all text content from all pages."""
    all_text = []
    for page in self.pages:
        all_text.extend(page.get_text_content())
    return all_text

get_all_table_content ¶

get_all_table_content() -> List[TableContent]

Get all table content from all pages.

Source code in bookwyrm/models.py

def get_all_table_content(self) -> List[TableContent]:
    """Get all table content from all pages."""
    all_tables = []
    for page in self.pages:
        all_tables.extend(page.get_table_content())
    return all_tables


Streaming Response Models¶

bookwyrm.CitationProgressUpdate ¶

Bases: BaseModel

Progress update during citation processing.

Sent during streaming citation requests to show processing progress.

type `class-attribute` `instance-attribute` ¶

type: Literal["progress"] = Field(
    "progress", description="Message type identifier"
)

chunks_processed `class-attribute` `instance-attribute` ¶

chunks_processed: int = Field(
    ..., description="Number of chunks processed so far"
)

total_chunks `class-attribute` `instance-attribute` ¶

total_chunks: int = Field(
    ..., description="Total number of chunks to process"
)

citations_found `class-attribute` `instance-attribute` ¶

citations_found: int = Field(
    ..., description="Number of citations found so far"
)

current_chunk_range `class-attribute` `instance-attribute` ¶

current_chunk_range: str = Field(
    ...,
    description="Range of chunks currently being processed",
)

message `class-attribute` `instance-attribute` ¶

message: str = Field(
    ..., description="Human-readable progress message"
)


bookwyrm.CitationStreamResponse ¶

Bases: BaseModel

Individual citation found during streaming.

Sent when a citation is found during streaming citation requests.

type `class-attribute` `instance-attribute` ¶

type: Literal["citation"] = Field(
    "citation", description="Message type identifier"
)

citation `class-attribute` `instance-attribute` ¶

citation: Citation = Field(
    ..., description="The found citation"
)


bookwyrm.CitationSummaryResponse ¶

Bases: BaseModel

Final summary of citation processing.

Sent at the end of streaming citation requests with final statistics.

type `class-attribute` `instance-attribute` ¶

type: Literal["summary"] = Field(
    "summary", description="Message type identifier"
)

total_citations `class-attribute` `instance-attribute` ¶

total_citations: int = Field(
    ..., description="Total number of citations found"
)

chunks_processed `class-attribute` `instance-attribute` ¶

chunks_processed: int = Field(
    ..., description="Total number of chunks processed"
)

token_chunks_processed `class-attribute` `instance-attribute` ¶

token_chunks_processed: int = Field(
    ..., description="Number of token chunks processed"
)

start_offset `class-attribute` `instance-attribute` ¶

start_offset: int = Field(
    ..., description="Starting offset used for processing"
)

usage `class-attribute` `instance-attribute` ¶

usage: UsageInfo = Field(
    ..., description="Usage and billing information"
)


bookwyrm.CitationErrorResponse ¶

Bases: BaseModel

Error during citation processing.

Sent when an error occurs during streaming citation requests.

type `class-attribute` `instance-attribute` ¶

type: Literal["error"] = Field(
    "error", description="Message type identifier"
)

error_message `class-attribute` `instance-attribute` ¶

error_message: str = Field(
    ...,
    description="Error message describing what went wrong",
)

recoverable `class-attribute` `instance-attribute` ¶

recoverable: bool = Field(
    True, description="Whether the error is recoverable"
)


bookwyrm.SummarizeProgressUpdate ¶

Bases: BaseModel

Progress update during summarization processing.

Sent during streaming summarization to show hierarchical processing progress.

type `class-attribute` `instance-attribute` ¶

type: Literal["progress"] = Field(
    "progress", description="Message type identifier"
)

current_level `class-attribute` `instance-attribute` ¶

current_level: int = Field(
    ...,
    description="Current hierarchical level being processed",
)

total_levels `class-attribute` `instance-attribute` ¶

total_levels: int = Field(
    ..., description="Total number of hierarchical levels"
)

chunks_processed `class-attribute` `instance-attribute` ¶

chunks_processed: int = Field(
    ...,
    description="Number of chunks processed at current level",
)

total_chunks `class-attribute` `instance-attribute` ¶

total_chunks: int = Field(
    ...,
    description="Total number of chunks at current level",
)

summaries_created `class-attribute` `instance-attribute` ¶

summaries_created: int = Field(
    ..., description="Number of summaries created so far"
)

message `class-attribute` `instance-attribute` ¶

message: str = Field(
    ..., description="Human-readable progress message"
)


bookwyrm.SummarizeErrorResponse ¶

Bases: BaseModel

Error during summarization processing.

Sent when an error occurs during streaming summarization requests.

type `class-attribute` `instance-attribute` ¶

type: Literal["error"] = Field(
    "error", description="Message type identifier"
)

error `class-attribute` `instance-attribute` ¶

error: Optional[str] = Field(
    None,
    description="Error message describing what went wrong",
)

recoverable `class-attribute` `instance-attribute` ¶

recoverable: bool = Field(
    True, description="Whether the error is recoverable"
)


bookwyrm.PhraseProgressUpdate ¶

Bases: BaseModel

Progress update for phrasal processing.

Sent during streaming phrasal processing to show progress.

type `class-attribute` `instance-attribute` ¶

type: Literal["progress"] = Field(
    "progress", description="Message type identifier"
)

phrases_processed `class-attribute` `instance-attribute` ¶

phrases_processed: int = Field(
    ..., description="Number of phrases processed so far"
)

chunks_created `class-attribute` `instance-attribute` ¶

chunks_created: int = Field(
    ..., description="Number of chunks created so far"
)

bytes_processed `class-attribute` `instance-attribute` ¶

bytes_processed: int = Field(
    ..., description="Number of bytes processed"
)

message `class-attribute` `instance-attribute` ¶

message: str = Field(
    ..., description="Human-readable progress message"
)


bookwyrm.TextResult ¶

Bases: Text

A simple text result without position information.

Used when ResponseFormat.TEXT_ONLY is specified in phrasal processing.

type `class-attribute` `instance-attribute` ¶

type: Literal["text"] = Field(
    "text", description="Message type identifier"
)


bookwyrm.TextSpanResult ¶

Bases: TextSpan

A text span result with position information.

Used when ResponseFormat.WITH_OFFSETS is specified in phrasal processing. Inherits from TextSpan to include position data.

type `class-attribute` `instance-attribute` ¶

type: Literal["text_span"] = Field(
    "text_span", description="Message type identifier"
)


Union Types¶

bookwyrm.StreamingCitationResponse `module-attribute` ¶

StreamingCitationResponse = Union[
    CitationProgressUpdate,
    CitationStreamResponse,
    CitationSummaryResponse,
    CitationErrorResponse,
]
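
Each member of this union carries a distinct `type` literal ("progress", "citation", "summary", "error"), which makes it straightforward to dispatch on. The sketch below works on plain dicts, assuming each streamed message has already been parsed from JSON; how the real client delivers messages is not shown here, and `collect_citations` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List


def collect_citations(messages: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Gather citation payloads from a stream, raising on fatal errors."""
    citations: List[Dict[str, Any]] = []
    for msg in messages:
        kind = msg.get("type")
        if kind == "citation":
            citations.append(msg["citation"])
        elif kind == "error" and not msg.get("recoverable", True):
            # Non-recoverable errors stop processing.
            raise RuntimeError(msg["error_message"])
        # "progress", "summary", and recoverable errors are skipped here.
    return citations


stream = [
    {"type": "progress", "chunks_processed": 1, "total_chunks": 2},
    {"type": "citation", "citation": {"text": "b c", "quality": 4}},
    {"type": "error", "error_message": "temporary hiccup", "recoverable": True},
    {"type": "summary", "total_citations": 1},
]
found = collect_citations(stream)
```

With the typed models, the same branching could be written with `isinstance` checks against the four classes in the union.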


bookwyrm.StreamingSummarizeResponse `module-attribute` ¶

StreamingSummarizeResponse = Union[
    SummarizeProgressUpdate,
    SummaryResponse,
    SummarizeErrorResponse,
    RateLimitMessage,
    StructuralErrorMessage,
]


bookwyrm.StreamingPhrasalResponse `module-attribute` ¶

StreamingPhrasalResponse = Union[
    PhraseProgressUpdate, TextResult, TextSpanResult
]
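A stream typed as `StreamingPhrasalResponse` mixes progress updates with results, so a consumer usually narrows each item with `isinstance` before touching its fields. The sketch below assumes simplified dataclass stand-ins that mirror only the fields documented on this page; `collect_phrases` is a hypothetical helper, not a bookwyrm function.

```python
from dataclasses import dataclass
from typing import Iterable, List, Union


@dataclass
class PhraseProgressUpdate:
    # Stand-in with the progress fields documented above.
    phrases_processed: int
    chunks_created: int
    bytes_processed: int
    message: str


@dataclass
class TextResult:
    # Stand-in: text only.
    text: str


@dataclass
class TextSpanResult:
    # Stand-in: text plus character offsets.
    text: str
    start_char: int
    end_char: int


StreamingPhrasalResponse = Union[PhraseProgressUpdate, TextResult, TextSpanResult]


def collect_phrases(stream: Iterable[StreamingPhrasalResponse]) -> List[str]:
    """Hypothetical helper: keep result text and skip progress updates."""
    phrases: List[str] = []
    for item in stream:
        if isinstance(item, PhraseProgressUpdate):
            # Progress messages carry counters, not phrase text.
            continue
        # Both result types expose `text`.
        phrases.append(item.text)
    return phrases


stream = [
    PhraseProgressUpdate(1, 0, 12, "Processing"),
    TextResult("First phrase."),
    TextSpanResult("Second phrase.", 14, 28),
]
print(collect_phrases(stream))
```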


Enums¶

bookwyrm.ResponseFormat ¶

Bases: str, Enum

Response format options for phrasal processing.

Determines whether position information is included in phrasal responses.

TEXT_ONLY `class-attribute` `instance-attribute` ¶

TEXT_ONLY = 'text_only'

WITH_OFFSETS `class-attribute` `instance-attribute` ¶

WITH_OFFSETS = 'with_offsets'
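Since `ResponseFormat` derives from both `str` and `Enum`, its members compare equal to their string values and serialize as plain strings. The snippet below redefines the enum locally, mirroring the definition above, so that it runs without the library installed; the `response_format` key in the JSON payload is used only for illustration.

```python
import json
from enum import Enum


class ResponseFormat(str, Enum):
    # Local copy mirroring the documented enum members.
    TEXT_ONLY = "text_only"
    WITH_OFFSETS = "with_offsets"


# Look a member up by its wire value.
fmt = ResponseFormat("with_offsets")
assert fmt is ResponseFormat.WITH_OFFSETS

# The str mixin makes members compare equal to plain strings.
assert fmt == "with_offsets"

# json.dumps treats the member as a string and emits its value.
payload = json.dumps({"response_format": ResponseFormat.TEXT_ONLY})
print(payload)
```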

