Data Models

Core Models

bookwyrm.Text

Bases: BaseModel

Base text model containing just text content.

text class-attribute instance-attribute

text: str = Field(..., description='The text content')

bookwyrm.Span

Bases: BaseModel

Base span model with position information.

start_char class-attribute instance-attribute

start_char: int = Field(
    ..., description="Starting character position"
)

end_char class-attribute instance-attribute

end_char: int = Field(
    ..., description="Ending character position"
)

bookwyrm.TextSpan

Bases: Text, Span

Text content with character position information.
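
As a minimal sketch of how the offsets relate to the text, the stand-in below mirrors the `text`, `start_char`, and `end_char` fields with a plain dataclass rather than the real Pydantic model. `TextSpanSketch` and `spans_from_sentences` are hypothetical names, and the snippet assumes `end_char` is exclusive, which this reference does not state.

```python
from dataclasses import dataclass


@dataclass
class TextSpanSketch:
    """Stand-in for bookwyrm.TextSpan; not the real Pydantic model."""
    text: str
    start_char: int
    end_char: int  # assumed exclusive; the reference does not say


def spans_from_sentences(source: str, sentences: list[str]) -> list[TextSpanSketch]:
    """Locate each sentence in `source` and record its character offsets."""
    spans = []
    cursor = 0
    for sentence in sentences:
        start = source.index(sentence, cursor)  # raises ValueError if not found
        end = start + len(sentence)
        spans.append(TextSpanSketch(text=sentence, start_char=start, end_char=end))
        cursor = end
    return spans


source = "Call me Ishmael. Some years ago I went to sea."
spans = spans_from_sentences(
    source, ["Call me Ishmael.", "Some years ago I went to sea."]
)
```

Under that assumption, slicing the source with `source[span.start_char:span.end_char]` gives back `span.text` for every span.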

bookwyrm.Citation

Bases: BaseModel

A citation found in response to a question.

Citations include the relevant text, reasoning for why it's relevant, and a quality score indicating how well it answers the question.

start_chunk class-attribute instance-attribute

start_chunk: int = Field(
    ..., description="Starting chunk index (inclusive)"
)

end_chunk class-attribute instance-attribute

end_chunk: int = Field(
    ..., description="Ending chunk index (inclusive)"
)

text class-attribute instance-attribute

text: str = Field(
    ..., description="The citation text content"
)

reasoning class-attribute instance-attribute

reasoning: str = Field(
    ...,
    description="Explanation of why this citation is relevant",
)

quality class-attribute instance-attribute

quality: int = Field(
    ...,
    description="Quality score (0-4): 0=unrelated, 4=perfectly answers",
)

question_index class-attribute instance-attribute

question_index: Optional[int] = Field(
    None,
    description="1-based index of the question this citation answers (only present for multi-question requests)",
)
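
The fields above lend themselves to simple post-processing. The following sketch keeps only higher-scoring citations and groups them by `question_index`. `CitationSketch` is a plain dataclass stand-in for the documented model, `best_citations` is a hypothetical helper, and the threshold of 3 is an arbitrary choice for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class CitationSketch:
    """Stand-in mirroring the documented Citation fields."""
    start_chunk: int
    end_chunk: int
    text: str
    reasoning: str
    quality: int  # 0-4, where 4 means the citation perfectly answers
    question_index: Optional[int] = None


def best_citations(citations, min_quality=3):
    """Group citations scoring at least `min_quality` by question index.

    Citations without a question_index (single-question requests) are
    grouped under the key None.
    """
    grouped = defaultdict(list)
    for c in citations:
        if c.quality >= min_quality:
            grouped[c.question_index].append(c)
    for items in grouped.values():
        items.sort(key=lambda c: c.quality, reverse=True)
    return dict(grouped)


sample = [
    CitationSketch(0, 1, "Whales are mammals.", "States the class directly.", 4, 1),
    CitationSketch(5, 5, "The sea was calm.", "Only loosely related.", 1, 1),
    CitationSketch(9, 10, "The ship left in 1851.", "Gives the departure year.", 3, 2),
]
grouped = best_citations(sample)
```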

bookwyrm.UsageInfo

Bases: BaseModel

Usage and billing information for API requests.

Tracks token usage, processing statistics, and cost estimates.

tokens_processed class-attribute instance-attribute

tokens_processed: int = Field(
    ..., description="Total tokens processed in the request"
)

chunks_processed class-attribute instance-attribute

chunks_processed: int = Field(
    ..., description="Number of text chunks processed"
)

estimated_cost class-attribute instance-attribute

estimated_cost: Optional[float] = Field(
    None, description="Estimated cost in USD"
)

remaining_credits class-attribute instance-attribute

remaining_credits: Optional[float] = Field(
    None, description="Remaining account credits"
)

bookwyrm.FileClassification

Bases: BaseModel

Classification results for a file.

Contains detailed information about the file's format, content type, and confidence in the classification.

format_type class-attribute instance-attribute

format_type: str = Field(
    ...,
    description="General file format (e.g., 'text', 'image', 'binary', 'archive')",
)

content_type class-attribute instance-attribute

content_type: str = Field(
    ...,
    description="Specific content type (e.g., 'python_code', 'json_data', 'jpeg_image')",
)

mime_type class-attribute instance-attribute

mime_type: str = Field(
    ..., description="Detected MIME type"
)

confidence class-attribute instance-attribute

confidence: float = Field(
    ...,
    description="Classification confidence score (0.0-1.0)",
)

details class-attribute instance-attribute

details: dict = Field(
    ...,
    description="Additional classification details (encoding, language, etc.)",
)

classification_methods class-attribute instance-attribute

classification_methods: Optional[List[str]] = Field(
    None, description="Methods used for classification"
)

Request Models

bookwyrm.CitationRequest

Bases: BaseModel

Request model for citation processing.

Use this model to request citations for one or more questions from text chunks. Provide exactly one of: chunks, jsonl_content, or jsonl_url. The question may be a single non-empty string or a list of up to 20 non-empty strings.

chunks class-attribute instance-attribute

chunks: Optional[List[TextSpan]] = Field(
    None, description="List of text chunks to search"
)

jsonl_content class-attribute instance-attribute

jsonl_content: Optional[str] = Field(
    None, description="Raw JSONL content as string"
)

jsonl_url class-attribute instance-attribute

jsonl_url: Optional[str] = Field(
    None, description="URL to fetch JSONL content from"
)

question class-attribute instance-attribute

question: Union[str, List[str]] = Field(
    ..., description="The question(s) to find citations for"
)

start class-attribute instance-attribute

start: Optional[int] = Field(
    0, description="Starting chunk index (0-based)"
)

limit class-attribute instance-attribute

limit: Optional[int] = Field(
    None, description="Maximum number of chunks to process"
)

max_tokens_per_chunk class-attribute instance-attribute

max_tokens_per_chunk: Optional[int] = Field(
    1000, description="Maximum tokens per chunk"
)

model_strength class-attribute instance-attribute

model_strength: ModelStrength = Field(
    SWIFT,
    description="Model strength level for processing quality vs speed trade-offs",
)

validate_input_source

validate_input_source()

Validate that exactly one input source is provided and question is not empty.

Source code in bookwyrm/models.py
@model_validator(mode="after")
def validate_input_source(self):
    """Validate that exactly one input source is provided and question is not empty."""
    sources = [self.chunks, self.jsonl_content, self.jsonl_url]
    provided_sources = [s for s in sources if s is not None]

    if len(provided_sources) != 1:
        raise ValueError(
            "Exactly one of 'chunks', 'jsonl_content', or 'jsonl_url' must be provided"
        )

    # Validate question(s)
    if isinstance(self.question, str):
        if not self.question or not self.question.strip():
            raise ValueError("question cannot be empty")
    elif isinstance(self.question, list):
        if not self.question:
            raise ValueError("question list cannot be empty")
        if len(self.question) > 20:
            raise ValueError("question list cannot contain more than 20 questions")
        for i, q in enumerate(self.question):
            if not q or not q.strip():
                raise ValueError(f"question at index {i} cannot be empty")
    else:
        raise ValueError("question must be a string or list of strings")

    if self.start is not None and self.start < 0:
        raise ValueError("start must be >= 0")

    if self.limit is not None and self.limit <= 0:
        raise ValueError("limit must be > 0")

    return self
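
The rules in the validator above can be tried out without installing the library. The sketch below restates them as two plain functions with hypothetical names (`count_sources`, `question_error`); it is an approximation for experimentation, not the library's code. Note that sources are compared against `None`, so an empty list or empty string still counts as provided.

```python
def count_sources(chunks=None, jsonl_content=None, jsonl_url=None):
    """Count inputs the way the validator does: anything that is not None."""
    return sum(s is not None for s in (chunks, jsonl_content, jsonl_url))


def question_error(question):
    """Return the first problem found with `question`, or None if it passes."""
    if isinstance(question, str):
        return None if question.strip() else "question cannot be empty"
    if not isinstance(question, list):
        return "question must be a string or list of strings"
    if not question:
        return "question list cannot be empty"
    if len(question) > 20:
        return "question list cannot contain more than 20 questions"
    for i, q in enumerate(question):
        if not q or not q.strip():
            return f"question at index {i} cannot be empty"
    return None


# An empty list is still "provided", so adding a URL makes two sources.
one_source = count_sources(chunks=[])
two_sources = count_sources(chunks=[], jsonl_url="https://example.com/chunks.jsonl")
```

A request passes the source check only when `count_sources(...)` is exactly 1.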

bookwyrm.SummarizeRequest

Bases: BaseModel

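Request model for summarization processing.

Provide exactly one of: content, url, or phrases. max_tokens must be between 1 and 131,072. Structured output (summary_class, or model_name together with model_schema_json) and custom prompts (chunk_prompt together with summary_of_summaries_prompt) are mutually exclusive.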

content class-attribute instance-attribute

content: Optional[str] = None

url class-attribute instance-attribute

url: Optional[str] = None

phrases class-attribute instance-attribute

phrases: Optional[List[TextSpan]] = None

max_tokens class-attribute instance-attribute

max_tokens: int = 10000

debug class-attribute instance-attribute

debug: bool = False

model_strength class-attribute instance-attribute

model_strength: ModelStrength = SWIFT

model_name class-attribute instance-attribute

model_name: Optional[str] = None

model_schema_json class-attribute instance-attribute

model_schema_json: Optional[str] = None

summary_class class-attribute instance-attribute

summary_class: Optional[Type[BaseModel]] = Field(
    None, exclude=True
)

chunk_prompt class-attribute instance-attribute

chunk_prompt: Optional[str] = None

summary_of_summaries_prompt class-attribute instance-attribute

summary_of_summaries_prompt: Optional[str] = None

validate_input_source

validate_input_source()

Validate that exactly one input source is provided.

Source code in bookwyrm/models.py
@model_validator(mode="after")
def validate_input_source(self):
    """Validate that exactly one input source is provided."""
    sources = [self.content, self.url, self.phrases]
    provided_sources = [s for s in sources if s is not None]

    if len(provided_sources) != 1:
        raise ValueError(
            "Exactly one of 'content', 'url', or 'phrases' must be provided"
        )

    if self.max_tokens > 131072:
        raise ValueError(
            f"max_tokens cannot exceed 131,072 (got {self.max_tokens})"
        )
    if self.max_tokens < 1:
        raise ValueError(f"max_tokens must be at least 1 (got {self.max_tokens})")

    # Handle direct Pydantic model conversion
    if self.summary_class is not None:
        if self.model_name or self.model_schema_json:
            raise ValueError(
                "Cannot specify both 'summary_class' and 'model_name'/'model_schema_json'. Use either the direct class or the name/schema pair."
            )

        # Convert Pydantic class to name and schema
        self.model_name = self.summary_class.__name__
        self.model_schema_json = json.dumps(self.summary_class.model_json_schema())
        # Clear the summary_class since it's now converted and excluded from serialization
        self.summary_class = None

    # Structured output validation
    # Check if both pydantic model and custom prompts are specified
    has_pydantic_model = bool(
        self.model_name or self.model_schema_json or self.summary_class
    )
    has_custom_prompts = bool(self.chunk_prompt or self.summary_of_summaries_prompt)

    if has_pydantic_model and has_custom_prompts:
        raise ValueError(
            "Cannot specify both pydantic model options (summary_class/model_name/model_schema_json) and custom prompt options (chunk_prompt/summary_of_summaries_prompt). These are mutually exclusive."
        )

    # Validate pydantic model fields are complete
    if self.model_name and not self.model_schema_json:
        raise ValueError(
            "model_schema_json is required when model_name is provided"
        )
    if self.model_schema_json and not self.model_name:
        raise ValueError(
            "model_name is required when model_schema_json is provided"
        )

    # Validate custom prompts are complete
    if self.chunk_prompt and not self.summary_of_summaries_prompt:
        raise ValueError(
            "summary_of_summaries_prompt is required when chunk_prompt is provided"
        )
    if self.summary_of_summaries_prompt and not self.chunk_prompt:
        raise ValueError(
            "chunk_prompt is required when summary_of_summaries_prompt is provided"
        )

    return self
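
To make the option rules above easier to see at a glance, the sketch below classifies a combination of options into one of three modes. `summarize_mode` and the mode labels are hypothetical and exist only for illustration. In the real validator a `summary_class` is first converted into `model_name` and `model_schema_json`, so it is not modeled separately here.

```python
def summarize_mode(model_name=None, model_schema_json=None,
                   chunk_prompt=None, summary_of_summaries_prompt=None):
    """Classify a combination of options the way the validator treats it."""
    structured = bool(model_name or model_schema_json)
    prompts = bool(chunk_prompt or summary_of_summaries_prompt)
    if structured and prompts:
        raise ValueError("structured output and custom prompts are mutually exclusive")
    if structured and not (model_name and model_schema_json):
        raise ValueError("model_name and model_schema_json must be given together")
    if prompts and not (chunk_prompt and summary_of_summaries_prompt):
        raise ValueError(
            "chunk_prompt and summary_of_summaries_prompt must be given together"
        )
    if structured:
        return "structured"
    return "custom_prompts" if prompts else "plain"


plain = summarize_mode()
structured = summarize_mode(model_name="BookSummary", model_schema_json="{}")

# Supplying only half of the prompt pair is rejected.
try:
    summarize_mode(chunk_prompt="Summarize this chunk.")
    half_pair_error = None
except ValueError as exc:
    half_pair_error = str(exc)
```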

bookwyrm.ProcessTextRequest

Bases: BaseModel

Request model for phrasal text processing.

Example usage with a URL:

request = ProcessTextRequest(
    text_url="https://www.gutenberg.org/cache/epub/32706/pg32706.txt",
    chunk_size=1000,
    response_format=ResponseFormat.WITH_OFFSETS,
)

text class-attribute instance-attribute

text: Optional[str] = None

text_url class-attribute instance-attribute

text_url: Optional[str] = None

chunk_size class-attribute instance-attribute

chunk_size: Optional[int] = None

response_format class-attribute instance-attribute

response_format: ResponseFormat = WITH_OFFSETS

validate_input_source

validate_input_source()

Validate that exactly one of text or text_url is provided.

Source code in bookwyrm/models.py
@model_validator(mode="after")
def validate_input_source(self):
    """Validate that exactly one of text or text_url is provided."""
    if not self.text and not self.text_url:
        raise ValueError("Either 'text' or 'text_url' must be provided")
    if self.text and self.text_url:
        raise ValueError("Only one of 'text' or 'text_url' should be provided")
    return self
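
Unlike the other request models, this validator tests truthiness rather than `is not None`, so an empty string is treated as if the field were absent. The sketch below restates the two checks in a hypothetical helper, `text_source_error`, to show the consequence; the example URL is a placeholder.

```python
def text_source_error(text=None, text_url=None):
    """Apply the same truthiness checks as the validator shown above."""
    if not text and not text_url:
        return "Either 'text' or 'text_url' must be provided"
    if text and text_url:
        return "Only one of 'text' or 'text_url' should be provided"
    return None


# An empty string alone is rejected as "nothing provided" ...
empty_text = text_source_error(text="")
# ... while an empty string next to a URL passes, because "" is falsy.
empty_plus_url = text_source_error(text="", text_url="https://example.com/book.txt")
both = text_source_error(text="Some text", text_url="https://example.com/book.txt")
```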

bookwyrm.ClassifyRequest

Bases: BaseModel

Request model for file classification.

Provide exactly one of: content or content_bytes.

content class-attribute instance-attribute

content: Optional[str] = None

content_bytes class-attribute instance-attribute

content_bytes: Optional[bytes] = None

filename class-attribute instance-attribute

filename: Optional[str] = None

content_encoding class-attribute instance-attribute

content_encoding: ContentEncoding = RAW

validate_input_source

validate_input_source()

Validate that exactly one of content or content_bytes is provided.

Source code in bookwyrm/models.py
@model_validator(mode="after")
def validate_input_source(self):
    """Validate that exactly one of content or content_bytes is provided."""
    sources = [self.content, self.content_bytes]
    provided_sources = [s for s in sources if s is not None]

    if len(provided_sources) != 1:
        raise ValueError(
            "Exactly one of 'content' or 'content_bytes' must be provided"
        )

    return self

bookwyrm.PDFExtractRequest

Bases: BaseModel

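Request model for PDF structure extraction.

Provide exactly one of: pdf_url, pdf_content, or pdf_bytes. When given, start_page and num_pages must each be at least 1. Setting enable_layout_detection also turns on force_ocr.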

pdf_url class-attribute instance-attribute

pdf_url: Optional[str] = None

pdf_content class-attribute instance-attribute

pdf_content: Optional[str] = None

pdf_bytes class-attribute instance-attribute

pdf_bytes: Optional[bytes] = None

filename class-attribute instance-attribute

filename: Optional[str] = None

start_page class-attribute instance-attribute

start_page: Optional[int] = None

num_pages class-attribute instance-attribute

num_pages: Optional[int] = None

lang class-attribute instance-attribute

lang: str = 'en'

enable_layout_detection class-attribute instance-attribute

enable_layout_detection: bool = False

force_ocr class-attribute instance-attribute

force_ocr: bool = False

validate_input_source

validate_input_source() -> PDFExtractRequest

Validate that exactly one of pdf_url, pdf_content, or pdf_bytes is provided.

Source code in bookwyrm/models.py
@model_validator(mode="after")
def validate_input_source(self) -> "PDFExtractRequest":
    """Validate that exactly one of pdf_url, pdf_content, or pdf_bytes is provided."""
    sources = [self.pdf_url, self.pdf_content, self.pdf_bytes]
    provided_sources = [s for s in sources if s is not None]

    if len(provided_sources) != 1:
        raise ValueError(
            "Exactly one of 'pdf_url', 'pdf_content', or 'pdf_bytes' must be provided"
        )

    if self.start_page is not None and self.start_page < 1:
        raise ValueError("start_page must be >= 1")

    if self.num_pages is not None and self.num_pages < 1:
        raise ValueError("num_pages must be >= 1")

    # Auto-enable force_ocr when layout detection is enabled
    if self.enable_layout_detection and not self.force_ocr:
        self.force_ocr = True

    return self
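
The last step of the validator means that the stored `force_ocr` value can differ from what the caller passed in. A minimal sketch of that normalization, using a hypothetical function `normalize_pdf_options` that returns a plain dict rather than a model instance:

```python
def normalize_pdf_options(start_page=None, num_pages=None,
                          enable_layout_detection=False, force_ocr=False):
    """Validate the page window and apply the OCR rule from the validator."""
    if start_page is not None and start_page < 1:
        raise ValueError("start_page must be >= 1")
    if num_pages is not None and num_pages < 1:
        raise ValueError("num_pages must be >= 1")
    return {
        "start_page": start_page,
        "num_pages": num_pages,
        "enable_layout_detection": enable_layout_detection,
        # Layout detection switches OCR on even if the caller left it off.
        "force_ocr": force_ocr or enable_layout_detection,
    }


opts = normalize_pdf_options(start_page=3, num_pages=2, enable_layout_detection=True)
defaults = normalize_pdf_options()
```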

Response Models

bookwyrm.CitationResponse

Bases: BaseModel

Response containing citation results and usage information.

This is the response from non-streaming citation requests.

citations class-attribute instance-attribute

citations: List[Citation] = Field(
    ..., description="List of found citations"
)

total_citations class-attribute instance-attribute

total_citations: int = Field(
    ..., description="Total number of citations found"
)

usage class-attribute instance-attribute

usage: Optional[UsageInfo] = Field(
    None, description="Usage and billing information"
)

bookwyrm.SummaryResponse

Bases: BaseModel

Response model for summarization results.

Contains the final summary and metadata about the summarization process.

type class-attribute instance-attribute

type: Literal["summary"] = Field(
    "summary", description="Message type identifier"
)

summary class-attribute instance-attribute

summary: str = Field(
    ...,
    description="The final summary text or structured JSON",
)

subsummary_count class-attribute instance-attribute

subsummary_count: int = Field(
    ...,
    description="Number of intermediate summaries created",
)

levels_used class-attribute instance-attribute

levels_used: int = Field(
    ..., description="Number of hierarchical levels used"
)

total_tokens class-attribute instance-attribute

total_tokens: int = Field(
    ..., description="Total tokens processed"
)

intermediate_summaries class-attribute instance-attribute

intermediate_summaries: Optional[List[List[str]]] = Field(
    None,
    description="Debug information with summaries by level",
)

bookwyrm.ClassifyResponse

Bases: BaseModel

Response model for classification results.

Contains the classification results along with file metadata.

classification class-attribute instance-attribute

classification: FileClassification = Field(
    ..., description="The file classification results"
)

file_size class-attribute instance-attribute

file_size: int = Field(
    ..., description="Size of the file in bytes"
)

sample_preview class-attribute instance-attribute

sample_preview: Optional[str] = Field(
    None,
    description="First few characters if text-based file",
)

bookwyrm.PDFExtractResponse

Bases: BaseModel

Response model for PDF extraction results.

Contains the extracted PDF data and processing metadata.

pages class-attribute instance-attribute

pages: List[PDFPage] = Field(
    ..., description="List of extracted page data"
)

total_pages class-attribute instance-attribute

total_pages: int = Field(
    ..., description="Total number of pages processed"
)

processing_time class-attribute instance-attribute

processing_time: Optional[float] = Field(
    None, description="Time taken for processing (seconds)"
)

PDF Models

bookwyrm.PDFTextElement

Bases: BaseModel

Legacy text element model for backward compatibility.

text class-attribute instance-attribute

text: str = Field(
    ..., description="The extracted text content"
)

confidence class-attribute instance-attribute

confidence: float = Field(
    ..., description="OCR confidence score (0.0-1.0)"
)

bbox class-attribute instance-attribute

bbox: List[List[float]] = Field(
    ..., description="Raw bounding box polygon coordinates"
)

coordinates class-attribute instance-attribute

coordinates: PDFBoundingBox = Field(
    ..., description="Simplified rectangular bounding box"
)

bookwyrm.PDFPage

Bases: BaseModel

Data for a single PDF page with unified layout regions.

page_number class-attribute instance-attribute

page_number: int = Field(
    ..., description="The page number (1-based)"
)

layout_regions class-attribute instance-attribute

layout_regions: List[UnifiedLayoutRegion] = Field(
    default_factory=list,
    description="Unified list of all detected layout regions with typed content",
)

reading_order class-attribute instance-attribute

reading_order: Optional[List[int]] = Field(
    default=None,
    description="Global reading order indices for all content elements",
)

from_runpod_page_data classmethod

from_runpod_page_data(page_data: dict) -> PDFPage

Create PDFPage from runpod-pdf PageData format.

Source code in bookwyrm/models.py
@classmethod
def from_runpod_page_data(cls, page_data: dict) -> "PDFPage":
    """Create PDFPage from runpod-pdf PageData format."""
    return cls(**page_data)

get_text_content

get_text_content() -> List[TextContent]

Extract all text content from layout regions.

Source code in bookwyrm/models.py
def get_text_content(self) -> List[TextContent]:
    """Extract all text content from layout regions."""
    return [
        region.content
        for region in self.layout_regions
        if region.content.content_type == ContentType.TEXT
    ]

get_table_content

get_table_content() -> List[TableContent]

Extract all table content from layout regions.

Source code in bookwyrm/models.py
def get_table_content(self) -> List[TableContent]:
    """Extract all table content from layout regions."""
    return [
        region.content
        for region in self.layout_regions
        if region.content.content_type == ContentType.TABLE
    ]

get_image_content

get_image_content() -> List[ImageContent]

Extract all image content from layout regions.

Source code in bookwyrm/models.py
def get_image_content(self) -> List[ImageContent]:
    """Extract all image content from layout regions."""
    return [
        region.content
        for region in self.layout_regions
        if region.content.content_type == ContentType.IMAGE
    ]

get_formula_content

get_formula_content() -> List[FormulaContent]

Extract all formula content from layout regions.

Source code in bookwyrm/models.py
def get_formula_content(self) -> List[FormulaContent]:
    """Extract all formula content from layout regions."""
    return [
        region.content
        for region in self.layout_regions
        if region.content.content_type == ContentType.FORMULA
    ]

get_seal_content

get_seal_content() -> List[SealContent]

Extract all seal content from layout regions.

Source code in bookwyrm/models.py
def get_seal_content(self) -> List[SealContent]:
    """Extract all seal content from layout regions."""
    return [
        region.content
        for region in self.layout_regions
        if region.content.content_type == ContentType.SEAL
    ]

to_legacy_text_blocks

to_legacy_text_blocks() -> List[PDFTextElement]

Convert layout regions to legacy text blocks format for backward compatibility.

Source code in bookwyrm/models.py
def to_legacy_text_blocks(self) -> List[PDFTextElement]:
    """Convert layout regions to legacy text blocks format for backward compatibility."""
    legacy_blocks = []
    for region in self.layout_regions:
        if region.content.content_type == ContentType.TEXT:
            text_content = region.content
            legacy_block = PDFTextElement(
                text=text_content.text or "",
                confidence=text_content.confidence or 1.0,
                bbox=region.bbox,
                coordinates=region.coordinates,
            )
            legacy_blocks.append(legacy_block)
    return legacy_blocks
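
One detail of the conversion above is worth keeping in mind: `text_content.confidence or 1.0` substitutes the default for any falsy value, so a confidence of exactly 0.0 would also become 1.0. The sketch below contrasts that with a None-only fallback using two hypothetical helpers. Whether a 0.0 confidence actually occurs in extraction output is not stated in this reference.

```python
def fallback_or(value, default=1.0):
    """Mirror `value or default`: any falsy value, including 0.0, is replaced."""
    return value or default


def fallback_none(value, default=1.0):
    """Replace only a missing value, keeping a genuine 0.0 score."""
    return default if value is None else value


zero_with_or = fallback_or(0.0)      # the 0.0 score is replaced by the default
zero_with_none = fallback_none(0.0)  # the 0.0 score is preserved
```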

bookwyrm.PDFStructuredData

Bases: BaseModel

Complete structured data from PDF extraction.

pages class-attribute instance-attribute

pages: List[PDFPage] = Field(
    ..., description="List of extracted page data"
)

total_pages class-attribute instance-attribute

total_pages: int = Field(
    ..., description="Total number of pages processed"
)

get_all_text_content

get_all_text_content() -> List[TextContent]

Get all text content from all pages.

Source code in bookwyrm/models.py
def get_all_text_content(self) -> List[TextContent]:
    """Get all text content from all pages."""
    all_text = []
    for page in self.pages:
        all_text.extend(page.get_text_content())
    return all_text

get_all_table_content

get_all_table_content() -> List[TableContent]

Get all table content from all pages.

Source code in bookwyrm/models.py
def get_all_table_content(self) -> List[TableContent]:
    """Get all table content from all pages."""
    all_tables = []
    for page in self.pages:
        all_tables.extend(page.get_table_content())
    return all_tables

Streaming Response Models

bookwyrm.CitationProgressUpdate

Bases: BaseModel

Progress update during citation processing.

Sent during streaming citation requests to show processing progress.

type class-attribute instance-attribute

type: Literal["progress"] = Field(
    "progress", description="Message type identifier"
)

chunks_processed class-attribute instance-attribute

chunks_processed: int = Field(
    ..., description="Number of chunks processed so far"
)

total_chunks class-attribute instance-attribute

total_chunks: int = Field(
    ..., description="Total number of chunks to process"
)

citations_found class-attribute instance-attribute

citations_found: int = Field(
    ..., description="Number of citations found so far"
)

current_chunk_range class-attribute instance-attribute

current_chunk_range: str = Field(
    ...,
    description="Range of chunks currently being processed",
)

message class-attribute instance-attribute

message: str = Field(
    ..., description="Human-readable progress message"
)

bookwyrm.CitationStreamResponse

Bases: BaseModel

Individual citation found during streaming.

Sent when a citation is found during streaming citation requests.

type class-attribute instance-attribute

type: Literal["citation"] = Field(
    "citation", description="Message type identifier"
)

citation class-attribute instance-attribute

citation: Citation = Field(
    ..., description="The found citation"
)

bookwyrm.CitationSummaryResponse

Bases: BaseModel

Final summary of citation processing.

Sent at the end of streaming citation requests with final statistics.

type class-attribute instance-attribute

type: Literal["summary"] = Field(
    "summary", description="Message type identifier"
)

total_citations class-attribute instance-attribute

total_citations: int = Field(
    ..., description="Total number of citations found"
)

chunks_processed class-attribute instance-attribute

chunks_processed: int = Field(
    ..., description="Total number of chunks processed"
)

token_chunks_processed class-attribute instance-attribute

token_chunks_processed: int = Field(
    ..., description="Number of token chunks processed"
)

start_offset class-attribute instance-attribute

start_offset: int = Field(
    ..., description="Starting offset used for processing"
)

usage class-attribute instance-attribute

usage: UsageInfo = Field(
    ..., description="Usage and billing information"
)

bookwyrm.CitationErrorResponse

Bases: BaseModel

Error during citation processing.

Sent when an error occurs during streaming citation requests.

type class-attribute instance-attribute

type: Literal["error"] = Field(
    "error", description="Message type identifier"
)

error_message class-attribute instance-attribute

error_message: str = Field(
    ...,
    description="Error message describing what went wrong",
)

recoverable class-attribute instance-attribute

recoverable: bool = Field(
    True, description="Whether the error is recoverable"
)

bookwyrm.SummarizeProgressUpdate

Bases: BaseModel

Progress update during summarization processing.

Sent during streaming summarization to show hierarchical processing progress.

type class-attribute instance-attribute

type: Literal["progress"] = Field(
    "progress", description="Message type identifier"
)

current_level class-attribute instance-attribute

current_level: int = Field(
    ...,
    description="Current hierarchical level being processed",
)

total_levels class-attribute instance-attribute

total_levels: int = Field(
    ..., description="Total number of hierarchical levels"
)

chunks_processed class-attribute instance-attribute

chunks_processed: int = Field(
    ...,
    description="Number of chunks processed at current level",
)

total_chunks class-attribute instance-attribute

total_chunks: int = Field(
    ...,
    description="Total number of chunks at current level",
)

summaries_created class-attribute instance-attribute

summaries_created: int = Field(
    ..., description="Number of summaries created so far"
)

message class-attribute instance-attribute

message: str = Field(
    ..., description="Human-readable progress message"
)

bookwyrm.SummarizeErrorResponse

Bases: BaseModel

Error during summarization processing.

Sent when an error occurs during streaming summarization requests.

type class-attribute instance-attribute

type: Literal["error"] = Field(
    "error", description="Message type identifier"
)

error class-attribute instance-attribute

error: Optional[str] = Field(
    None,
    description="Error message describing what went wrong",
)

recoverable class-attribute instance-attribute

recoverable: bool = Field(
    True, description="Whether the error is recoverable"
)

bookwyrm.PhraseProgressUpdate

Bases: BaseModel

Progress update for phrasal processing.

Sent during streaming phrasal processing to show progress.

type class-attribute instance-attribute

type: Literal["progress"] = Field(
    "progress", description="Message type identifier"
)

phrases_processed class-attribute instance-attribute

phrases_processed: int = Field(
    ..., description="Number of phrases processed so far"
)

chunks_created class-attribute instance-attribute

chunks_created: int = Field(
    ..., description="Number of chunks created so far"
)

bytes_processed class-attribute instance-attribute

bytes_processed: int = Field(
    ..., description="Number of bytes processed"
)

message class-attribute instance-attribute

message: str = Field(
    ..., description="Human-readable progress message"
)

bookwyrm.TextResult

Bases: Text

A simple text result without position information.

Used when ResponseFormat.TEXT_ONLY is specified in phrasal processing.

type class-attribute instance-attribute

type: Literal["text"] = Field(
    "text", description="Message type identifier"
)

bookwyrm.TextSpanResult

Bases: TextSpan

A text span result with position information.

Used when ResponseFormat.WITH_OFFSETS is specified in phrasal processing. Inherits from TextSpan to include position data.

type class-attribute instance-attribute

type: Literal["text_span"] = Field(
    "text_span", description="Message type identifier"
)

Union Types

bookwyrm.StreamingCitationResponse module-attribute

StreamingCitationResponse = Union[
    CitationProgressUpdate,
    CitationStreamResponse,
    CitationSummaryResponse,
    CitationErrorResponse,
]
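
Each member of this union carries a distinct literal `type` value ("progress", "citation", "summary", "error"), which makes it straightforward to route messages. The sketch below does this on raw JSON text and assumes each message arrives as one JSON object per line, which this page does not confirm. `handle_citation_message` is a hypothetical helper, and the sample payloads are made up and trimmed to the fields the sketch reads.

```python
import json


def handle_citation_message(line: str, state: dict) -> None:
    """Route one JSON-encoded streaming message by its `type` field."""
    msg = json.loads(line)
    kind = msg.get("type")
    if kind == "progress":
        state["progress"] = (msg["chunks_processed"], msg["total_chunks"])
    elif kind == "citation":
        state["citations"].append(msg["citation"])
    elif kind == "summary":
        state["total"] = msg["total_citations"]
    elif kind == "error":
        state["errors"].append(msg["error_message"])
    else:
        # Keep unrecognized types visible instead of silently dropping them.
        state["unknown"].append(kind)


state = {"citations": [], "errors": [], "unknown": []}
sample_lines = [
    '{"type": "progress", "chunks_processed": 5, "total_chunks": 20}',
    '{"type": "citation", "citation": {"text": "Whales are mammals.", "quality": 4}}',
    '{"type": "summary", "total_citations": 1}',
    '{"type": "error", "error_message": "chunk 7 failed", "recoverable": true}',
]
for line in sample_lines:
    handle_citation_message(line, state)
```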

bookwyrm.StreamingSummarizeResponse module-attribute

StreamingSummarizeResponse = Union[
    SummarizeProgressUpdate,
    SummaryResponse,
    SummarizeErrorResponse,
    RateLimitMessage,
    StructuralErrorMessage,
]

bookwyrm.StreamingPhrasalResponse module-attribute

StreamingPhrasalResponse = Union[
    PhraseProgressUpdate, TextResult, TextSpanResult
]

Enums

bookwyrm.ResponseFormat

Bases: str, Enum

Response format options for phrasal processing.

Determines whether position information is included in phrasal responses.

TEXT_ONLY class-attribute instance-attribute

TEXT_ONLY = 'text_only'

WITH_OFFSETS class-attribute instance-attribute

WITH_OFFSETS = 'with_offsets'
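
Because the enum derives from both `str` and `Enum`, its members can be looked up by value and compared directly with plain strings. The snippet below redefines the enum locally so that it runs without bookwyrm installed; the two members and their values are copied from this page.

```python
import json
from enum import Enum


class ResponseFormat(str, Enum):
    """Local copy of the documented enum, for illustration only."""
    TEXT_ONLY = "text_only"
    WITH_OFFSETS = "with_offsets"


parsed = ResponseFormat("with_offsets")         # look up a member by its value
equals_plain_string = parsed == "with_offsets"  # members compare equal to str
payload = json.dumps({"response_format": parsed.value})

# An unknown value raises ValueError instead of returning a member.
try:
    ResponseFormat("markdown")
    unknown_error = None
except ValueError as exc:
    unknown_error = str(exc)
```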
