Data Models¶
Core Models¶
bookwyrm.Text ¶
Bases: BaseModel
Base text model containing just text content.
bookwyrm.Span ¶
bookwyrm.Citation ¶
Bases: BaseModel
A citation found in response to a question.
Citations include the relevant text, reasoning for why it's relevant, and a quality score indicating how well it answers the question.
start_chunk class-attribute instance-attribute ¶
end_chunk class-attribute instance-attribute ¶
text class-attribute instance-attribute ¶
reasoning class-attribute instance-attribute ¶
quality class-attribute instance-attribute ¶
question_index class-attribute instance-attribute ¶
question_index: Optional[int] = Field(
    None,
    description="1-based index of the question this citation answers (only present for multi-question requests)",
)
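Because question_index is 1-based and only set for multi-question requests, code that maps citations back to questions has to handle both cases. The following is a minimal sketch of that mapping. CitationStub and group_by_question are hypothetical names, the stub is a plain dataclass rather than the library's model, and every field type other than question_index is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CitationStub:
    """Stand-in for bookwyrm.Citation; types other than question_index are assumed."""
    start_chunk: int
    end_chunk: int
    text: str
    reasoning: str
    quality: int
    question_index: Optional[int] = None


def group_by_question(
    citations: List[CitationStub], questions: List[str]
) -> Dict[str, List[CitationStub]]:
    """Map each question to its citations using the 1-based question_index."""
    grouped: Dict[str, List[CitationStub]] = defaultdict(list)
    for citation in citations:
        if citation.question_index is None:
            # Single-question request: everything belongs to the only question.
            grouped[questions[0]].append(citation)
        else:
            # Convert the 1-based index to a 0-based list position.
            grouped[questions[citation.question_index - 1]].append(citation)
    return dict(grouped)
```

Subtracting one in a single place keeps the 1-based convention from leaking into the rest of the calling code.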
bookwyrm.UsageInfo ¶
Bases: BaseModel
Usage and billing information for API requests.
Tracks token usage, processing statistics, and cost estimates.
bookwyrm.FileClassification ¶
Bases: BaseModel
Classification results for a file.
Contains detailed information about the file's format, content type, and confidence in the classification.
format_type class-attribute instance-attribute ¶
format_type: str = Field(
    ...,
    description="General file format (e.g., 'text', 'image', 'binary', 'archive')",
)
content_type class-attribute instance-attribute ¶
content_type: str = Field(
    ...,
    description="Specific content type (e.g., 'python_code', 'json_data', 'jpeg_image')",
)
mime_type class-attribute instance-attribute ¶
confidence class-attribute instance-attribute ¶
details class-attribute instance-attribute ¶
details: dict = Field(
    ...,
    description="Additional classification details (encoding, language, etc.)",
)
classification_methods class-attribute instance-attribute ¶
classification_methods: Optional[List[str]] = Field(
    None, description="Methods used for classification"
)
Request Models¶
bookwyrm.CitationRequest ¶
Bases: BaseModel
Request model for citation processing.
Use this model to request citations for a question from text chunks. Provide exactly one of: chunks, jsonl_content, or jsonl_url.
chunks class-attribute instance-attribute ¶
jsonl_content class-attribute instance-attribute ¶
jsonl_url class-attribute instance-attribute ¶
question class-attribute instance-attribute ¶
start class-attribute instance-attribute ¶
limit class-attribute instance-attribute ¶
max_tokens_per_chunk class-attribute instance-attribute ¶
model_strength class-attribute instance-attribute ¶
model_strength: ModelStrength = Field(
    SWIFT,
    description="Model strength level for processing quality vs speed trade-offs",
)
validate_input_source ¶
Validate that exactly one input source is provided and question is not empty.
Source code in bookwyrm/models.py
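The validator's source is not reproduced on this page, so the following is only a sketch of the "exactly one input source, non-empty question" rule described above. It uses a plain dataclass instead of the library's pydantic model so that it runs with the standard library alone. CitationRequestStub is a hypothetical name, and the field types and error messages are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CitationRequestStub:
    """Stand-in for bookwyrm.CitationRequest (not the real pydantic model)."""
    question: str
    chunks: Optional[List[str]] = None  # assumed type; the real model may hold chunk objects
    jsonl_content: Optional[str] = None
    jsonl_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Count how many of the three mutually exclusive sources were supplied.
        sources = [self.chunks, self.jsonl_content, self.jsonl_url]
        provided = sum(source is not None for source in sources)
        if provided != 1:
            raise ValueError(
                "Provide exactly one of: chunks, jsonl_content, or jsonl_url"
            )
        # Reject questions that are empty or whitespace only.
        if not self.question.strip():
            raise ValueError("question must not be empty")
```

Counting the non-None sources, rather than chaining if/elif checks, rejects both the "none given" and the "more than one given" cases with a single comparison.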
bookwyrm.SummarizeRequest ¶
Bases: BaseModel
Request model for summarization processing.
summary_class class-attribute instance-attribute ¶
summary_of_summaries_prompt class-attribute instance-attribute ¶
validate_input_source ¶
Validate that exactly one input source is provided.
Source code in bookwyrm/models.py
bookwyrm.ProcessTextRequest ¶
Bases: BaseModel
Request model for phrasal text processing.
Example usage with URL
request = ProcessTextRequest(
    text_url="https://www.gutenberg.org/cache/epub/32706/pg32706.txt",
    chunk_size=1000,
    response_format=ResponseFormat.WITH_OFFSETS,
)
validate_input_source ¶
Validate that exactly one of text or text_url is provided.
Source code in bookwyrm/models.py
bookwyrm.ClassifyRequest ¶
Bases: BaseModel
Request model for file classification.
validate_input_source ¶
Validate that exactly one of content or content_bytes is provided.
Source code in bookwyrm/models.py
bookwyrm.PDFExtractRequest ¶
Bases: BaseModel
Request model for PDF structure extraction.
validate_input_source ¶
Validate that exactly one of pdf_url, pdf_content, or pdf_bytes is provided.
Source code in bookwyrm/models.py
Response Models¶
bookwyrm.CitationResponse ¶
bookwyrm.SummaryResponse ¶
Bases: BaseModel
Response model for summarization results.
Contains the final summary and metadata about the summarization process.
type class-attribute instance-attribute ¶
summary class-attribute instance-attribute ¶
subsummary_count class-attribute instance-attribute ¶
levels_used class-attribute instance-attribute ¶
total_tokens class-attribute instance-attribute ¶
intermediate_summaries class-attribute instance-attribute ¶
intermediate_summaries: Optional[List[List[str]]] = Field(
    None,
    description="Debug information with summaries by level",
)
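Since intermediate_summaries is an optional list of lists, with one inner list per level, inspecting it means handling the None case and then walking the levels. The sketch below shows one way to do that. describe_levels is a hypothetical helper, and numbering levels from 1 in list order is an assumption, not something the field description states.

```python
from typing import List, Optional


def describe_levels(intermediate_summaries: Optional[List[List[str]]]) -> List[str]:
    """Return one line per level, reporting how many summaries it holds."""
    if intermediate_summaries is None:
        # The field defaults to None when no debug information is included.
        return []
    return [
        f"level {level}: {len(summaries)} summaries"
        for level, summaries in enumerate(intermediate_summaries, start=1)
    ]
```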
bookwyrm.ClassifyResponse ¶
Bases: BaseModel
Response model for classification results.
Contains the classification results along with file metadata.
bookwyrm.PDFExtractResponse ¶
PDF Models¶
bookwyrm.PDFTextElement ¶
bookwyrm.PDFPage ¶
Bases: BaseModel
Data for a single PDF page with unified layout regions.
page_number class-attribute instance-attribute ¶
layout_regions class-attribute instance-attribute ¶
layout_regions: List[UnifiedLayoutRegion] = Field(
    default_factory=list,
    description="Unified list of all detected layout regions with typed content",
)
reading_order class-attribute instance-attribute ¶
reading_order: Optional[List[int]] = Field(
    default=None,
    description="Global reading order indices for all content elements",
)
from_runpod_page_data classmethod ¶
get_text_content ¶
Extract all text content from layout regions.
get_table_content ¶
Extract all table content from layout regions.
get_image_content ¶
Extract all image content from layout regions.
get_formula_content ¶
Extract all formula content from layout regions.
get_seal_content ¶
Extract all seal content from layout regions.
to_legacy_text_blocks ¶
Convert layout regions to legacy text blocks format for backward compatibility.
Source code in bookwyrm/models.py
bookwyrm.PDFStructuredData ¶
Streaming Response Models¶
bookwyrm.CitationProgressUpdate ¶
Bases: BaseModel
Progress update during citation processing.
Sent during streaming citation requests to show processing progress.
type class-attribute instance-attribute ¶
chunks_processed class-attribute instance-attribute ¶
total_chunks class-attribute instance-attribute ¶
citations_found class-attribute instance-attribute ¶
current_chunk_range class-attribute instance-attribute ¶
message class-attribute instance-attribute ¶
bookwyrm.CitationStreamResponse ¶
Bases: BaseModel
Individual citation found during streaming.
Sent when a citation is found during streaming citation requests.
bookwyrm.CitationSummaryResponse ¶
Bases: BaseModel
Final summary of citation processing.
Sent at the end of streaming citation requests with final statistics.
type class-attribute instance-attribute ¶
total_citations class-attribute instance-attribute ¶
chunks_processed class-attribute instance-attribute ¶
token_chunks_processed class-attribute instance-attribute ¶
start_offset class-attribute instance-attribute ¶
usage class-attribute instance-attribute ¶
bookwyrm.CitationErrorResponse ¶
bookwyrm.SummarizeProgressUpdate ¶
Bases: BaseModel
Progress update during summarization processing.
Sent during streaming summarization to show hierarchical processing progress.
type class-attribute instance-attribute ¶
current_level class-attribute instance-attribute ¶
total_levels class-attribute instance-attribute ¶
chunks_processed class-attribute instance-attribute ¶
total_chunks class-attribute instance-attribute ¶
summaries_created class-attribute instance-attribute ¶
message class-attribute instance-attribute ¶
bookwyrm.SummarizeErrorResponse ¶
bookwyrm.PhraseProgressUpdate ¶
Bases: BaseModel
Progress update for phrasal processing.
Sent during streaming phrasal processing to show progress.
bookwyrm.TextResult ¶
bookwyrm.TextSpanResult ¶
Union Types¶
bookwyrm.StreamingCitationResponse module-attribute ¶
StreamingCitationResponse = Union[
    CitationProgressUpdate,
    CitationStreamResponse,
    CitationSummaryResponse,
    CitationErrorResponse,
]
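A union like this is typically consumed by checking which member each streamed item is. The sketch below illustrates that dispatch with stand-in dataclasses so it runs without the library. The stub class names, the render function, and the text and error fields are assumptions. Only chunks_processed, total_chunks, and total_citations appear in the models documented above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ProgressStub:  # stand-in for CitationProgressUpdate
    chunks_processed: int
    total_chunks: int


@dataclass
class FoundStub:  # stand-in for CitationStreamResponse; `text` is assumed
    text: str


@dataclass
class SummaryStub:  # stand-in for CitationSummaryResponse
    total_citations: int


@dataclass
class ErrorStub:  # stand-in for CitationErrorResponse; `error` is assumed
    error: str


StreamEvent = Union[ProgressStub, FoundStub, SummaryStub, ErrorStub]


def render(event: StreamEvent) -> str:
    """Turn one streamed event into a human-readable line."""
    if isinstance(event, ProgressStub):
        return f"progress {event.chunks_processed}/{event.total_chunks}"
    if isinstance(event, FoundStub):
        return f"citation: {event.text}"
    if isinstance(event, SummaryStub):
        return f"done: {event.total_citations} citations"
    # Only the error member of the union remains at this point.
    return f"error: {event.error}"
```

With the real models, the same isinstance checks would apply to the classes listed in the union.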
bookwyrm.StreamingSummarizeResponse module-attribute ¶
StreamingSummarizeResponse = Union[
    SummarizeProgressUpdate,
    SummaryResponse,
    SummarizeErrorResponse,
    RateLimitMessage,
    StructuralErrorMessage,
]
bookwyrm.StreamingPhrasalResponse module-attribute ¶
Enums¶
bookwyrm.ResponseFormat ¶