Workflows Module Documentation

This module provides workflow management for the LatteReview package. It orchestrates multi-agent reviews and, in the updated version, accepts image inputs alongside text inputs.

Overview

The workflows module consists of one main class:

  • ReviewWorkflow: A class that orchestrates multi-agent review processes with support for sequential rounds, filtering, and handling both text and image inputs.

ReviewWorkflow

Description

The ReviewWorkflow class manages review workflows where multiple agents can review items across different rounds. It handles the orchestration of reviews, manages outputs, and provides content validation and cost tracking.

Class Definition

class ReviewWorkflow(pydantic.BaseModel):
    workflow_schema: List[Dict[str, Any]]
    memory: List[Dict] = list()
    reviewer_costs: Dict = dict()
    total_cost: float = 0.0
    verbose: bool = True

Key Attributes

  • workflow_schema: List of dictionaries defining the review process structure
  • memory: List storing the workflow's execution history
  • reviewer_costs: Dictionary tracking costs per reviewer and round
  • total_cost: Total accumulated cost of all reviews
  • verbose: Flag to enable/disable logging output
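
The relationship between reviewer_costs and total_cost can be sketched as follows. This is a minimal illustration, assuming the dictionary is keyed by (round, reviewer name) pairs. That key layout and the helper names total_cost_of and cost_per_round are assumptions made for the example, not confirmed details of the library.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical cost table: keys are ASSUMED to be (round, reviewer_name) tuples.
reviewer_costs: Dict[Tuple[str, str], float] = {
    ("A", "Initial"): 0.5,
    ("B", "Expert"): 0.25,
}


def total_cost_of(costs: Dict[Tuple[str, str], float]) -> float:
    """Sum every per-reviewer, per-round cost into one figure."""
    return sum(costs.values())


def cost_per_round(costs: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Group the costs by round identifier."""
    per_round: Dict[str, float] = defaultdict(float)
    for (round_id, _reviewer), cost in costs.items():
        per_round[round_id] += cost
    return dict(per_round)
```

Grouping by round in this way is one option for monitoring per-round spending while a long workflow runs.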

Methods

__call__()

Execute the workflow on provided data.

async def __call__(self, data: Union[pd.DataFrame, Dict[str, Any]]) -> pd.DataFrame:
    """Execute workflow on DataFrame or dictionary input."""
    # Returns DataFrame with review results

run()

Core method to execute the review workflow.

async def run(self, data: pd.DataFrame) -> pd.DataFrame:
    """Execute main review workflow."""
    # Process each round sequentially
    # Returns updated DataFrame with review results

get_total_cost()

Get total cost of workflow execution.

def get_total_cost(self) -> float:
    """Return sum of all review costs."""

Internal Methods

  • _format_text_input(): Format input for reviewers
  • _format_image_input(): Validate and format image paths
  • _log(): Handle logging based on verbose setting

Image Input Handling

The updated workflow supports image inputs in addition to text inputs. Images are validated for file existence and format before being passed to reviewers.
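
The existence and format checks described above might look roughly like the sketch below. It is not the library's _format_image_input() implementation: the function name validate_image_paths, the set of accepted extensions, and the choice of ValueError are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# ASSUMED set of accepted extensions; the library's actual list may differ.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg"}


def validate_image_paths(paths: List[str]) -> List[str]:
    """Return the paths unchanged in order if every one passes both checks.

    Raises ValueError on the first path with an unsupported extension
    or that does not point to an existing file.
    """
    valid = []
    for raw in paths:
        path = Path(raw)
        # Format check: compare the extension case-insensitively.
        if path.suffix.lower() not in ALLOWED_SUFFIXES:
            raise ValueError(f"Unsupported image format: {raw}")
        # Existence check: the path must be a regular file.
        if not path.is_file():
            raise ValueError(f"Image file not found: {raw}")
        valid.append(str(path))
    return valid
```

Running a check like this on the image columns before starting a workflow surfaces bad paths before any reviewer is called.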

Usage Examples

Creating a Basic Review Workflow

from lattereview.workflows import ReviewWorkflow
from lattereview.agents import ScoringReviewer
from lattereview.providers import LiteLLMProvider

# Create reviewers
reviewer1 = ScoringReviewer(
    provider=LiteLLMProvider(model="gpt-4o"),
    name="Initial",
    scoring_task="Initial paper screening"
)

reviewer2 = ScoringReviewer(
    provider=LiteLLMProvider(model="gpt-4o"),
    name="Expert",
    scoring_task="Detailed technical review"
)

# Define workflow schema
workflow_schema = [
    {
        "round": 'A',
        "reviewers": reviewer1,
        "text_inputs": ["title", "abstract"],
        "image_inputs": ["figure_path"]
    },
    {
        "round": 'B',
        "reviewers": reviewer2,
        "text_inputs": ["title", "abstract", "round-A_Initial_output"],
        "filter": lambda row: row["round-A_Initial_score"] > 3
    }
]

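# Create and run workflow.
# input_data is a pandas DataFrame (or dictionary) that provides the
# "title", "abstract", and "figure_path" columns used above.
# The call must be awaited, so run it inside an async function or a notebook cell.
workflow = ReviewWorkflow(workflow_schema=workflow_schema)
results = await workflow(input_data)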

Understanding Workflow Construction

A workflow is defined by a list of dictionaries, where each dictionary represents one review round. Rounds are executed sequentially in the order they appear in the list. Each round dictionary accepts the following keys:

Required Arguments:

  • round: A string identifier for the round (e.g., 'A', 'B', '1', 'initial')
  • reviewers: A single ScoringReviewer or list of ScoringReviewers
  • text_inputs: A string or list of strings representing DataFrame column names to use

Optional Arguments:

  • image_inputs: A list of DataFrame column names containing paths to image files
  • filter: A function (for example, a lambda) that takes a row and returns True for the rows to review in this round
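
The required and optional keys above can be checked before a workflow is built. The sketch below is a hypothetical pre-flight helper, not part of LatteReview: the name normalize_round and its behavior (wrapping single values in lists, defaulting image_inputs to an empty list) are assumptions chosen to mirror the key descriptions.

```python
from typing import Any, Dict

REQUIRED_KEYS = {"round", "reviewers", "text_inputs"}


def normalize_round(round_spec: Dict[str, Any]) -> Dict[str, Any]:
    """Check one round dictionary and return a copy with list-valued fields.

    Raises ValueError if a required key is missing.
    """
    missing = REQUIRED_KEYS - set(round_spec)
    if missing:
        raise ValueError(f"Round is missing required keys: {sorted(missing)}")

    normalized = dict(round_spec)
    # "reviewers" and "text_inputs" may be a single value or a list.
    for key in ("reviewers", "text_inputs"):
        if not isinstance(normalized[key], list):
            normalized[key] = [normalized[key]]
    # "image_inputs" is optional; default to no image columns.
    normalized.setdefault("image_inputs", [])
    return normalized
```

Applied to each entry of workflow_schema, a helper like this reports a missing key before any reviewer runs.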

Handling Results

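# Execute workflow.
# input_df is a pandas DataFrame; ReviewWorkflowError is the workflows
# module's exception class, described under Error Handling.
workflow = ReviewWorkflow(workflow_schema=workflow_schema)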
try:
    results_df = await workflow(input_df)

    # Access costs
    total_cost = workflow.get_total_cost()
    per_reviewer = workflow.reviewer_costs

    # Access results
    round_a_scores = results_df["round-A_Initial_score"]
    round_b_reasoning = results_df["round-B_Expert_reasoning"]

except ReviewWorkflowError as e:
    print(f"Workflow failed: {e}")

Error Handling

The module uses ReviewWorkflowError for all workflow-related errors:

class ReviewWorkflowError(Exception):
    """Base exception for workflow-related errors."""
    pass

Common error scenarios:

  • Invalid workflow schema
  • Missing input columns
  • Reviewer initialization failures
  • Content validation errors
  • Output processing failures
  • Invalid image file paths

Best Practices

  1. Schema Design
       • Use meaningful round identifiers
       • Design filter functions carefully
       • Consider round dependencies

  2. Data Management
       • Validate input data structure
       • Handle missing values appropriately
       • Use appropriate column names

  3. Cost Control
       • Monitor per-round costs
       • Set appropriate concurrent request limits
       • Track total workflow costs

  4. Error Handling
       • Implement proper exception handling
       • Validate workflow schema
       • Monitor review outputs
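
The advice on filter design and missing values can be combined in a named filter function. The sketch below is illustrative only: passed_round_a is a hypothetical helper, and it assumes the filter receives one row at a time and that a row skipped in an earlier round may have no score at all.

```python
import math
from typing import Any, Mapping


def passed_round_a(row: Mapping[str, Any], threshold: float = 3) -> bool:
    """Return True only when the round-A score exists and exceeds the threshold."""
    score = row.get("round-A_Initial_score")
    # A missing score would make the comparison raise TypeError, so reject it first.
    if score is None:
        return False
    # NaN compares as False anyway; the explicit check documents the intent.
    if isinstance(score, float) and math.isnan(score):
        return False
    return score > threshold
```

A function like this could be passed as the "filter" value of a round in place of a lambda, which also gives the filter a name that shows up in tracebacks.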

Limitations

  • Sequential round execution only
  • Memory grows with number of reviews
  • No direct reviewer communication
  • Limited to DataFrame-based workflows
  • Requires async/await syntax
  • Filter functions must be serializable
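
Because the workflow must be awaited, a plain synchronous script needs an event loop to drive it. The sketch below shows the general pattern with asyncio.run. The coroutine fake_workflow is a stand-in invented for this example so that the block runs without LatteReview installed; in real code the awaited call would be the workflow itself.

```python
import asyncio
from typing import Any, Dict


async def fake_workflow(data: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for an awaitable workflow call; it simply echoes its input."""
    await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
    return data


def run_sync(data: Dict[str, Any]) -> Dict[str, Any]:
    """Run the awaitable to completion from ordinary synchronous code."""
    return asyncio.run(fake_workflow(data))


result = run_sync({"title": ["Paper 1"]})
```

Note that asyncio.run raises a RuntimeError when an event loop is already running, as in a Jupyter notebook. In that setting, await the call directly instead.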