API Deep Dive · 15 min read

Structify: From Chaos to Structure in Seconds

Technical deep dive into the Structify tool. Learn schema design patterns, validation strategies, advanced use cases, error handling, and production-ready implementation patterns for transforming unstructured data.

Dr. Emily Chen · Updated April 17, 2025

TL;DR

  • Structify transforms unstructured text into structured JSON using AI-powered extraction
  • Design schemas with proper field types, validation rules, and nested structures
  • Implement error handling for extraction failures, validation errors, and edge cases
  • Use validation strategies: strict mode for critical data, lenient mode for exploration
  • Costs 3 points per call—process 500 documents on the Starter plan (1500 points)
  • Production patterns: batch processing, retry logic, quality validation, and caching

What is Structify?

AI-Powered Data Extraction

Structify is one of AppHighway's most powerful tools for transforming unstructured text into structured data. Whether you're parsing emails, extracting information from documents, or cleaning messy datasets, Structify uses advanced AI models to understand context and extract exactly what you need.

Key Capabilities

**Schema-Based Extraction**: Define your desired output structure with JSON Schema
**Type Inference**: Automatically detect and convert field types (strings, numbers, dates, booleans)
**Nested Objects**: Extract complex hierarchical data structures
**Array Handling**: Parse lists and collections from unstructured text
**Validation**: Built-in validation ensures extracted data matches your schema
**Multi-Language**: Works with text in 100+ languages

Common Use Cases

📧 **Email Parsing**: Extract contacts, dates, and key information from customer emails
📄 **Document Processing**: Parse invoices, receipts, contracts, and forms
🏢 **CRM Enrichment**: Extract structured contact data from email signatures
📊 **Data Migration**: Clean and structure legacy data during migrations
🔍 **Web Scraping**: Transform scraped HTML/text into structured records
🤖 **Chatbot Integration**: Extract structured intent and entities from user messages

Schema Design Patterns

Build Effective Extraction Schemas

The quality of your results depends heavily on schema design. Here's how to create schemas that extract exactly what you need.

1. Basic Schema Structure

Start with a simple flat schema for straightforward extraction

Example: Contact Extraction


Input text: 'Hi, I'm Sarah Johnson from TechCorp (sarah.j@techcorp.com, +1-555-0123). I'm the VP of Engineering.'

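A flat schema along these lines, sketched in JSON Schema style (the field names are illustrative, not the exact Structify syntax), covers the input above:

```json
{
  "type": "object",
  "properties": {
    "name":    { "type": "string" },
    "company": { "type": "string" },
    "email":   { "type": "string" },
    "phone":   { "type": "string" },
    "title":   { "type": "string" }
  },
  "required": ["name", "email"]
}
```

Against the input text above, an extraction with that schema should yield a record like:

```json
{
  "name": "Sarah Johnson",
  "company": "TechCorp",
  "email": "sarah.j@techcorp.com",
  "phone": "+1-555-0123",
  "title": "VP of Engineering"
}
```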

2. Field Type Definitions

Specify exact types for better validation and type safety

**string**: Text values (names, addresses, descriptions)
**number**: Numeric values (prices, quantities, IDs)
**boolean**: True/false flags (is_active, has_discount)
**date**: ISO 8601 dates (2025-04-13T10:30:00Z)
**array**: Lists of values ([items], [tags])
**object**: Nested key-value structures (address, customer details)

Example: Invoice Schema with Types

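A sketch of an invoice schema exercising each of these types (JSON Schema style; field names are illustrative):

```json
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "issue_date":     { "type": "string", "format": "date-time" },
    "total":          { "type": "number" },
    "paid":           { "type": "boolean" },
    "tags":           { "type": "array", "items": { "type": "string" } }
  },
  "required": ["invoice_number", "total"]
}
```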

3. Nested Object Schemas

Extract hierarchical data with nested objects

Example: Product with Nested Details

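A hypothetical product schema with a nested `dimensions` object might look like this (illustrative field names):

```json
{
  "type": "object",
  "properties": {
    "name":  { "type": "string" },
    "price": { "type": "number" },
    "dimensions": {
      "type": "object",
      "properties": {
        "width":  { "type": "number" },
        "height": { "type": "number" },
        "unit":   { "type": "string" }
      }
    }
  }
}
```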

Nested schemas keep related data organized and make downstream processing easier.

4. Array Field Patterns

Extract lists and collections from text

Simple Arrays (primitives)

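A simple array of primitives, sketched in JSON Schema style, is just an `array` field whose `items` are a primitive type:

```json
{
  "type": "object",
  "properties": {
    "tags": { "type": "array", "items": { "type": "string" } }
  }
}
```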

Object Arrays (structured lists)

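For structured lists, `items` becomes an object schema. A sketch of line items on an invoice (field names illustrative):

```json
{
  "type": "object",
  "properties": {
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity":    { "type": "number" },
          "unit_price":  { "type": "number" }
        }
      }
    }
  }
}
```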

Perfect for invoices, shopping carts, multi-item forms, and product lists.

5. Optional vs Required Fields

Mark fields as optional when they might not appear in all documents

**Required fields**: Core data that must be present (name, email, invoice_number)
**Optional fields**: Data that may be missing (phone, middle_name, discount_code)

Example: Contact with Optional Fields

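In JSON Schema style, only the fields listed in `required` must be present; everything else is optional. A sketch (illustrative field names):

```json
{
  "type": "object",
  "properties": {
    "name":          { "type": "string" },
    "email":         { "type": "string" },
    "phone":         { "type": "string" },
    "middle_name":   { "type": "string" },
    "discount_code": { "type": "string" }
  },
  "required": ["name", "email"]
}
```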

Use a `?` suffix in shorthand schemas, or omit the field from the object's `required` array in JSON Schema format.

Validation Strategies

Ensure Data Quality

Validation ensures extracted data meets your quality standards before downstream processing.

1. Strict Mode

Reject responses that don't match the schema exactly

**When to use**: Critical data such as financial records, legal documents, and customer orders

**Behavior**: Returns an error if any required field is missing or a type mismatch occurs

2. Lenient Mode

Return partial results with missing fields as null

**When to use**: Exploratory analysis, fuzzy matching, optional data extraction

**Behavior**: Returns a best-effort extraction with null for missing fields

Field-Level Validation

**Email validation**: Regex pattern matching for valid email format
**Phone validation**: International format validation (E.164)
**URL validation**: Valid HTTP/HTTPS URLs with proper encoding
**Date validation**: ISO 8601 format, reasonable date ranges
**Range validation**: Numeric values within specified min/max
**Enum validation**: Value must be from predefined list

Custom Validation Rules

Implement business logic validation after extraction

Example: Invoice Amount Validation

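A minimal sketch of post-extraction business-rule validation in Python. The invoice fields (`total`, `line_items`) are illustrative, not the actual Structify output format:

```python
def validate_invoice(invoice: dict) -> list[str]:
    """Return a list of business-rule violations (empty list = valid)."""
    errors = []
    # Cross-field check: line items must sum to the stated total.
    computed = sum(item["quantity"] * item["unit_price"]
                   for item in invoice.get("line_items", []))
    if abs(computed - invoice.get("total", 0)) > 0.01:
        errors.append(f"total {invoice.get('total')} != line item sum {computed:.2f}")
    # Range check: amounts must be positive.
    if invoice.get("total", 0) <= 0:
        errors.append("total must be positive")
    return errors

invoice = {
    "total": 150.00,
    "line_items": [
        {"quantity": 2, "unit_price": 50.00},
        {"quantity": 1, "unit_price": 50.00},
    ],
}
assert validate_invoice(invoice) == []
```

Run checks like these after extraction and before any downstream write, routing failures to a manual-review queue.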

Error Handling Patterns

**Missing required fields**: Retry with simplified schema or flag for manual review
**Type conversion failures**: Provide default values or fallback logic
**Malformed input**: Pre-process text (OCR cleanup, encoding fixes) before extraction
**Empty results**: Check if input text actually contains the expected data

Advanced Use Cases

Real-World Implementation Patterns

1. Email Conversation Threading

Extract structured data from multi-party email threads

**Challenge**: Email threads contain multiple messages, quoted replies, and signatures

**Solution**: Extract an array of messages with sender, timestamp, and body

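A sketch of such a thread schema in JSON Schema style (field names are illustrative):

```json
{
  "type": "object",
  "properties": {
    "subject": { "type": "string" },
    "messages": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "sender":    { "type": "string" },
          "timestamp": { "type": "string", "format": "date-time" },
          "body":      { "type": "string" }
        },
        "required": ["sender", "body"]
      }
    }
  }
}
```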

Enables sentiment analysis, response time tracking, and conversation history

2. Contract Clause Extraction

Extract specific clauses and terms from legal documents

**Challenge**: Contracts have complex structure, legal jargon, and nested clauses

**Solution**: Define a schema for standard clauses (payment terms, termination, liability)

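One hypothetical shape for a clause schema, with one object per standard clause (all field names are assumptions for illustration):

```json
{
  "type": "object",
  "properties": {
    "payment_terms": {
      "type": "object",
      "properties": {
        "net_days":     { "type": "number" },
        "late_fee_pct": { "type": "number" }
      }
    },
    "termination": {
      "type": "object",
      "properties": {
        "notice_days": { "type": "number" },
        "for_cause":   { "type": "boolean" }
      }
    },
    "liability_cap": { "type": "string" }
  }
}
```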

Automate contract review, compare terms across vendors, flag risky clauses

3. Multi-Page Form Extraction

Extract data from scanned forms (applications, surveys, registrations)

**Challenge**: Forms span multiple pages and contain handwritten entries and checkbox fields

**Solution**: OCR → text cleanup → Structify with a form field schema

1. OCR with Tesseract/Cloud Vision
2. Text cleaning (remove artifacts, fix encoding)
3. Structify with checkbox handling
4. Validate extracted data
5. Flag low-confidence fields for review

10x faster than manual data entry, enables bulk form processing

4. Product Catalog Migration

Migrate legacy product data from PDFs or text files to structured database

**Challenge**: Inconsistent formatting, missing fields, mixed units

**Solution**: Batch processing with schema normalization

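A catalog record schema might look like this (JSON Schema style; field names and units are illustrative):

```json
{
  "type": "object",
  "properties": {
    "sku":       { "type": "string" },
    "name":      { "type": "string" },
    "price":     { "type": "number" },
    "currency":  { "type": "string" },
    "weight_kg": { "type": "number" },
    "category":  { "type": "string" },
    "in_stock":  { "type": "boolean" }
  },
  "required": ["sku", "name", "price"]
}
```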

Normalize units, deduplicate SKUs, validate prices, enrich missing fields

Migrate 10,000+ products in hours instead of weeks

Production Implementation

Best Practices for Production Use

1. Batch Processing Pattern

Process multiple documents efficiently

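A minimal sketch of parallel batch processing with a thread pool. `structify_extract` is a hypothetical placeholder for the real API call, not the actual client:

```python
import concurrent.futures

def structify_extract(text: str, schema: dict) -> dict:
    # Placeholder for the real Structify API call (hypothetical signature).
    return {field: None for field in schema}

def process_batch(documents: list[str], schema: dict, workers: int = 10) -> list[dict]:
    """Extract all documents in parallel, preserving input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda doc: structify_extract(doc, schema), documents))

results = process_batch(["doc one", "doc two"], {"name": "string"})
```

Tune `workers` to stay under the rate limit; with I/O-bound API calls, a thread pool is usually enough.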

Process 1000 documents in 15 minutes instead of 3+ hours sequentially

2. Retry Logic for Transient Failures

Handle temporary errors gracefully

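A sketch of exponential-backoff retry. `TransientError` stands in for whatever retryable errors the client raises (rate limit, timeout); the injectable `sleep` keeps the example testable:

```python
import time

class TransientError(Exception):
    """Stand-in for a retryable API error (rate limit, timeout)."""

def with_retry(fn, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Retry fn with exponential backoff: 2s, 4s, 8s between attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * (2 ** attempt))

attempts = []
def flaky_extract():
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("rate limited")
    return {"status": "ok"}

waits = []
result = with_retry(flaky_extract, sleep=waits.append)
```

Here the call succeeds on the third attempt after waiting 2s and 4s.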

Implement exponential backoff: 2s, 4s, 8s delays between retries

3. Quality Validation Pipeline

Validate extraction quality before downstream use

**Completeness**: Check that all required fields are populated
**Consistency**: Cross-field validation (totals match line items)
**Format**: Email/phone/URL format validation
**Range**: Values within expected ranges (price > 0, quantity < 10000)
**Duplicates**: Check for duplicate extractions from same document
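The checks above can be sketched as a single quality gate (field names and thresholds are illustrative):

```python
import re

def quality_check(record: dict, required: list[str]) -> list[str]:
    """Return a list of quality issues; an empty list means the record passes."""
    issues = []
    # Completeness: all required fields populated.
    for field in required:
        if not record.get(field):
            issues.append(f"missing: {field}")
    # Format: basic email shape.
    email = record.get("email")
    if email and not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        issues.append("invalid email format")
    # Range: price within expected bounds.
    price = record.get("price")
    if price is not None and not (0 < price < 10000):
        issues.append("price out of range")
    return issues

assert quality_check({"email": "a@b.com", "price": 10.0}, ["email"]) == []
```

Records with a non-empty issue list go to manual review rather than downstream systems.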

Flag low-quality extractions for manual review instead of using bad data

4. Result Caching

Cache extraction results to save points and improve performance

Hash input text + schema → Cache key

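A sketch of that pattern: hash the text plus schema into a key, and only call the (point-costing) extraction on a cache miss. `extract_fn` is a hypothetical wrapper around the real API call:

```python
import hashlib
import json

_cache: dict[str, dict] = {}

def cache_key(text: str, schema: dict) -> str:
    """Hash input text + schema into a stable cache key."""
    payload = json.dumps({"text": text, "schema": schema}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_extract(text: str, schema: dict, extract_fn) -> dict:
    key = cache_key(text, schema)
    if key not in _cache:
        _cache[key] = extract_fn(text, schema)  # only pay points on a miss
    return _cache[key]

calls = []
def fake_extract(text, schema):
    calls.append(text)
    return {"name": "Sarah Johnson"}

first = cached_extract("same doc", {"name": "string"}, fake_extract)
second = cached_extract("same doc", {"name": "string"}, fake_extract)
```

The second call hits the cache, so `fake_extract` runs only once. In production, swap the in-memory dict for Redis or similar with a TTL.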

Save 70% of points on repeated extractions, 10x faster response times

5. Monitoring & Observability

Track extraction quality and performance

**Success rate**: % of extractions that succeed vs fail
**Extraction time**: p50, p95, p99 latencies
**Field population**: % of extractions with all required fields
**Validation failures**: Track common validation errors
**Points usage**: Monitor daily/weekly consumption

Alert on: success rate < 95%, extraction time > 10s, daily points > budget

Error Handling & Troubleshooting

Common Issues and Solutions

InsufficientPointsError

**Cause**: Account balance too low (< 3 points)

**Solution**: Purchase more points or implement queueing for batch processing

SchemaValidationError

**Cause**: Extracted data doesn't match the schema (missing required fields, type mismatch)

**Solution**: Switch to lenient mode, simplify the schema, or improve input text quality

EmptyExtractionError

**Cause**: No data extracted from the input text

**Solution**: Check whether the input text contains the expected data, and improve text preprocessing (OCR quality)

TimeoutError

**Cause**: Extraction took longer than 30 seconds (very large documents)

**Solution**: Split large documents into smaller chunks, increase the timeout, or use async processing

RateLimitExceededError

**Cause**: Too many requests per minute (default: 60 requests/min)

**Solution**: Implement exponential backoff, reduce the request rate, or request a rate limit increase

Best Practices

1. Start Simple, Iterate

Begin with basic flat schemas and add complexity as needed

2. Use Type Definitions

Always specify field types for better validation and type safety

3. Handle Missing Fields

Design schemas with optional fields for real-world messy data

4. Validate Before Use

Never use extracted data without validation—implement quality checks

5. Cache Results

Cache extraction results for repeated documents to save points and time

6. Monitor Quality

Track success rates, field population, and validation failures over time

7. Batch Process

Process documents in parallel batches for 10x performance improvement

8. Implement Retry Logic

Handle transient failures with exponential backoff retry logic

9. Preprocess Text

Clean OCR output, fix encoding issues, and remove artifacts before extraction

10. Test with Real Data

Test schemas with production-like data to catch edge cases early

Real-World Example: Resume Parser

Complete Implementation

Scenario

HR department needs to parse 500 resumes into structured candidate records

Requirements

Extract: name, email, phone, experience, education, skills

Validate: email format, phone format, required fields present

Process: 500 resumes in under 20 minutes

Quality: 95%+ success rate, flag incomplete records for review

Implementation

Schema:

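A candidate schema covering the required fields might look like this (JSON Schema style; field names are illustrative):

```json
{
  "type": "object",
  "properties": {
    "name":  { "type": "string" },
    "email": { "type": "string" },
    "phone": { "type": "string" },
    "experience": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "company": { "type": "string" },
          "title":   { "type": "string" },
          "years":   { "type": "number" }
        }
      }
    },
    "education": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "school": { "type": "string" },
          "degree": { "type": "string" }
        }
      }
    },
    "skills": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["name", "email"]
}
```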

Implementation:

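A minimal sketch of the pipeline: extract each resume, then split results into complete records versus records flagged for review. `extract_fn` is a hypothetical wrapper around the Structify call with the resume schema:

```python
def parse_resumes(resumes, extract_fn, required=("name", "email")):
    """Extract every resume, then split into complete vs flagged-for-review."""
    complete, flagged = [], []
    for text in resumes:
        record = extract_fn(text)
        if all(record.get(field) for field in required):
            complete.append(record)
        else:
            flagged.append({"source": text, "record": record})
    return complete, flagged

def fake_extract(text):
    # Stand-in for the real API call; crude parsing just for the demo.
    record = {"name": None, "email": None}
    if "@" in text:
        record["name"] = text.split(",")[0]
        record["email"] = text.split(",")[-1].strip()
    return record

complete, flagged = parse_resumes(
    ["Jane Doe, jane@example.com", "illegible scan"], fake_extract
)
```

In production, combine this with the batch processing, retry, and caching patterns above and route `flagged` to HR for manual review.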

Results

**Processed**: 500 resumes in 18 minutes

**Success rate**: 96.4% (482 complete, 18 flagged for review)

**Cost**: 1500 points (500 resumes × 3 points) = $15

**Time saved**: 40+ hours of manual data entry

**Quality**: 98% field accuracy on validated records

Next Steps

1. Get Your API Token

Sign up at apphighway.com/dashboard to get your API token and 100 free points

2. Design Your Schema

Define the structure you want to extract using the patterns in this guide

3. Test with Sample Data

Test your schema with representative documents to validate extraction quality

4. Implement Production Patterns

Add batch processing, retry logic, validation, and caching from this guide

5. Monitor & Optimize

Track success rates, field population, and points usage to optimize costs

Transform Unstructured Data with Confidence

Structify is a powerful tool for transforming messy, unstructured text into clean, structured data. By following the schema design patterns, validation strategies, and production best practices in this guide, you can build reliable data extraction pipelines that save hours of manual work and enable new automation workflows. Start with simple schemas, iterate based on real data, and implement quality validation to ensure production-ready results.
