Master version control in PromptOwl - test drafts, compare versions with evaluation sets, publish safely, and roll back when needed.
Learn how to manage agent versions in PromptOwl. This guide covers the complete workflow from drafting changes to publishing safely, with testing and rollback strategies.
Why Version Control Matters
When you're iterating on AI agents, you need:
Safety: Don't break production while experimenting
Testing: Verify changes before they go live
History: Track what changed and when
Rollback: Quickly revert if something goes wrong
PromptOwl's version system gives you all of this.
Understanding Versions
Version States
| State | Icon | Meaning |
|-------|------|---------|
| Draft | Gray | Work-in-progress, not live |
| Published | Green | Active version users see |
| Historical | None | Previous versions kept in history |
What Gets Versioned
Every version captures:
System prompt content
Block configurations
Model settings (provider, temperature, tokens)
Connected datasets
Tool selections
Variable definitions
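As a mental model, you can think of a version as a single snapshot that bundles all of these settings together. The sketch below is illustrative only; the field names are assumptions for this guide, not PromptOwl's actual schema.

```python
# Illustrative sketch of the settings a single version bundles together.
# Field names are assumptions for illustration, not PromptOwl's schema.
from dataclasses import dataclass, field

@dataclass
class VersionSnapshot:
    version: str                                     # e.g. "v4 (draft)"
    system_prompt: str                               # system prompt content
    blocks: list = field(default_factory=list)       # block configurations
    model: str = "gpt-4o"                            # provider / model setting
    temperature: float = 0.7
    max_tokens: int = 1024
    datasets: list = field(default_factory=list)     # connected datasets
    tools: list = field(default_factory=list)        # tool selections
    variables: dict = field(default_factory=dict)    # variable definitions

draft = VersionSnapshot(
    version="v4 (draft)",
    system_prompt="You are a friendly support agent for Acme Corp.",
    datasets=["product-docs"],
)
```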
What Doesn't Get Versioned
These are separate from versions:
Conversations (tied to prompt, not version)
Evaluation sets (independent)
Annotations (on conversations)
API keys (account-level)
The Version Workflow
Step 1: Create a Draft
Making Changes
Open your agent in the editor
Make your changes (prompt, settings, etc.)
Click Save to create a draft
Your changes are now saved but NOT live. Users still see the published version.
Draft Indicators
Look for these signs you're working on a draft:
"Draft" badge in the editor
"Unsaved changes" warning if you navigate away
Version number shows "(draft)" suffix
Multiple Drafts
You can only have one active draft at a time. Each save overwrites the previous draft until you publish.
Step 2: Test Your Changes
Using the Chat Interface
Before publishing, test your draft:
Open the Chat tab while in draft mode
The chat uses your draft version (not published)
Test with real questions
Verify responses are correct
Test Checklist
For each change, verify:
Responses are accurate for typical user questions
Tone and persona match the system prompt
Connected documents are retrieved when they should be
Behavior that worked before still works
Testing Tips
For prompt changes:
Before: "You are a helpful assistant"
After: "You are a friendly support agent for Acme Corp"
Test: Ask questions that should trigger the new persona

For model changes:
Before: GPT-4o (temperature 0.7)
After: Claude 3.5 Sonnet (temperature 0.3)
Test: Compare response quality and consistency

For RAG changes:
Before: No knowledge base
After: Connected product documentation
Test: Ask questions that require document retrieval
Step 3: Evaluate with Test Sets
Why Evaluate?
Manual testing catches obvious issues. Evaluation sets catch systematic problems across many scenarios.
Creating an Evaluation Set
Go to the Evaluate tab
Click Create Evaluation Set
Add test cases:
| Input | Expected Behavior |
|-------|-------------------|
| "What's your return policy?" | References the return policy document |
| "How much does it cost?" | Mentions pricing tiers |
| "I'm frustrated with your service" | Responds empathetically |
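If it helps to reason about the set programmatically, the same cases can be written out as plain data with a rough keyword check, as in the sketch below. This is illustrative only; PromptOwl evaluation sets are configured in the UI, and this is not their storage format.

```python
# The test cases above expressed as plain data, with a crude keyword check.
# Illustrative only; not PromptOwl's evaluation-set format.
eval_cases = [
    {"input": "What's your return policy?", "expect_any": ["return policy"]},
    {"input": "How much does it cost?", "expect_any": ["tier", "pricing"]},
    {"input": "I'm frustrated with your service", "expect_any": ["sorry", "understand"]},
]

def case_passes(response: str, case: dict) -> bool:
    """Pass if the response mentions at least one expected phrase."""
    return any(phrase.lower() in response.lower() for phrase in case["expect_any"])
```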
Running Evaluations
With your draft active, go to Evaluate
Select your evaluation set
Click Run Evaluation
Review pass/fail results
Comparing Version Performance
Run the same evaluation set on:
Your current published version
Your draft version
Compare the results:
| Version | Pass Rate | Notes |
|---------|-----------|-------|
| v3 (Published) | 85% | Current baseline |
| v4 (Draft) | 92% | Improvement on pricing questions |
Only publish if the draft performs at least as well as the current published version.
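That rule amounts to a simple publish gate. A minimal sketch of the arithmetic, using made-up pass/fail results that match the table above:

```python
# Publish gate from the comparison above, with made-up pass/fail results.
results = {
    "v3 (published)": [True] * 17 + [False] * 3,   # 85% pass rate
    "v4 (draft)":     [True] * 23 + [False] * 2,   # 92% pass rate
}

def pass_rate(outcomes):
    return sum(outcomes) / len(outcomes)

baseline = pass_rate(results["v3 (published)"])
candidate = pass_rate(results["v4 (draft)"])
print(f"published {baseline:.0%} vs draft {candidate:.0%}")

# Only publish when the draft is at least as good as the current version.
should_publish = candidate >= baseline
```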
Using AI Judge
For subjective quality, configure AI Judge:
In evaluation settings, enable AI Judge
Set scoring criteria:
Accuracy (1-5)
Helpfulness (1-5)
Tone (1-5)
Run evaluation with AI scoring
Review aggregate scores
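Aggregate scores are easiest to read per criterion. The sketch below shows the averaging with made-up judge scores; the criteria names mirror the list above, and nothing here reflects how PromptOwl stores its results.

```python
# Averaging AI Judge scores (1-5) per criterion; the scores are made up.
from statistics import mean

judge_scores = [
    {"accuracy": 5, "helpfulness": 4, "tone": 5},
    {"accuracy": 4, "helpfulness": 4, "tone": 3},
    {"accuracy": 5, "helpfulness": 5, "tone": 4},
]

aggregate = {
    criterion: round(mean(s[criterion] for s in judge_scores), 2)
    for criterion in ("accuracy", "helpfulness", "tone")
}
print(aggregate)  # {'accuracy': 4.67, 'helpfulness': 4.33, 'tone': 4.0}
```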
Step 4: Publish Safely
Pre-Publish Checklist
Before clicking Publish:
You have tested the draft in the Chat tab
The draft scores at least as well as the published version on your evaluation set
Version notes explain what changed and why
You know which version to roll back to if something goes wrong
Publishing
Click Publish in the editor
Confirm the action
Your draft becomes the new published version
What Happens on Publish
Draft becomes the active version
Previous published version moves to history
All new conversations use the new version
Existing conversations continue with their original version
API consumers immediately get the new version
Gradual Rollout (Advanced)
For high-traffic agents, consider:
Publish at low-traffic times - Fewer users affected if issues arise
Monitor closely after publish - Watch for problems in the first hour
Have rollback ready - Know which version to revert to
Step 5: Monitor After Publishing
What to Watch
After publishing, monitor:
| Metric | Where to Find | Warning Sign |
|--------|---------------|--------------|
| Error rate | Conversations | Sudden increase |
| User feedback | Annotations | Spike in negative sentiment |
| Response quality | Sample conversations | Unexpected responses |
| Token usage | Analytics | Unusual increase or decrease |
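One way to make the warning signs concrete is to compare each metric against its pre-publish baseline. The sketch below is illustrative; the numbers and the 1.5x threshold are assumptions, and the real values come from the Monitor tab or your own logs.

```python
# Flag metrics that moved sharply after a publish. Baselines, post-publish
# values, and the 1.5x threshold are all illustrative assumptions.
before = {"error_rate": 0.02, "negative_annotations": 0.05, "avg_tokens": 820}
after  = {"error_rate": 0.09, "negative_annotations": 0.06, "avg_tokens": 1950}

def flag_regressions(before, after, tolerance=1.5):
    """Return metrics that grew by more than `tolerance`x after the publish."""
    return [m for m in before if after[m] > before[m] * tolerance]

flags = flag_regressions(before, after)
if flags:
    print(f"Possible regressions, consider rollback: {flags}")
```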
Setting Up Monitoring
Go to Monitor tab
Filter to recent conversations
Review a sample of responses
Check for annotation patterns
How Long to Monitor
| Agent Type | Monitoring Period |
|------------|-------------------|
| Low traffic (<100/day) | 24-48 hours |
| Medium traffic | 4-8 hours |
| High traffic (>1000/day) | 1-2 hours |

Higher-traffic agents surface problems faster, so they need a shorter window of close monitoring; low-traffic agents take longer to accumulate enough conversations to judge.
Step 6: Rollback When Needed
When to Rollback
Rollback immediately if you see:
Systematic errors in responses
Spike in negative annotations
Critical functionality broken
Compliance or safety issues
How to Rollback
Go to Versions panel (right sidebar)
Find the last known good version
Click on it to preview
Click Publish on that version
Confirm the rollback
Rollback is Safe
Creates a new version (doesn't delete history)
Instant effect on new conversations
Existing conversations unaffected
You can always roll forward again
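A way to picture why rollback is non-destructive: the version history behaves like an append-only list, so reverting to v3 adds a new version carrying v3's content rather than deleting v4. The sketch below models the behavior described above, not PromptOwl's internals.

```python
# Rollback modeled as an append: the bad version stays in history.
history = ["v1", "v2", "v3", "v4"]   # v4 is the publish being rolled back

def rollback_to(history, good_version):
    new_version = f"v{len(history) + 1} (restored from {good_version})"
    return history + [new_version], new_version

history, published = rollback_to(history, "v3")
print(history)    # ['v1', 'v2', 'v3', 'v4', 'v5 (restored from v3)']
print(published)  # the restored content is now live; you can roll forward later
```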
Post-Rollback Actions
After rolling back:
Document the issue - What went wrong?
Analyze the failed version - Why did it fail?
Fix in a new draft - Address the root cause
Re-test thoroughly - Don't repeat the mistake
Try again - When ready, publish the fixed version
Version History Best Practices
Meaningful Changes
Make each version meaningful:
Good:
v1: Initial release
v2: Added product documentation RAG
v3: Improved handling of refund requests
v4: Updated to Claude 3.5 Sonnet
Bad:
v1: Initial
v2: Fixed typo
v3: Testing
v4: Testing again
v5: Final
v6: Actually final
Version Notes
When saving/publishing, include notes about:
What changed
Why it changed
Expected impact
Regular Cleanup
Periodically review your version history:
Identify which versions were successful
Note patterns in what worked/didn't work
Use insights for future changes
Team Workflows
Review Before Publish
For team environments:
Developer creates draft and tests locally
Reviewer checks changes and runs evaluations
Approver gives go-ahead to publish
Publisher makes version live
Avoiding Conflicts
When multiple people edit:
Only one draft exists at a time
Last save wins
Communicate about who's editing
Use the Versions panel to see recent changes
Shared Evaluation Sets
Create evaluation sets that the whole team uses:
Standard test cases everyone runs
Ensures consistent quality bar
Makes comparisons meaningful
API Consumer Considerations
How API Users Experience Versions
When you publish a new version:
All API calls immediately use the new version
No API changes needed on consumer side
Conversation history continues normally
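From the consumer's side, the request targets the agent rather than a specific version, so a publish requires no client change. The sketch below is a hedged illustration: the endpoint URL, payload shape, and auth header are assumptions, not PromptOwl's documented API.

```python
# Hypothetical consumer call; endpoint, payload, and auth header are
# assumptions, not PromptOwl's documented API. The request names the agent,
# not a version, so publishing a new version needs no change here.
import requests

PROMPTOWL_API_KEY = "sk-..."  # account-level, not versioned

def ask_agent(agent_id: str, message: str) -> str:
    response = requests.post(
        f"https://api.promptowl.example/agents/{agent_id}/chat",  # placeholder URL
        headers={"Authorization": f"Bearer {PROMPTOWL_API_KEY}"},
        json={"message": message},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("reply", "")
```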
Versioning for API Stability
If API consumers need stability:
Communicate changes - Notify before major updates
Test with staging - Use a separate staging agent
Gradual rollout - Publish during low-traffic periods
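One common pattern for the staging suggestion above is to key the agent ID off the environment, so drafts are exercised against a separate staging agent before the production agent is published. The IDs and environment variable below are placeholders.

```python
# Environment-keyed agent selection; IDs and the APP_ENV variable are placeholders.
import os

AGENT_IDS = {
    "staging": "agent-staging-123",    # receives experimental publishes first
    "production": "agent-prod-456",    # promoted only once staging looks good
}

agent_id = AGENT_IDS.get(os.getenv("APP_ENV", "staging"), AGENT_IDS["staging"])
```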
Before: "You are a helpful assistant"
After: "You are a friendly support agent for Acme Corp"
Test: Ask questions that should trigger the new persona
Before: GPT-4o (temperature 0.7)
After: Claude 3.5 Sonnet (temperature 0.3)
Test: Compare response quality and consistency
Before: No knowledge base
After: Connected product documentation
Test: Ask questions that require document retrieval
Quick Reference
1. SAVE → Creates draft (not live)
2. TEST → Use chat to verify
3. EVALUATE → Run against test set
4. PUBLISH → Make live
5. MONITOR → Watch for issues
6. ROLLBACK → If needed, revert