Master version control in PromptOwl - test drafts, compare versions with evaluation sets, publish safely, and roll back when needed.
Learn how to manage agent versions in PromptOwl. This guide covers the complete workflow from drafting changes to publishing safely, with testing and rollback strategies.
Why Version Control Matters
When you're iterating on AI agents, you need:
Safety: Don't break production while experimenting
Testing: Verify changes before they go live
History: Track what changed and when
Rollback: Quickly revert if something goes wrong
PromptOwl's version system gives you all of this.
Understanding Versions
Version States
| State | Icon | Meaning |
|-------|------|---------|
| Draft | Gray | Work-in-progress, not live |
| Published | Green | Active version users see |
| Historical | None | Previous versions kept in history |
What Gets Versioned
Every version captures:
System prompt content
Block configurations
Model settings (provider, temperature, tokens)
Connected datasets
Tool selections
Variable definitions
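As a mental model, you can think of a version as a single snapshot that bundles all of these settings together. The sketch below is illustrative only; the field names are assumptions for this guide, not PromptOwl's actual schema.

```python
# Illustrative sketch of the settings a single version bundles together.
# Field names are assumptions for illustration, not PromptOwl's schema.
from dataclasses import dataclass, field

@dataclass
class VersionSnapshot:
    version: str                                     # e.g. "v4 (draft)"
    system_prompt: str                               # system prompt content
    blocks: list = field(default_factory=list)       # block configurations
    model: str = "gpt-4o"                            # provider / model setting
    temperature: float = 0.7
    max_tokens: int = 1024
    datasets: list = field(default_factory=list)     # connected datasets
    tools: list = field(default_factory=list)        # tool selections
    variables: dict = field(default_factory=dict)    # variable definitions

draft = VersionSnapshot(
    version="v4 (draft)",
    system_prompt="You are a friendly support agent for Acme Corp.",
    datasets=["product-docs"],
)
```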
What Doesn't Get Versioned
These are separate from versions:
Conversations (tied to prompt, not version)
Evaluation sets (independent)
Annotations (on conversations)
API keys (account-level)
The Version Workflow
Step 1: Create a Draft
Making Changes
Open your agent in the editor
Make your changes (prompt, settings, etc.)
Click Save to create a draft
Your changes are now saved but NOT live. Users still see the published version.
Draft Indicators
Look for these signs you're working on a draft:
"Draft" badge in the editor
"Unsaved changes" warning if you navigate away
Version number shows "(draft)" suffix
Multiple Drafts
You can only have one active draft at a time. Each save overwrites the previous draft until you publish.
Step 2: Test Your Changes
Using the Chat Interface
Before publishing, test your draft:
Open the Chat tab while in draft mode
The chat uses your draft version (not published)
Test with real questions
Verify responses are correct
Test Checklist
For each change, verify:
Responses are accurate for typical user questions
Tone and persona match the system prompt
Connected documents are retrieved when they should be
Behavior that worked before still works
Testing Tips
For prompt changes:
Before: "You are a helpful assistant"
After: "You are a friendly support agent for Acme Corp"
Test: Ask questions that should trigger the new persona

For model changes:
Before: GPT-4o (temperature 0.7)
After: Claude 3.5 Sonnet (temperature 0.3)
Test: Compare response quality and consistency

For RAG changes:
Before: No knowledge base
After: Connected product documentation
Test: Ask questions that require document retrieval
Step 3: Evaluate with Test Sets
Why Evaluate?
Manual testing catches obvious issues. Evaluation sets catch systematic problems across many scenarios.
Creating an Evaluation Set
Go to the Evaluate tab
Click Create Evaluation Set
Add test cases:
| Input | Expected Behavior |
|-------|-------------------|
| "What's your return policy?" | References the return policy document |
| "How much does it cost?" | Mentions pricing tiers |
| "I'm frustrated with your service" | Responds empathetically |
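If it helps to reason about the set programmatically, the same cases can be written out as plain data with a rough keyword check, as in the sketch below. This is illustrative only; PromptOwl evaluation sets are configured in the UI, and this is not their storage format.

```python
# The test cases above expressed as plain data, with a crude keyword check.
# Illustrative only; not PromptOwl's evaluation-set format.
eval_cases = [
    {"input": "What's your return policy?", "expect_any": ["return policy"]},
    {"input": "How much does it cost?", "expect_any": ["tier", "pricing"]},
    {"input": "I'm frustrated with your service", "expect_any": ["sorry", "understand"]},
]

def case_passes(response: str, case: dict) -> bool:
    """Pass if the response mentions at least one expected phrase."""
    return any(phrase.lower() in response.lower() for phrase in case["expect_any"])
```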
Running Evaluations
With your draft active, go to Evaluate
Select your evaluation set
Click Run Evaluation
Review pass/fail results
Comparing Version Performance
Run the same evaluation set on:
Your current published version
Your draft version
Compare the results:
| Version | Pass Rate | Notes |
|---------|-----------|-------|
| v3 (Published) | 85% | Current baseline |
| v4 (Draft) | 92% | Improvement on pricing questions |
Only publish if the draft performs at least as well as the current published version.
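That rule amounts to a simple publish gate. A minimal sketch of the arithmetic, using made-up pass/fail results that match the table above:

```python
# Publish gate from the comparison above, with made-up pass/fail results.
results = {
    "v3 (published)": [True] * 17 + [False] * 3,   # 85% pass rate
    "v4 (draft)":     [True] * 23 + [False] * 2,   # 92% pass rate
}

def pass_rate(outcomes):
    return sum(outcomes) / len(outcomes)

baseline = pass_rate(results["v3 (published)"])
candidate = pass_rate(results["v4 (draft)"])
print(f"published {baseline:.0%} vs draft {candidate:.0%}")

# Only publish when the draft is at least as good as the current version.
should_publish = candidate >= baseline
```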
Using AI Judge
For subjective quality, configure AI Judge:
In evaluation settings, enable AI Judge
Set scoring criteria:
Accuracy (1-5)
Helpfulness (1-5)
Tone (1-5)
Run evaluation with AI scoring
Review aggregate scores
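Aggregate scores are easiest to read per criterion. The sketch below shows the averaging with made-up judge scores; the criteria names mirror the list above, and nothing here reflects how PromptOwl stores its results.

```python
# Averaging AI Judge scores (1-5) per criterion; the scores are made up.
from statistics import mean

judge_scores = [
    {"accuracy": 5, "helpfulness": 4, "tone": 5},
    {"accuracy": 4, "helpfulness": 4, "tone": 3},
    {"accuracy": 5, "helpfulness": 5, "tone": 4},
]

aggregate = {
    criterion: round(mean(s[criterion] for s in judge_scores), 2)
    for criterion in ("accuracy", "helpfulness", "tone")
}
print(aggregate)  # {'accuracy': 4.67, 'helpfulness': 4.33, 'tone': 4.0}
```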
Step 4: Publish Safely
Pre-Publish Checklist
Before clicking Publish:
You have tested the draft in the Chat tab
The draft scores at least as well as the published version on your evaluation set
Version notes explain what changed and why
You know which version to roll back to if something goes wrong
Publishing
Click Publish in the editor
Confirm the action
Your draft becomes the new published version
What Happens on Publish
Draft becomes the active version
Previous published version moves to history
All new conversations use the new version
Existing conversations continue with their original version
API consumers immediately get the new version
Gradual Rollout (Advanced)
For high-traffic agents, consider:
Publish at low-traffic times - Fewer users affected if issues arise
Monitor closely after publish - Watch for problems in the first hour
Have rollback ready - Know which version to revert to
Step 5: Monitor After Publishing
What to Watch
After publishing, monitor:
| Metric | Where to Find | Warning Sign |
|--------|---------------|--------------|
| Error rate | Conversations | Sudden increase |
| User feedback | Annotations | Spike in negative sentiment |
| Response quality | Sample conversations | Unexpected responses |
| Token usage | Analytics | Unusual increase or decrease |
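One way to make the warning signs concrete is to compare each metric against its pre-publish baseline. The sketch below is illustrative; the numbers and the 1.5x threshold are assumptions, and the real values come from the Monitor tab or your own logs.

```python
# Flag metrics that moved sharply after a publish. Baselines, post-publish
# values, and the 1.5x threshold are all illustrative assumptions.
before = {"error_rate": 0.02, "negative_annotations": 0.05, "avg_tokens": 820}
after  = {"error_rate": 0.09, "negative_annotations": 0.06, "avg_tokens": 1950}

def flag_regressions(before, after, tolerance=1.5):
    """Return metrics that grew by more than `tolerance`x after the publish."""
    return [m for m in before if after[m] > before[m] * tolerance]

flags = flag_regressions(before, after)
if flags:
    print(f"Possible regressions, consider rollback: {flags}")
```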
Setting Up Monitoring
Go to Monitor tab
Filter to recent conversations
Review a sample of responses
Check for annotation patterns
How Long to Monitor
| Agent Type | Monitoring Period |
|------------|-------------------|
| Low traffic (<100/day) | 24-48 hours |
| Medium traffic | 4-8 hours |
| High traffic (>1000/day) | 1-2 hours |

Higher-traffic agents surface problems faster, so they need a shorter window of close monitoring; low-traffic agents take longer to accumulate enough conversations to judge.
Step 6: Rollback When Needed
When to Rollback
Rollback immediately if you see:
Systematic errors in responses
Spike in negative annotations
Critical functionality broken
Compliance or safety issues
How to Rollback
Go to Versions panel (right sidebar)
Find the last known good version
Click on it to preview
Click Publish on that version
Confirm the rollback
Rollback is Safe
Creates a new version (doesn't delete history)
Instant effect on new conversations
Existing conversations unaffected
You can always roll forward again
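A way to picture why rollback is non-destructive: the version history behaves like an append-only list, so reverting to v3 adds a new version carrying v3's content rather than deleting v4. The sketch below models the behavior described above, not PromptOwl's internals.

```python
# Rollback modeled as an append: the bad version stays in history.
history = ["v1", "v2", "v3", "v4"]   # v4 is the publish being rolled back

def rollback_to(history, good_version):
    new_version = f"v{len(history) + 1} (restored from {good_version})"
    return history + [new_version], new_version

history, published = rollback_to(history, "v3")
print(history)    # ['v1', 'v2', 'v3', 'v4', 'v5 (restored from v3)']
print(published)  # the restored content is now live; you can roll forward later
```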
Post-Rollback Actions
After rolling back:
Document the issue - What went wrong?
Analyze the failed version - Why did it fail?
Fix in a new draft - Address the root cause
Re-test thoroughly - Don't repeat the mistake
Try again - When ready, publish the fixed version
Version History Best Practices
Meaningful Changes
Make each version meaningful:
Good:
v1: Initial release
v2: Added product documentation RAG
v3: Improved handling of refund requests
v4: Updated to Claude 3.5 Sonnet
Bad:
v1: Initial
v2: Fixed typo
v3: Testing
v4: Testing again
v5: Final
v6: Actually final
Version Notes
When saving/publishing, include notes about:
What changed
Why it changed
Expected impact
Regular Cleanup
Periodically review your version history:
Identify which versions were successful
Note patterns in what worked/didn't work
Use insights for future changes
Team Workflows
Review Before Publish
For team environments:
Developer creates draft and tests locally
Reviewer checks changes and runs evaluations
Approver gives go-ahead to publish
Publisher makes version live
Avoiding Conflicts
When multiple people edit:
Only one draft exists at a time
Last save wins
Communicate about who's editing
Use the Versions panel to see recent changes
Shared Evaluation Sets
Create evaluation sets that the whole team uses:
Standard test cases everyone runs
Ensures consistent quality bar
Makes comparisons meaningful
API Consumer Considerations
How API Users Experience Versions
When you publish a new version:
All API calls immediately use the new version
No API changes needed on consumer side
Conversation history continues normally
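From the consumer's side, the request targets the agent rather than a specific version, so a publish requires no client change. The sketch below is a hedged illustration: the endpoint URL, payload shape, and auth header are assumptions, not PromptOwl's documented API.

```python
# Hypothetical consumer call; endpoint, payload, and auth header are
# assumptions, not PromptOwl's documented API. The request names the agent,
# not a version, so publishing a new version needs no change here.
import requests

PROMPTOWL_API_KEY = "sk-..."  # account-level, not versioned

def ask_agent(agent_id: str, message: str) -> str:
    response = requests.post(
        f"https://api.promptowl.example/agents/{agent_id}/chat",  # placeholder URL
        headers={"Authorization": f"Bearer {PROMPTOWL_API_KEY}"},
        json={"message": message},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("reply", "")
```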
Versioning for API Stability
If API consumers need stability:
Communicate changes - Notify before major updates
Test with staging - Use a separate staging agent
Gradual rollout - Publish during low-traffic periods
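One common pattern for the staging suggestion above is to key the agent ID off the environment, so drafts are exercised against a separate staging agent before the production agent is published. The IDs and environment variable below are placeholders.

```python
# Environment-keyed agent selection; IDs and the APP_ENV variable are placeholders.
import os

AGENT_IDS = {
    "staging": "agent-staging-123",    # receives experimental publishes first
    "production": "agent-prod-456",    # promoted only once staging looks good
}

agent_id = AGENT_IDS.get(os.getenv("APP_ENV", "staging"), AGENT_IDS["staging"])
```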
Before: "You are a helpful assistant"
After: "You are a friendly support agent for Acme Corp"
Test: Ask questions that should trigger the new persona
Before: GPT-4o (temperature 0.7)
After: Claude 3.5 Sonnet (temperature 0.3)
Test: Compare response quality and consistency
Before: No knowledge base
After: Connected product documentation
Test: Ask questions that require document retrieval
Quick Reference
1. SAVE → Creates draft (not live)
2. TEST → Use chat to verify
3. EVALUATE → Run against test set
4. PUBLISH → Make live
5. MONITOR → Watch for issues
6. ROLLBACK → If needed, revert