Well-run A/B testing programs can lift conversion rates by 40% or more, and experimentation-driven companies like Booking.com reportedly run 25,000+ tests per year. This guide shows how to test effectively in mobile apps.
Why A/B Testing Matters
- Data over opinions (remove guesswork)
- Incremental 5% improvements compound
- Understand user behavior
- Reduce risk of bad changes
- Optimize conversion funnels
What to Test
High-Impact Areas
Onboarding:
- Number of steps
- Permission request timing
- Tutorial style
- Signup vs guest option
- Value proposition copy
Conversion:
- CTA button copy/color/size
- Paywall design and copy
- Pricing display
- Trial length
- Social proof placement
Engagement:
- Home screen layout
- Navigation structure
- Feature discoverability
- Notification copy and timing
- Empty states
Monetization:
- Ad placement and frequency
- IAP pricing
- Subscription tiers
- Free trial vs freemium
A/B Test Fundamentals
Test Design
Components:
1. Hypothesis: "If we change X, then Y will improve"
2. Control: Current version (baseline)
3. Variant(s): New version(s) to test
4. Metric: What you're measuring
5. Sample size: Users needed
6. Duration: How long to run
Example hypothesis:
"If we change the CTA button from 'Sign Up' to 'Get Started',
then signup conversion will increase by 10%+ because it's less
committal and more action-oriented."
Statistical Significance
Key concepts:
Confidence level: 95% (industry standard)
= If there were no real difference, you would wrongly declare a winner at most 5% of the time
P-value: < 0.05
= Less than a 5% chance of seeing a difference this large if the variants truly perform the same
Statistical power: 80%
= 80% chance of detecting a real difference of the minimum size you care about
Sample size calculation (per variant, approximate):
n ≈ 2 × (Zα + Zβ)² × p × (1-p) / E²
Where:
Zα = 1.96 (95% confidence, two-sided)
Zβ = 0.84 (80% power)
p = baseline conversion rate
E = minimum detectable effect (absolute)
Sample Size Calculator
Example:
Baseline conversion: 10%
Minimum detectable effect: 10% relative (1% absolute)
Statistical power: 80%
Confidence: 95%
Result: ~15,000 users per variant needed
Use tools:
- Optimizely calculator
- Evan Miller's calculator
- VWO calculator
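If you prefer to sanity-check those calculators in code, here is a minimal Swift sketch of the per-variant approximation above for a two-proportion test. The function and parameter names are illustrative, not from any library.

import Foundation

// Approximate per-variant sample size for comparing two conversion rates
// with a two-proportion z-test (illustrative helper, not a library API).
func sampleSizePerVariant(baseline p1: Double,
                          target p2: Double,
                          zConfidence: Double = 1.96,    // 95% confidence, two-sided
                          zPower: Double = 0.84) -> Int { // 80% power
    let variance = p1 * (1 - p1) + p2 * (1 - p2)
    let effect = p2 - p1
    let n = pow(zConfidence + zPower, 2) * variance / (effect * effect)
    return Int(n.rounded(.up))
}

// Baseline 10%, detecting a lift to 11% (1% absolute / 10% relative)
print(sampleSizePerVariant(baseline: 0.10, target: 0.11))   // roughly 14,700 per variant

With the example inputs above (10% baseline, 1% absolute effect, 95% confidence, 80% power), this lands close to the ~15,000-per-variant figure, in line with the dedicated calculators.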
A/B Testing Platforms
Popular Tools
Firebase Remote Config + A/B Testing (Free):
- Integrated with Firebase
- Simple setup
- Limited advanced features
- Good for startups
Optimizely ($50K+/year):
- Enterprise-grade
- Advanced targeting
- Full-stack testing
- Comprehensive analytics
Apptimize ($1K+/month):
- Mobile-focused
- Visual editor
- Feature flags
- Cross-platform
Split.io ($1K+/month):
- Feature flagging
- Real-time analytics
- Advanced targeting
- API-first
Firebase A/B Testing Setup
iOS Implementation
import Firebase
import FirebaseRemoteConfig

class ABTestManager {
    let remoteConfig = RemoteConfig.remoteConfig()

    // Treat debug builds differently so you can iterate without waiting on the cache
    private var isDebug: Bool {
        #if DEBUG
        return true
        #else
        return false
        #endif
    }

    func setupDefaults() {
        remoteConfig.setDefaults([
            "cta_button_text": "Sign Up" as NSObject,
            "onboarding_steps": 5 as NSObject,
            "show_social_proof": false as NSObject
        ])
    }

    func fetchConfig(completion: @escaping () -> Void) {
        // Development: 0 second cache; Production: 12 hour cache
        let settings = RemoteConfigSettings()
        settings.minimumFetchInterval = isDebug ? 0 : 43200
        remoteConfig.configSettings = settings

        remoteConfig.fetch { status, error in
            if status == .success {
                self.remoteConfig.activate { _, _ in
                    completion()
                }
            } else {
                // Fall back to the local defaults so the UI is never blocked by a failed fetch
                completion()
            }
        }
    }

    func getButtonText() -> String {
        return remoteConfig["cta_button_text"].stringValue ?? "Sign Up"
    }
}

// Usage (e.g., inside a view controller)
let abTest = ABTestManager()
abTest.setupDefaults()
abTest.fetchConfig {
    let buttonText = abTest.getButtonText()
    self.signUpButton.setTitle(buttonText, for: .normal)
}
Android Implementation
import android.content.Context
import com.google.firebase.remoteconfig.FirebaseRemoteConfig
import com.google.firebase.remoteconfig.ktx.remoteConfigSettings

class ABTestManager(private val context: Context) {
    private val remoteConfig = FirebaseRemoteConfig.getInstance()

    init {
        setupDefaults()
    }

    private fun setupDefaults() {
        remoteConfig.setDefaultsAsync(mapOf(
            "cta_button_text" to "Sign Up",
            "onboarding_steps" to 5,
            "show_social_proof" to false
        ))
    }

    fun fetchConfig(onComplete: () -> Unit) {
        val configSettings = remoteConfigSettings {
            // Development: no cache; Production: 12 hour cache
            minimumFetchIntervalInSeconds = if (BuildConfig.DEBUG) 0L else 43200L
        }
        remoteConfig.setConfigSettingsAsync(configSettings)
        remoteConfig.fetchAndActivate()
            .addOnCompleteListener {
                // Even if the fetch fails, the local defaults still apply, so continue
                onComplete()
            }
    }

    fun getButtonText(): String {
        return remoteConfig.getString("cta_button_text")
    }
}

// Usage
val abTest = ABTestManager(context)
abTest.fetchConfig {
    val buttonText = abTest.getButtonText()
    signUpButton.text = buttonText
}
Test Implementation Best Practices
Randomization
Random assignment:
- 50/50 split for A/B test
- 33/33/33 for A/B/C test
- Or custom (e.g., 80% control, 20% variant)
Ensure:
✓ User stays in same variant (use user ID)
✓ True randomization (no bias)
✓ Even distribution across variants
// Consistent bucketing: use a stable hash of the user ID.
// Swift's hashValue is randomly seeded per launch, so it would
// put the same user in different variants across sessions.
func assignVariant(userId: String) -> String {
    // djb2 hash: deterministic across launches and devices
    var hash: UInt64 = 5381
    for byte in userId.utf8 {
        hash = hash &* 33 &+ UInt64(byte)
    }
    return hash % 100 < 50 ? "control" : "variant"
}
Test Isolation
- Run one test at a time (or non-overlapping)
- Don't test multiple things simultaneously on same page
- Allow "cooldown" between tests
- Document all running tests
Tracking and Analytics
Event Tracking
Track test exposure and outcomes:
import FirebaseAnalytics

// Log test participation (exposure)
Analytics.logEvent("ab_test_view", parameters: [
    "test_name": "cta_button_test",
    "variant": variant,
    "user_id": userId,
    "timestamp": Date().timeIntervalSince1970
])

// Log conversion event
Analytics.logEvent("signup_completed", parameters: [
    "test_name": "cta_button_test",
    "variant": variant,
    "user_id": userId
])

// Log revenue (if applicable)
Analytics.logEvent("purchase", parameters: [
    "test_name": "paywall_test",
    "variant": variant,
    "value": price,
    "currency": "USD"
])
Metrics to Track
Primary metric (what you're optimizing):
- Conversion rate
- Retention rate
- Revenue per user
- Feature adoption
Secondary metrics (guardrails):
- Engagement time
- Other conversion funnels
- Crash rate
- App rating
- Support tickets
Example:
Primary: Signup conversion
Secondary: Time to signup, D1 retention, support tickets
Analyzing Results
When to Stop a Test
Stop when:
✓ Statistical significance reached (p < 0.05)
✓ Sufficient sample size (per calculator)
✓ Ran for full business cycle (week/month)
✓ No degradation in guardrail metrics
Don't stop because:
❌ "Peeking" early showed a promising trend
❌ You only have 1-2 days of data
❌ The variant is "winning" (wait for significance)
❌ The sample size is still below target
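For a rough sense of how significance is checked, here is a hedged Swift sketch of a two-proportion z-test. In practice your testing platform (Firebase, Optimizely, etc.) computes this for you, often with corrections this sketch omits; the function and parameter names are illustrative.

import Foundation

// Two-sided p-value for a two-proportion z-test (illustrative helper).
func twoProportionPValue(conversionsA: Int, usersA: Int,
                         conversionsB: Int, usersB: Int) -> Double {
    let pA = Double(conversionsA) / Double(usersA)
    let pB = Double(conversionsB) / Double(usersB)
    let pooled = Double(conversionsA + conversionsB) / Double(usersA + usersB)
    let se = sqrt(pooled * (1 - pooled) * (1.0 / Double(usersA) + 1.0 / Double(usersB)))
    let z = abs(pA - pB) / se
    let normalCDF = 0.5 * (1 + erf(z / 2.0.squareRoot()))   // standard normal CDF
    return 2 * (1 - normalCDF)                               // two-sided p-value
}

// Example: 1,600/16,000 (10%) vs 1,920/16,000 (12%) conversions
let pValue = twoProportionPValue(conversionsA: 1600, usersA: 16000,
                                 conversionsB: 1920, usersB: 16000)
print(pValue < 0.05 ? "Significant" : "Not significant")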
Result Interpretation
Scenario 1: Clear winner
Variant: 12% conversion (p < 0.01)
Control: 10% conversion (baseline)
→ Roll out the variant to 100%
Scenario 2: No significant difference
Variant: 10.3% (p = 0.25)
Control: 10.0%
→ Keep current version, test something else
Scenario 3: Variant wins but guardrails fail
Variant: 15% signup BUT D7 retention drops 20%
→ Don't roll out, iterate on the variant
Scenario 4: Unexpected behavior
Variant: 15% conversion on iOS, 5% on Android
→ Segment analysis, platform-specific rollout
Common Mistakes
Statistical Errors
- ❌ P-hacking: Running test until you see significance
- ❌ Small sample: Not enough users for reliable results
- ❌ Short duration: Not accounting for weekly patterns
- ❌ Multiple testing: Not adjusting for multiple comparisons
- ❌ Ignoring segments: Missing important differences
Implementation Errors
- ❌ Inconsistent bucketing: User sees different variants
- ❌ Tech debt: Not removing old test code
- ❌ No fallback: Crash if config fails to load
- ❌ Testing everything: Diluting focus
Advanced Techniques
Multivariate Testing
Test multiple elements simultaneously:
Elements:
- Button color (red, blue, green)
- Button text (Sign Up, Get Started, Join)
- Image (hero, illustration, screenshot)
Combinations: 3 × 3 × 3 = 27 variants
Required traffic: per-variant sample size × 27
(Often impractical for mobile apps)
Alternative: Sequential A/B tests
Segmented Testing
Test different variants for segments:
Segments:
- New vs returning users
- Free vs paid users
- iOS vs Android
- Geography
- Device type
- App version
Example:
New users: Short onboarding (3 steps)
Returning users: Skip onboarding
Implementation:
if isNewUser {
    return remoteConfig["onboarding_new_user"].numberValue.intValue
} else {
    return remoteConfig["onboarding_returning"].numberValue.intValue
}
Bandit Algorithms
Multi-armed bandit:
- Explore (try variants) + Exploit (show winner)
- Automatically shifts traffic to winner
- Reduces regret (users seeing worse variant)
- Good for continuous optimization
Use when:
- Traffic is expensive
- Can't wait for statistical significance
- Want to minimize poor experience
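To make the explore/exploit idea concrete, here is a minimal epsilon-greedy sketch, one of the simplest bandit strategies; the type and field names are illustrative, not a production allocator.

import Foundation

// Epsilon-greedy: with probability epsilon show a random variant (explore),
// otherwise show the variant with the best observed conversion rate (exploit).
struct EpsilonGreedyBandit {
    var impressions: [String: Int]
    var conversions: [String: Int]
    var epsilon: Double = 0.1

    func chooseVariant() -> String {
        let variants = Array(impressions.keys)
        if Double.random(in: 0..<1) < epsilon {
            return variants.randomElement()!                    // explore
        }
        return variants.max { rate(of: $0) < rate(of: $1) }!    // exploit
    }

    private func rate(of variant: String) -> Double {
        let shown = impressions[variant] ?? 0
        guard shown > 0 else { return 0 }
        return Double(conversions[variant] ?? 0) / Double(shown)
    }
}

// Usage: traffic gradually shifts toward the better-performing variant
let bandit = EpsilonGreedyBandit(impressions: ["control": 500, "variant": 500],
                                 conversions: ["control": 50, "variant": 65])
print(bandit.chooseVariant())   // usually "variant", occasionally a random pick

Production bandits typically use Thompson sampling or UCB rather than a fixed epsilon, but the traffic-shifting behavior is the same idea.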
Mobile-Specific Considerations
App Store Review
- Don't drastically change app during review
- Keep percentage rollout low during review
- Feature flags > A/B tests for major changes
- Document tests in review notes if asked
App Updates
Challenge: Not all users update immediately
Solution:
- Version-specific tests
- Remote config for non-code changes
- Minimum version requirements
- Gradual rollout strategy
Example:
// Compare versions numerically so "2.10.0" sorts after "2.5.0"
if appVersion.compare("2.5.0", options: .numeric) != .orderedAscending {
    // New test available on 2.5.0 and later
    variant = getABTestVariant()
} else {
    // Older versions stay on control
    variant = "control"
}
Offline Behavior
Handle offline scenarios:
1. Cache remote config locally
2. Use last-known variant
3. Default to control if first launch offline
4. Sync when connection restored
let cachedVariant = UserDefaults.standard.string(forKey: "last_variant")
let variant = remoteConfig["test_variant"].stringValue ?? cachedVariant ?? "control"
UserDefaults.standard.set(variant, forKey: "last_variant")   // cache for future offline launches
Test Prioritization
PIE Framework
Score each test idea:
Potential (1-10): Expected impact
Importance (1-10): Traffic to page
Ease (1-10): Effort to implement
PIE Score = (P + I + E) / 3
Example:
Test 1: Change CTA button
- Potential: 8 (could increase signups)
- Importance: 9 (everyone sees it)
- Ease: 10 (simple change)
- PIE: 9.0 → High priority
Test 2: Redesign entire onboarding
- Potential: 9
- Importance: 9
- Ease: 3 (months of work)
- PIE: 7.0 → Medium priority
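If you keep a backlog of test ideas, a few lines of code can rank them by PIE score. This Swift sketch uses illustrative type and field names and reproduces the two examples above.

import Foundation

// Score and rank test ideas with PIE = (Potential + Importance + Ease) / 3
struct TestIdea {
    let name: String
    let potential: Double    // expected impact, 1-10
    let importance: Double   // how much traffic sees it, 1-10
    let ease: Double         // effort to implement, 1-10
    var pieScore: Double { (potential + importance + ease) / 3 }
}

let backlog = [
    TestIdea(name: "Change CTA button", potential: 8, importance: 9, ease: 10),
    TestIdea(name: "Redesign onboarding", potential: 9, importance: 9, ease: 3)
]

// Highest PIE score first
for idea in backlog.sorted(by: { $0.pieScore > $1.pieScore }) {
    print("\(idea.name): \(String(format: "%.1f", idea.pieScore))")
}
// Change CTA button: 9.0
// Redesign onboarding: 7.0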
Documentation
Test Documentation Template
Test Name: CTA Button Text Test
Hypothesis: Changing button from "Sign Up" to "Get Started"
will increase conversion by 10%+ due to lower commitment language.
Setup:
- Variants: Control ("Sign Up"), Variant ("Get Started")
- Traffic split: 50/50
- Location: Login screen
- Platforms: iOS + Android
Metrics:
- Primary: Signup completion rate
- Secondary: Time to signup, D1 retention
Sample size: 16,000 users per variant
Duration: 14 days
Start date: 2025-01-15
End date: 2025-01-29
Results:
- Control: 10.2% conversion (n=16,438)
- Variant: 11.8% conversion (n=16,512)
- Uplift: +15.7% (p=0.003)
- Decision: Roll out variant
Learnings:
- Action-oriented language performs better
- Effect stronger on mobile than tablet
- Next test: Try "Start Free Trial"
A/B Testing Checklist
Before launching:
□ Clear hypothesis documented
□ Sample size calculated
□ Test duration planned (1-2+ weeks)
□ Tracking implemented and tested
□ Variants coded and QA tested
□ Random assignment working correctly
□ Fallback/default values set
□ Team notified of test
During test:
□ Monitor for tech issues
□ Check segment behavior
□ Verify even traffic distribution
□ Don't peek at results early
After test:
□ Statistical significance reached
□ Sufficient sample size collected
□ Results documented
□ Decision made (roll out/iterate/abandon)
□ Test code removed or cleaned up
□ Learnings shared with team
Conclusion
A/B testing transforms assumptions into data-backed decisions. Start with high-impact tests, ensure statistical rigor, and build a culture of experimentation. Small improvements compound into significant gains.