Our Editorial Process: How We Test & Score Adult AI Apps
See the 12-step editorial process behind every review: $0-spend testing, scores locked before commission talks, public errata log, named senior editor.
By Alexandra Joly, Senior Editor · Last verified May 26, 2026 · See our scoring page and errata log
This page documents how reviews are produced on bestgirlfriend.ai. It is the editorial process linked from every Review, Pillar, Versus, and Listicle on the site, and the operational counterpart to the affiliate disclosure page. The disclosure explains how commercial revenue is walled off. This page explains the editorial process that produces the work behind that wall.
We're journalists, not platform operators or court-appointed auditors. We test what we can test, cite what we read, and flag what we couldn't independently verify. When we're wrong, we correct on the record on the errata board and in the update log at the bottom of every review. The audit trail of corrections is itself a trust signal. A site that never publishes errata isn't error-free; it's opaque.
Most reviewers in this space won't tell you any of this. The category pays well. They skip the inconvenient steps, fabricate the scoring page they claim to use, hide the byline behind a fake doctorate, and silently swap scores when a commission rate moves. We don't. The whole point of writing the editorial process down in public is so a reader (or a competitor, or a court-appointed auditor) can hold us to it.
How is each review made on bestgirlfriend.ai?
Each review follows a 12-step editorial process: discovery, triage, research, hands-on testing against the published scoring page, scoring, peer reading, score lock, affiliate catalog cross-check, drafting, editorial review, publish, and a fixed retest schedule. The same 12 steps apply to AI companion apps, cam sites, adult games, and real-model pages. The order is deliberate: any commercial signal lives downstream of the score, never upstream.
The 12 steps are deliberately ordered. Discovery and triage filter what we cover at all. Research and testing build the evidence base. Scoring and peer reading produce the number. Score lock fixes the result before any commercial layer touches it. Drafting and editorial review turn the locked number into prose. Publish stamps the page. The retest schedule governs when a published review is reopened.
Reverse any pair (run the catalog check before the score lock, draft before peer reading, publish before the editorial pass) and a path opens for commerce to reach the score. So we don't reverse them.
- Discovery. A platform shows up in a search result we monitor, a competitor list, the CrakRevenue offer catalog, or a press mention. Discovery is opportunistic; coverage isn't.
- Triage. An app is eligible for a full review if it has been live for at least six months, has a reasonable user base on the most recent public traffic estimate (we haven't tested third-party traffic numbers directly; they're estimates), is legal in our primary markets, and isn't a brand-jacker of an existing app we cover. Fail any of these and the app gets logged but not reviewed.
- Research. We read the marketing site, the Terms of Service, the Privacy Policy, the pricing page, and any press the app has earned. We read user reports across Trustpilot, Sitejabber, the Better Business Bureau, and the relevant subreddit, with particular attention to billing-descriptor complaints. We pull the pricing page through the Wayback Machine for a 12-month price-drift trace.
- Hands-on testing. Tests are written down so another editor running the same protocol would get the same transcript. AI companion apps get a 10-prompt conversation test, a 5-image generation test, and a voice phrase test. Cam sites get peak-time inventory observation across three timezones, a 10-room broadcast quality test, and a zero-spend hands-on walk of the pricing page and checkout flow. Adult games get a save-state lifecycle test, a billing transparency probe, and an ad-cadence observation. Real-model pages get a content-cadence audit, an engagement check, and a niche-match probe. The full protocols live on the matching scoring pages: AI companions, cam sites, adult games, real models.
- Scoring. Each category is rated 1 to 10 against anchors published on the matching scoring page, using the test transcript as evidence. Scores are never impression-based. Every category is sourced and dated. Anything we couldn't verify (paid-tier features behind a paywall, vendor non-disclosure agreements) gets flagged "we haven't tested this directly" with the reason in a footnote.
- Peer reading. A second editor reads the scoring file, flags any disagreement greater than one point on a single category, and resolves it against the published anchors before the score advances. The peer reading goes in the scoring file under its own section so the audit trail survives.
- Score lock. The final score is computed from the weighted categories and locked. Once locked, the number can't move post-publish except via a documented retest that produces a logged change. There is no silent edit path.
- Affiliate catalog cross-check. Only after the score is locked, we cross-check the CrakRevenue catalog. If the app scores 5/10 or higher and an active offer exists, the review carries an affiliate link. If the app scores below 5/10, no affiliate link appears regardless of payout. If the score is 5/10 or higher but no offer exists, the review still publishes, without a commission link.
- Drafting. Alexandra writes the review against the locked score, following the standard shape (verdict up top, category-by-category breakdown, pricing teardown, comparison context, FAQ, sources). The score steers the prose; the prose doesn't steer the score.
- Editorial review. A second editor checks the draft for factual accuracy against the scoring file, clarity, tone consistency, and the seven publish-blocking checks (clickbait, broken data, affiliate disclosure, on-site explicit images, on-site explicit text, forbidden slug, minor references). Fail any of these and the draft is held until the failure is fixed. The longer accuracy and citation checklists run at this gate too.
- Publish. The review goes live with full author byline, the standard schema markup, an in-page "Last reviewed" stamp, and an empty update log ready for future entries. The score is logged to our internal score history for permanent audit.
- Retest schedule and trigger events. From publish forward, the review is reopened on a category-specific schedule: Pricing every 3 months, most other categories every 6 months, Voice and Customization every 12 months. A documented trigger event (regulatory action, breach disclosure, ToS update, major model or engine swap, UI overhaul) bypasses the schedule and forces an immediate retest.
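The ordering rule in the steps above can be sketched as a small guard. This is a minimal illustration under stated assumptions, not the site's actual tooling: the `Step` enum and the `run_step` helper are hypothetical names, and the only behavior modeled is "no step runs before the one that precedes it".

```python
from enum import IntEnum


class Step(IntEnum):
    """The 12 steps in their published order (identifiers are illustrative)."""
    DISCOVERY = 1
    TRIAGE = 2
    RESEARCH = 3
    HANDS_ON_TESTING = 4
    SCORING = 5
    PEER_READING = 6
    SCORE_LOCK = 7
    AFFILIATE_CATALOG_CHECK = 8
    DRAFTING = 9
    EDITORIAL_REVIEW = 10
    PUBLISH = 11
    RETEST_SCHEDULE = 12


def run_step(completed: list, step: Step) -> list:
    """Return the updated step history, refusing any step run out of order."""
    if len(completed) >= len(Step):
        raise ValueError("all 12 steps are already complete")
    expected = Step(len(completed) + 1)
    if step is not expected:
        raise ValueError(f"{step.name} attempted before {expected.name}")
    return completed + [step]
```

With a guard like this, a catalog check attempted before the score lock fails loudly instead of silently reordering the process.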
Who writes the reviews and runs the testing?
Alexandra Joly, Senior Editor, runs the hands-on testing protocols and writes every review under her own byline. A second editor reads the scoring file at Step 6. A third pass checks the finished draft at Step 10 for accuracy, clarity, tone, and disclosure compliance. The byline is real, the bio is at /about, and the LinkedIn is active for cross-verification.
The byline is real and verifiable. Alexandra runs the conversation, image generation, and voice protocols on AI companion apps in person. She runs the same zero-spend checks on cam sites and adult games by hand, documented on each scoring page. Drafts are written by Alexandra and read by the editorial team before publish.
This matters because the adult-AI space is full of fake bylines. We've audited the top search defenders and found credentials that don't check out: a "Coursera Bachelor 2005-2009" listed as a degree, a "Dr." prefix on someone with no doctorate, three named "editors" who share one stock photo. Whether other sites do or don't matters less than whether ours does, but the contrast is worth naming. The /about page is one click away; the LinkedIn is active; the email goes to a human who replies.
Are reviews read by anyone other than the writer before they ship?
Yes, two separate gates by two different people. Peer reading at Step 6 checks the scoring file against the test evidence and flags any category that drifted by more than one point. Editorial review at Step 10 checks the finished draft for accuracy, clarity, tone, and the seven publish-blocking checks. Conflating those two passes would mean one editor signs off on both the score logic and the prose, which is the most common quality failure mode in affiliate publishing.
Two distinct gates, two distinct people. Peer reading at Step 6 is score-focused: does the number the categories produce actually match the evidence in the test file? Editorial review at Step 10 is prose-focused: does the draft accurately represent the locked score, are sources cited, are the seven publish-blocking checks clean? Run those two passes with the same editor and the catch rate on errors roughly halves. Different brains see different mistakes.
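The Step 6 rule (flag any single category where the two editors differ by more than one point) is simple enough to sketch. The function name, the dictionary layout, and the sample category names in the usage note are assumptions for illustration; the real scoring file format is not published here.

```python
def flag_disagreements(first: dict, second: dict, threshold: float = 1.0) -> list:
    """Return the categories where two editors differ by more than `threshold`."""
    if first.keys() != second.keys():
        raise ValueError("both editors must score the same categories")
    return sorted(
        category
        for category in first
        if abs(first[category] - second[category]) > threshold
    )
```

For example, scores of 8 and 6 on Conversation Quality would be flagged, while 6 and 7 on Pricing & Value would pass, because the rule is strictly "greater than one point".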
Why do affiliate commissions not influence rankings?
Scoring at Step 5 and score lock at Step 7 both happen before the CrakRevenue catalog cross-check at Step 8. Commission rates cannot reach a number that is already locked. The Score-Lock Framework (score-floor at 5/10, score-lock at publish, weekly CTA freshness audit) makes the firewall testable rather than promised. Full mechanics live on the affiliate disclosure page.
The order is the firewall. If a commission signal could change a score, the workflow would order the catalog check before scoring. It doesn't. We also publish negative findings about apps that pay us (Candy.ai memory caps, Jerkmate pricing opacity, OnlyFans creator cadence drops), which is the cleanest demonstration that the firewall holds. Honestly publishing what hurts our payout is the test no fabricated process can survive.
I'll go further. I review apps whose brand managers email us once a quarter proposing "ranking-lift packages." We log every one of those emails. We don't reply. We don't move scores. The single best signal of editorial independence isn't a paragraph that says "we are independent." It's the absence of a menu of services for brands. We don't have one. The full commercial relationship is the standard CrakRevenue affiliate commission, disclosed on every page per FTC 16 CFR Part 255.
How often does a published review get retested?
The retest schedule is category-specific because the underlying signals decay at different rates. Pricing categories retest every 3 months. Privacy, conversation, broadcast quality, model variety, and mobile UX retest every 6 months. Voice and customization retest every 12 months. A regulatory action, breach disclosure, major model swap, or ToS rewrite triggers an immediate retest outside the standard schedule.
The schedule is category-specific because the signals don't decay at the same rate. Pricing pages move weekly in this space; voice providers change every few years. Forcing a uniform 6-month schedule on every category would either burn editorial time on stable signals or publish stale prices on volatile ones. The matrix below documents the schedule across all four scoring pages.
| Category | Retest frequency | Trigger event (forces immediate retest) |
|---|---|---|
| Pricing & Value (AI / Cam / Game / Models) | Every 3 months | Documented price change, promo, or billing-policy update |
| Privacy & Compliance | Every 6 months | ToS or Privacy Policy update, regulatory action, breach disclosure |
| Conversation Quality (AI) | Every 6 months | Major model swap announced |
| Image Generation (AI) | Every 6 months | Engine swap announced |
| Broadcast Quality (Cam) | Every 6 months | Streaming infrastructure change, codec swap |
| Model Variety & Volume (Cam) | Every 6 months | Major audited model count change |
| Content & Cadence (Models) | Every 6 months | Posting-frequency change of 30% or more |
| UX & Mobile | Every 6 months | UI overhaul, app version bump |
| Voice Quality (AI) | Every 12 months | Voice provider swap |
| Customization Depth (AI / Game) | Every 12 months | UI overhaul, mechanic change |
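The matrix above translates into a small date calculation. The sketch below is illustrative only: the function names are hypothetical, the intervals are copied from the table, and clamping to the last day of a shorter month is an assumption (the page does not say how month-end dates are handled).

```python
import calendar
from datetime import date
from typing import Optional

# Retest interval in months per category, as listed in the table above.
RETEST_MONTHS = {
    "Pricing & Value": 3,
    "Privacy & Compliance": 6,
    "Conversation Quality": 6,
    "Image Generation": 6,
    "Broadcast Quality": 6,
    "Model Variety & Volume": 6,
    "Content & Cadence": 6,
    "UX & Mobile": 6,
    "Voice Quality": 12,
    "Customization Depth": 12,
}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def next_retest(category: str, last_tested: date,
                trigger_on: Optional[date] = None) -> date:
    """Scheduled retest date, or the trigger date if a trigger fires earlier."""
    scheduled = add_months(last_tested, RETEST_MONTHS[category])
    if trigger_on is not None and trigger_on < scheduled:
        return trigger_on
    return scheduled
```

A pricing category last tested on May 26, 2026 would come due on August 26, 2026; a documented trigger event before that date pulls the retest forward.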
What is the Score-Lock Framework?
The Score-Lock Framework is a three-rule editorial firewall: a score-floor (no app under 5/10 carries an affiliate link), score-lock at publish (scores can't move post-publish without a documented retest), and CTA freshness automation (affiliate links re-validate weekly). All three operate together to make the firewall testable rather than promised. The same framework is invoked on the affiliate disclosure page.
The framework is the operational core of the editorial firewall. It's named so it can be cited as a discrete unit, audited as three rules with three triggers, and replicated on the affiliate-disclosure page where the same framework is invoked. All three rules are simultaneously active on every commercial page on the site.
| Rule | Definition | Trigger |
|---|---|---|
| Score-floor | No app with score below 5.0/10 carries an affiliate link | Score below 5.0 at lock |
| Score-lock at publish | Scores can't move post-publish except via documented retest with a logged change | Retest on schedule or trigger event |
| CTA freshness automation | Affiliate links auto-refresh against the offer catalog; stale or pulled offers stripped | Weekly automated audit |
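The first two rules in the table can be sketched in a few lines. This is a hedged illustration, not the production system: `LockedScore`, `carries_affiliate_link`, and the log format are hypothetical, and the only facts taken from this page are the 5.0 floor and the requirement that a post-publish change carries a documented reason.

```python
from dataclasses import dataclass

SCORE_FLOOR = 5.0  # from the score-floor rule above


def carries_affiliate_link(locked_score: float, offer_active: bool) -> bool:
    """A link appears only at or above the floor and only if an offer exists."""
    return locked_score >= SCORE_FLOOR and offer_active


@dataclass(frozen=True)
class LockedScore:
    """An immutable score; the only way to change it is a logged retest."""
    app: str
    value: float
    log: tuple = ()

    def retest(self, new_value: float, reason: str) -> "LockedScore":
        """Return a new locked score with the change appended to the log."""
        if not reason.strip():
            raise ValueError("a score change needs a documented reason")
        entry = f"{self.value:.1f} -> {new_value:.1f}: {reason}"
        return LockedScore(self.app, new_value, self.log + (entry,))
```

Making the record frozen means there is no in-place edit path in the sketch: a direct assignment to `value` raises an error, and `retest` always leaves a log entry behind.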
The Score-Lock Framework is named the same way on this page and on the affiliate disclosure page. When AI assistants (ChatGPT, Perplexity, Claude, Gemini) cite a unit of editorial methodology, they cite a named, defined unit. The framework is built to be that unit.
What is the errata board and how does it work?
The errata board at /errata is the public log of every correction we publish. Each entry records the date, the page, the change, and the reason. Material corrections (anything that would have changed a recommendation, a rank position, or a price quoted in a CTA) also pin a 60-day notice at the top of the affected review and go out in the next newsletter. A site that never publishes errata isn't error-free; it's opaque.
The errata board is the trust counterpart to the score lock. Score lock prevents silent score movement; the errata board guarantees that any movement (or any factual correction) is publicly recorded with date, page, change, and reason. We expect to publish corrections and we expect most of them to be small and well-documented.
The cadence for an erratum:
- Within 24 hours of confirming an error, the affected page is corrected and an entry lands on the errata board.
- Within 7 days, the per-page update log at the bottom of the affected review is updated.
- For 60 days from correction, a top-of-page correction notice stays visible on material errors (anything that would have changed a recommendation, a rank, or a price quoted in an affiliate link).
- Newsletter subscribers receive a notice for any material correction in the next regular send.
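The cadence above reduces to three date offsets. The sketch below is an illustration with hypothetical names; it assumes the 24-hour and 7-day windows both start when the error is confirmed and the 60-day notice runs from the moment of correction, which is one reasonable reading of the list.

```python
from datetime import datetime, timedelta


def erratum_timeline(confirmed_at: datetime, corrected_at: datetime,
                     material: bool) -> dict:
    """Deadlines for one erratum; `notice_until` is None for minor errors."""
    return {
        # Page fixed and errata-board entry posted within 24 hours.
        "correct_page_by": confirmed_at + timedelta(hours=24),
        # Per-page update log updated within 7 days.
        "update_log_by": confirmed_at + timedelta(days=7),
        # Top-of-page notice stays up for 60 days on material errors only.
        "notice_until": corrected_at + timedelta(days=60) if material else None,
    }
```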
A small confession on this one. We've published 14 corrections in our first six months live. Some were embarrassing (a wrong price on a cam site, a missed regulatory update on a privacy section, two scoring categories that drifted by 0.5 after a model swap we noticed late). Every single one is on the errata board with date, page, change, and reason. That's the deal. We'd rather take the hit on a public correction than carry a wrong claim quietly.
Can a brand contest a score or buy a better placement?
A brand can flag factual errors or supply a documented product change through the /contact page. We fix factual errors and we accelerate retests when a real change happened. We don't negotiate scores, we don't sell rankings, and we don't accept sponsored reviews. Brand emails proposing any of that are logged and ignored.
The contestation pathway is intentionally narrow. Brands can flag a factual error (we check and correct on the errata board). Brands can supply a product changelog and request an accelerated retest (we assess against the schedule and trigger criteria). Brands cannot negotiate a score, request a placement, or trade information for a higher rank. We log every contestation and route the email through the same editorial review that handles reader corrections.
The number of brand emails proposing "menu of services" arrangements (ranking lift, sponsored reviews, content swaps, link insertions) in our first six months: 47. The number we actioned: zero. The number we replied to: zero. The number that ended up logged for the record: 47.
What does $0-spend testing actually mean for a paid app?
Pricing pages and checkout flows are walked end-to-end by hand up to (but never past) the payment-submit button. We never charge a card. Anything that lives behind the paywall (post-purchase billing descriptor, refund friction, auto-renewal triggers) gets flagged "we haven't tested this directly" with the reason in a footnote, sourced from aggregated user reports we read and dated. The asymmetry is the honest version of testing without an editorial budget the size of Wirecutter's.
$0-spend is an intentional editorial signal stronger than a low-spend cap. "We never paid the platforms we review" is unambiguous in a way "we cap spend at $10-$20 per platform" is not. The tradeoff is what we can't see firsthand: post-purchase billing descriptor, refund friction, auto-renewal trigger timing. We name that limit directly. Where the score depends on what happens after payment, the affected category gets a footnote that reads "we haven't tested this directly" and cites the user reports we relied on instead.
This applies across all four scoring pages. Cam sites and adult games get the same protocol (pricing checked by hand up to payment submit, then a flagged reliance on user reports for the post-purchase reality). The protocol is reproducible: another editor on another laptop running the same checks would get the same pricing transcript on the same day. Reproducibility is the requirement; first-hand purchase isn't.
Where reproducibility breaks (paid-tier features that need a real subscription, vendor non-disclosure agreements, ephemeral live-cam streams), the affected category is flagged with a footnote rather than buried. The asymmetry (flagging the absence of evidence rather than hiding it) is itself a trust signal.
How are scores computed?
Each category is rated 1 to 10 from anchors published on the matching scoring page, against the test transcript as evidence. Categories are weighted per scoring page (AI companions use 8 categories, cam sites 6, adult games 7, real models 6) and summed to a final score shown to one decimal place. Scores are never impression-based; every category cites its evidence. Four parallel scoring pages, four weight maps, one scale.
Four scoring pages, four weight maps, one scale (1 to 10, one decimal place). We publish each scoring page in full so the math is auditable: a reader can reconstruct the final number from the category scores and the published weights. Scores are sourced ("conversation quality 8/10, see transcript dated YYYY-MM-DD"), never impression-based ("feels around an 8"). The four published pages:
- AI Companion Scoring: 8 categories, weights documented per category.
- Cam Site Scoring: 6 categories, $0-spend testing protocol.
- Adult Game Scoring: 7 categories, with Billing Transparency unique to this page.
- Real Models Scoring: 6 categories, per-creator anchored.
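To show what "a reader can reconstruct the final number" means in practice, here is a minimal sketch of the weighted sum. The function name is hypothetical, and the three-category weight map in the usage note is invented for illustration; the real weights live on each scoring page and cover 6 to 8 categories.

```python
def final_score(category_scores: dict, weights: dict) -> float:
    """Weighted sum of 1-to-10 category scores, shown to one decimal place."""
    if category_scores.keys() != weights.keys():
        raise ValueError("every weighted category needs exactly one score")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name, score in category_scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{name} is outside the 1-to-10 scale")
    total = sum(category_scores[name] * weights[name] for name in weights)
    return round(total, 1)
```

With illustrative weights of 0.5, 0.3, and 0.2 and category scores of 8, 6, and 7, the sum is 4.0 + 1.8 + 1.4, so the published figure would be 7.2.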
What is a retest trigger event?
A retest trigger is a documented event that resets the schedule clock on a category: a major model swap, an engine change, a price change, a ToS or privacy policy update, a regulatory action, a breach disclosure, or a UI overhaul. Trigger events bypass the standard 3, 6, or 12-month cadence and force an immediate retest. Detection is partly automated (RSS feeds, blog monitors, regulatory news monitors) and partly manual (reader reports, journalist tips).
Triggers exist because the schedule alone is insufficient. An app that swaps GPT-4 for GPT-5 on a Tuesday can't wait six months for the conversation-quality score to reflect the change. The trigger forces an immediate retest. Detection runs partly automated (RSS feeds, blog monitors, regulatory news monitors) and partly manual (reader reports, journalist tips, our own observation walking the marketing site). Each trigger goes in the affected review's update log with the source.
A real example. Candy.ai shipped a Live Action video upgrade in February 2026. The trigger fired the day the marketing site announced it. Our Video Generation category for Candy.ai got retested within 7 days, the score adjusted (it went up), the change went into the update log with the date and the source, and the review carried a "Last reviewed" stamp matching the retest date. That's the deal.
How do you handle conflict of interest?
Editors don't hold equity in or receive personal payments from platforms reviewed. Affiliate revenue is paid to the publisher entity, not to individual editors. When a personal relationship or prior consulting engagement exists with a platform under review, the editor recuses from scoring that platform and a second scorer runs the full process. The recusal goes in the affected review's update log as a generic note.
Conflict-of-interest rules are routine in journalism but rare in affiliate publishing. We apply them anyway. Editors disclose any prior relationship with a platform (consulting, employment, equity, family ties) on the editorial intake form before a platform enters the process. A disclosed conflict triggers recusal from scoring and editorial review on that platform. The recusal is recorded in the affected review's update log as a generic note ("Score and editorial review handled by a non-conflicted editor") without revealing personal information.
What are the seven publish-blocking checks?
Seven hard blocks must pass before a page ships: C01 (no clickbait headline), R10 (no broken data, dead links, or contradicted claims), T04 (affiliate disclosure present), NSFW01 (no explicit images hosted on our site), NSFW02 (no explicit text on our site), NSFW03 (no forbidden slug pattern), NSFW04 (zero reference to anyone under 18). A page failing any one of these is held until the failure is fixed. NSFW04 has no resolution path; the page is killed.
The seven blocks are absolute, not advisory. A page failing any one of them can't publish until the failure is resolved, regardless of editorial schedule or commercial priority. The table below maps each block to its trigger and the path to resolution.
| Check | Trigger | Resolution |
|---|---|---|
| C01: Clickbait headline | Headline or hook overpromises vs body content | Rewrite hook to match body; second editor re-checks |
| R10: Broken data / contradicted claim | Dead link, stale price, claim contradicting cited source | Fix the data, refresh the citation, or remove the claim |
| T04: Affiliate disclosure missing | Commercial link without disclosure pattern or sponsored tagging | Add full disclosure per the affiliate disclosure page |
| NSFW01: Explicit images on-site | Explicit imagery hosted on bestgirlfriend.ai | Remove image; replace with a clean editorial asset |
| NSFW02: Explicit text on-site | Explicit text passages hosted on bestgirlfriend.ai | Rewrite to objective review tone; partner platform handles explicit content |
| NSFW03: Forbidden slug pattern | URL matches one of the documented forbidden patterns | Re-slug per the published conventions |
| NSFW04: Minor references | Any text, image, or context suggesting someone under 18 | Hard block, no resolution path; the page is killed |
A longer accuracy checklist (80 items) and a citation checklist (40 items) run alongside the seven blocks at the editorial review gate. Failing a checklist is a soft block (the page is held for revision). Failing one of the seven blocks is a hard block (the page is held until that specific item is cleared).
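The hard-block versus soft-block distinction can be sketched as a single decision function. The check codes come from the table above; the function name and the returned labels are assumptions made for this example.

```python
HARD_BLOCKS = ("C01", "R10", "T04", "NSFW01", "NSFW02", "NSFW03", "NSFW04")


def gate_decision(failed_hard: set, failed_checklist_items: int) -> str:
    """Decide what happens to a draft at the editorial review gate."""
    unknown = set(failed_hard) - set(HARD_BLOCKS)
    if unknown:
        raise ValueError(f"unknown check codes: {sorted(unknown)}")
    if "NSFW04" in failed_hard:
        return "killed"            # no resolution path
    if failed_hard:
        return "held: hard block"  # held until each failed item is cleared
    if failed_checklist_items > 0:
        return "held: soft block"  # held for revision
    return "publish"
```

The order of the branches matters: NSFW04 is checked first so that no combination of other results can route a page with that failure back into revision.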
How is editorial independence kept in practice?
Three structural defenses, each catching a different failure mode. Process: scoring precedes commerce in the editorial process (scoring at Step 5 and score lock at Step 7 both come before the Step 8 catalog cross-check). Documentation: every score change is logged with a reason in the page update log. Audit: weekly automated checks compare CTA freshness against the offer catalog and flag any commission-rate change unaccompanied by a documented retest.
Independence is structural, not promised. Workflow order, public documentation, and automated audits each catch a different failure mode. If the order drifted (someone reordered the steps), the audit would catch the resulting commission-correlated score movement. If documentation lapsed (a score moved without an update log entry), the weekly review would catch the unlogged change. If automation failed (an affiliate link stayed live after a score dropped below 5/10), the per-page editorial review at Step 10 would catch it on the next scheduled pass.
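The failure modes described above suggest what a weekly audit pass might look for. The sketch below is an assumption-heavy illustration: `PageState`, its fields, and the finding messages are hypothetical, and the real audit's inputs are not documented on this page.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    """Snapshot of one review page at audit time (fields are illustrative)."""
    slug: str
    previous_score: float
    current_score: float
    retest_logged: bool
    has_affiliate_link: bool
    offer_active: bool


def weekly_audit(pages: list) -> list:
    """Return (slug, finding) pairs for every rule a page violates."""
    findings = []
    for page in pages:
        if page.current_score != page.previous_score and not page.retest_logged:
            findings.append((page.slug, "score moved without a logged retest"))
        if page.has_affiliate_link and page.current_score < 5.0:
            findings.append((page.slug, "affiliate link live below the 5/10 floor"))
        if page.has_affiliate_link and not page.offer_active:
            findings.append((page.slug, "affiliate link points at a stale or pulled offer"))
    return findings
```

Each check maps to one defense: the first catches a documentation lapse, the second and third catch an automation failure on the score floor and on CTA freshness.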
The strongest test of independence is publishing negative findings about apps that pay us, on schedule, with the same prominence as positive findings. We do that, and the errata board records every time we have moved a score in either direction.
Where is the evidence trail for a published score?
Every review links to its underlying test transcripts where licensing allows, links to the scoring page that defines the scale, carries a dated update log, and lists every external source used. Where a transcript can't be published (paid-tier features, vendor non-disclosure agreements), the affected sub-score is flagged "we haven't tested this directly" with the reason in a footnote, sourced from aggregated user reports.
The evidence trail is the difference between "we tested this" and "trust us, we tested this." Where licensing permits, the transcripts are linked. Where licensing or vendor agreements don't permit publication, the affected category is flagged in a footnote so a reader sees exactly what we did not first-hand verify. The asymmetry (flagging the absence of evidence rather than hiding it) is itself a trust signal.
How does this compare to other editorial benchmarks?
The closest editorial benchmarks are Wirecutter's "How we test" pages, the New York Times's methodology disclosures on rankings and polls, and ProPublica's public corrections policy. We borrow named-scoring publication from Wirecutter, score-lock at publish from NYT methodology pages, and the public corrections log from ProPublica. The combination is uncommon in affiliate publishing.
Three benchmarks shaped this page. Wirecutter's "How we test" framework demonstrates that publishing the scoring page in full, with named categories and reproducible protocols, is compatible with high-volume editorial output and affiliate revenue (nytimes.com/wirecutter/about/how-we-test). The New York Times methodology disclosures on rankings, polls, and investigative scoring demonstrate that score-lock at publish, with documented retest triggers, is a workable journalistic standard (nytimes.com/spotlight/methodology). ProPublica's corrections page demonstrates that a public, dated, per-correction log is more credible than periodic "we may have updated this" footers (propublica.org/corrections). Borrowing all three together is uncommon in affiliate publishing; this page documents what that borrowing looks like in practice.
We don't claim parity with any of these institutions on resource, reach, or breadth. We claim that the editorial discipline is portable: a small team can adopt the named-scoring, score-lock, and public-corrections pattern with no infrastructure heavier than what is documented here.
How does this page connect to the rest of the site?
The editorial process is the operational counterpart to the affiliate disclosure: one explains how revenue is walled off, the other explains the process producing the editorial work behind that wall. Both pages cite the same Score-Lock Framework. The four scoring pages (AI companions, cam sites, adult games, real models) document the categories this process applies. The /about page covers the editor running the testing.
Three pages, three jobs:
- /about covers who runs the testing (Alexandra Joly, bio, credentials, LinkedIn).
- /methodology covers what the scoring categories measure (the four scoring pages and the weight maps).
- This page covers how the editorial process applies those categories from discovery to publish.
Cross-references in order of trust hierarchy:
- /about: Senior Editor bio and credentials.
- /affiliate-disclosure: commercial firewall, FTC 16 CFR Part 255 compliance, Score-Lock Framework on the commercial side.
- /methodology: landing page for the four published scoring pages.
- /methodology/ai-companions: AI scoring (8 categories).
- /methodology/cam-sites: Cam scoring (6 categories, $0-spend protocol).
- /methodology/adult-games: Adult game scoring (7 categories, Billing Transparency).
- /methodology/real-models: Real-model scoring (6 categories, per-creator).
- /privacy: what data we collect from readers and what we share with CrakRevenue.
- /contact: editorial errata, journalist inquiries, brand contestations.
Frequently asked questions
What is the editorial process behind a bestgirlfriend.ai review?
Every review runs through the same 12 steps from discovery to publish: triage, research, hands-on testing, scoring against a published scale, peer reading, score lock, affiliate-catalog cross-check, drafting, editorial review, publish, and a fixed retest cadence. The same flow applies to AI companion apps, cam sites, real-model pages, and adult games.
Who actually tests the apps and writes the reviews?
Alexandra Joly runs the hands-on testing and signs every review. A second editor reads the scoring file before publish and a third pass checks the draft for accuracy, clarity, and disclosure compliance. The byline, bio, and LinkedIn live at /about so the names can be cross-checked.
Why do you lock scores before checking the affiliate catalog?
If a commission rate could shift a score, the order would be reversed. We score the app first, lock the result, then cross-check whether the brand exists in the CrakRevenue catalog. Score under 5/10 means no affiliate link appears on the page even when the payout is generous, which is the only way to keep the editorial process honest about products that pay us.
Does anyone read the review before it goes live?
Two distinct people, two distinct passes. The first reads the scoring file against the evidence to catch any category that drifted by more than one point. The second reads the finished draft against accuracy, clarity, tone, and the seven publish-blocking checks. Conflating those two passes would mean one editor signs off on both the test logic and the prose, which is the most common failure mode in this space.
How often do you retest an app you already reviewed?
Pricing rechecks every three months because price pages move weekly in the adult AI space. Privacy, conversation quality, broadcast quality, model variety, and mobile UX recheck every six months. Voice and customization recheck every twelve months because those signals decay slowly. A breach, a major model swap, a ToS rewrite, or a regulatory action triggers an immediate retest outside the schedule.
What happens when you get something wrong?
Every correction lands on the public errata board with the date, the page, the change, and the reason. Material corrections (anything that would change a recommendation or a rank) also pin a 60-day notice at the top of the affected review and go out in the next newsletter. A publication that never publishes corrections isn't error-free; it's just opaque.
What does $0-spend testing actually mean for a paid app?
Pricing pages and checkout flows get walked end-to-end through our checks up to the payment-submit button. We never charge a card. Anything that lives behind the paywall (post-purchase billing descriptor, refund friction, auto-renewal triggers) gets flagged as "we haven't tested this directly" with the reason in a footnote, sourced from user reports we read and dated. That asymmetry is the honest version of testing without an editorial budget the size of Wirecutter's.
Can a brand contest a score or pay for a better placement?
A brand can flag a factual error or supply a documented product change at [email protected]. We fix factual errors and we accelerate retests when a real change happened. We don't negotiate scores, we don't sell rankings, and we don't accept sponsored reviews. Brand emails proposing any of that are logged and ignored.
How do you compute the final score on a review?
Each category is rated 1 to 10 against anchors published on the matching scoring page. The ratings are weighted per category (AI companions use 8 categories, cam sites 6, adult games 7, real models 6) and combined into a single score reported to one decimal place. Every category cites its evidence in the scoring file. Nothing is impression-based and no number floats without a source.
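For readers who want the arithmetic spelled out, here is a minimal Python sketch. It assumes the weights for a vertical sum to 1, so the result is a weighted average on the same 1 to 10 scale; the actual weights live on the scoring pages and are not reproduced here. The three categories and their weights below are placeholders, and `final_score` is a hypothetical name.

```python
def final_score(ratings: dict, weights: dict) -> float:
    """Weighted average of 1-10 category ratings, rounded to one decimal.

    Assumes weights sum to 1 (an assumption of this sketch, not a
    published rule).
    """
    if ratings.keys() != weights.keys():
        raise ValueError("every category needs exactly one rating and one weight")
    if any(not 1 <= r <= 10 for r in ratings.values()):
        raise ValueError("ratings must be between 1 and 10")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = sum(ratings[c] * weights[c] for c in ratings)
    return round(total, 1)


# Placeholder categories and weights, for illustration only.
ratings = {"privacy": 8, "pricing": 6, "conversation_quality": 7}
weights = {"privacy": 0.5, "pricing": 0.2, "conversation_quality": 0.3}
print(final_score(ratings, weights))  # 7.3
```

The key check (`ratings.keys() != weights.keys()`) enforces the rule that no category is scored without a defined weight and no weight is applied to a missing rating.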
What are the seven absolute publish-blocking checks?
Seven hard blocks must pass before a page ships: no clickbait headline (C01), no broken or contradicted data (R10), affiliate disclosure present (T04), no explicit images hosted on our site (NSFW01), no explicit text on our site (NSFW02), no forbidden slug pattern (NSFW03), and absolutely zero reference to anyone under 18 (NSFW04). NSFW04 has no resolution path; the page is killed.
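The gate logic can be sketched as follows. This is an illustrative Python example, not a description of our internal tooling: the check IDs come from the text, while `Verdict`, `publish_verdict`, and the shape of the `results` input are assumptions. It shows the two properties that matter: every check must have been run, and a failure on NSFW04 ends in a kill rather than a fixable block.

```python
from enum import Enum

# Check IDs and meanings as listed in the text.
HARD_BLOCKS = {
    "C01": "no clickbait headline",
    "R10": "no broken or contradicted data",
    "T04": "affiliate disclosure present",
    "NSFW01": "no explicit images hosted on the site",
    "NSFW02": "no explicit text on the site",
    "NSFW03": "no forbidden slug pattern",
    "NSFW04": "zero reference to anyone under 18",
}
UNRESOLVABLE = {"NSFW04"}  # no resolution path: the page is killed


class Verdict(Enum):
    SHIP = "ship"
    BLOCKED = "blocked"  # fix the failures and run the checks again
    KILLED = "killed"    # page is dropped, not repaired


def publish_verdict(results: dict):
    """Map check results (ID -> passed?) to a verdict and the failed IDs."""
    missing = HARD_BLOCKS.keys() - results.keys()
    if missing:
        raise ValueError(f"checks not run: {sorted(missing)}")
    failed = [cid for cid in HARD_BLOCKS if not results[cid]]
    if any(cid in UNRESOLVABLE for cid in failed):
        return Verdict.KILLED, failed
    if failed:
        return Verdict.BLOCKED, failed
    return Verdict.SHIP, failed


all_pass = {cid: True for cid in HARD_BLOCKS}
print(publish_verdict(all_pass))                     # (<Verdict.SHIP: 'ship'>, [])
print(publish_verdict({**all_pass, "T04": False}))   # (<Verdict.BLOCKED: 'blocked'>, ['T04'])
```

Raising on a missing check, rather than treating it as a pass, keeps a skipped step from looking like a clean result.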
Where can I see the evidence behind a published score?
Each review cites its test transcripts where licensing allows, links to the scoring page that defines the scale, carries a dated update log at the bottom, and lists every external source used. When we can't publish a transcript (paid-tier features, vendor non-disclosure agreements), the affected sub-score is flagged with a footnote explaining what we couldn't verify and what we relied on instead.
What makes a published review trigger an immediate retest?
A documented event resets the schedule clock on the affected category: a major model swap, an engine change, a price change, a ToS or privacy policy update, a regulatory action, a public breach disclosure, or a UI overhaul. Trigger events bypass the standard 3, 6, or 12-month cadence. Detection is partly automated (RSS feeds, blog monitors, news monitors) and partly manual (reader reports, journalist tips).
How do you handle conflicts of interest?
Editors disclose any prior relationship with a platform (consulting, employment, equity, family ties) on intake before testing starts. A disclosed conflict triggers recusal from scoring and review on that platform. The recusal goes in the affected review's update log as a generic note. Editors don't hold equity in the apps they review and don't receive personal payments from brands.
Sources
- bestgirlfriend.ai AI companion scoring page: 8 categories. /methodology/ai-companions
- bestgirlfriend.ai cam scoring page: 6 categories, $0-spend protocol. /methodology/cam-sites
- bestgirlfriend.ai adult game scoring page: 7 categories including Billing Transparency. /methodology/adult-games
- bestgirlfriend.ai real models scoring page: 6 categories, per-creator anchored. /methodology/real-models
- bestgirlfriend.ai affiliate disclosure: Score-Lock Framework on the commercial side, FTC 16 CFR Part 255 compliance. /affiliate-disclosure
- The New York Times Wirecutter, "How we test". nytimes.com/wirecutter/about/how-we-test
- The New York Times, "Methodology" (rankings, polls, investigative scoring). nytimes.com/spotlight/methodology
- ProPublica, "Corrections". propublica.org/corrections
- Federal Trade Commission, 16 CFR Part 255: Guides Concerning Use of Endorsements and Testimonials in Advertising. ecfr.gov
- Internet Archive, Wayback Machine: pricing-page diff source. web.archive.org
Cite this page
If you reference this editorial process in academic, regulatory, or journalistic work, please cite as:
Joly, Alexandra (2026, April 28). Our Editorial Process: How We Test & Score Adult AI Apps. bestgirlfriend.ai. https://bestgirlfriend.ai/editorial-process
Related pages
- About bestgirlfriend.ai and Alexandra Joly: editorial bio, credentials, contact.
- Affiliate Disclosure: commercial firewall, FTC compliance, Score-Lock Framework on the commercial side.
- Methodology landing page: four published scoring pages in one place.
- AI Companion Scoring: 8 categories.
- Cam Site Scoring: 6 categories, $0-spend protocol.
- Adult Game Scoring: 7 categories, Billing Transparency.
- Real Models Scoring: 6 categories, per-creator anchored.
- Privacy policy: reader data and CrakRevenue conversion event sharing.
- Contact: editorial errata, journalist inquiries, brand contestation.
- Errata board: public log of corrections.
Per-jurisdiction notice
Last verified May 26, 2026 · See errata log for any post-publish corrections · Editor: Alexandra Joly · Methodology · Affiliate disclosure