№ XLV · On Method · 18 December 2026

How we read a corpus.

A studio note on what happens when a single editor sits down with a small brand's entire review history and treats it as a manuscript. The exercise looks like literary criticism. The output is closer to a research brief than to a dashboard.

BetterReviews Editorial · Studio note

CONTENTS · 07

01 The first pass is not analysis. It is reading.
02 The second pass is the surprise file
03 The third pass groups by claim, not by sentiment
04 The fourth pass is for words that recur
05 What no software did
06 The fourteen-page memo
07 The studio note, plainly

The studio works with a small skincare brand in Copenhagen. It has been trading for four years. It has, at the time of writing, 3,412 reviews across Shopify, Junip, and the Yotpo migration it half-finished in 2024 and never undid. The reviews live in three places and have never been read in one place by one person.

An editor sat down with the full corpus in November 2026. The exercise took eleven hours across three days. The deliverable was a 14-page memo. The memo cost less than a single month of Yotpo Plus. The brand changed nine product pages, three email flows, and the language in the founder's About page on the basis of it.

This is a note on what the editor actually did. It is not a methodology paper. It is closer to a studio diary.

The first pass is not analysis. It is reading.

Day one was a single posture: open a document, paste the reviews in order, read every one from the first to the last. No highlighter. No tagging. No sentiment score. The editor read 3,412 reviews the way a literary editor reads a manuscript, which is to say slowly, in order, with the intent of meeting the writers.

The first three hours produced no notes. This is correct. A manuscript editor does not annotate on the first pass. They are establishing a baseline for what the writing sounds like, who is writing it, what they are returning to.

By hour four the editor began to feel the shape of the corpus. There were two registers in the writing. There was a clinical register from buyers in their late forties and older, talking about specific skin conditions and named ingredients. There was a tactile register from buyers in their twenties and thirties, talking about texture, smell, and "how the bottle feels in the hand." The brand's own marketing was written entirely in the tactile register. Half of the customers who wrote reviews were being addressed in the wrong language.

The second pass started on day two.

The second pass is the surprise file

The editor opened a second document called `surprises.md` and reread the corpus, copying any sentence that surprised her into it. The criterion for surprise was personal. A surprising sentence was one she had not expected, did not know how to interpret immediately, or that contradicted something the brand's marketing claimed.

By the end of day two the surprise file had 84 sentences in it. Some examples.

"I bought this for my mother and she has been using it for six months. I have used three pumps total. I do not know if it works."

"The smell reminds me of the soap at a hotel in Lisbon I went to twice. I have no other comparison."

"It is the only serum I have used that I have actively been embarrassed to buy again. I keep buying it."

The brand's marketing copy did not contain any sentences like these. The marketing copy contained, instead, claims about hydration percentages, ingredient lists, and a hero image of the founder. The surprise file was where the actual writing was.

A surprise file is the cheapest research instrument the studio has found. It costs an editor's attention and a plain text document. It produces, in our experience, the best raw material for any subsequent content decision the brand will make for the next eighteen months. This is what an editor's approach to a corpus looks like in practice. There is no model behind it. There is a person reading, surprised.

The third pass groups by claim, not by sentiment

Day three began with the editor opening a third document called `claims.md` and rereading only the surprise file. Each sentence was placed under a claim it implicitly made. A claim is the assertion the sentence is evidence for, even if the sentence does not state it.

The Lisbon hotel sentence was filed under "the scent triggers specific autobiographical memories." The mother sentence was filed under "buyers gift this product and never use it themselves." The embarrassment sentence was filed under "the visual design of the bottle is socially expensive to display." By the end of day three there were seventeen distinct claims, each supported by between three and eleven cited sentences from the corpus.
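The shape of a claims file is simple enough to write down: each claim is a heading, and the sentences cited for it sit underneath. What follows is a minimal sketch in Python of that bookkeeping only, not of the reading that decides where a sentence belongs. The names `file_under` and `thin_claims` are hypothetical, and the minimum of three supporting sentences is an assumption borrowed from the range reported above, not a rule the editor stated.

```python
from collections import defaultdict


def file_under(claims, claim, sentence):
    """Place one customer sentence under the claim it is evidence for."""
    claims[claim].append(sentence)


def thin_claims(claims, minimum=3):
    """Return claims backed by fewer than `minimum` sentences (assumed threshold)."""
    return sorted(c for c, cited in claims.items() if len(cited) < minimum)


# The three sentences quoted in this note, filed as the editor filed them.
claims = defaultdict(list)
file_under(
    claims,
    "the scent triggers specific autobiographical memories",
    "The smell reminds me of the soap at a hotel in Lisbon I went to twice. "
    "I have no other comparison.",
)
file_under(
    claims,
    "buyers gift this product and never use it themselves",
    "I bought this for my mother and she has been using it for six months. "
    "I have used three pumps total. I do not know if it works.",
)
file_under(
    claims,
    "the visual design of the bottle is socially expensive to display",
    "It is the only serum I have used that I have actively been embarrassed "
    "to buy again. I keep buying it.",
)

# With one sentence each, every claim here still needs more support.
for claim in thin_claims(claims):
    print(f"needs more evidence: {claim}")
```

The judgment stays with the reader. The structure only makes it easy to see which claims are still thin.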

Claims are not sentiment. A sentiment score would have classified all three sentences above as either neutral or mildly positive. The claims they support are sharper, weirder, and more useful. "Buyers gift this product and never use it themselves" is, for a brand that has been measuring repurchase rate by buyer-account, a load-bearing finding. The buyer is not the user. The brand had been optimising for the wrong person.

Sentiment analysis cannot find this. A taxonomy of star ratings cannot find this. Only reading, slowly, with the intent of locating what claim each sentence is quietly making, will find it. This is the editorial posture the studio defends.

The fourth pass is for words that recur

The fourth document, opened on day three, was a list of words and short phrases that recurred in the corpus and were absent from the brand's marketing. The editor was not interested in obvious recurrences (the product names, the ingredient list, the brand name). She was interested in the words the brand had not yet noticed it was being described by.

The recurring words were, in order of frequency: "weird" (47 occurrences), "slow" (39), "expensive but" (31), "winter" (29), "I forgot" (24), "doesn't work on" (22). None of these words appeared anywhere in the brand's marketing copy. None had been used in any email flow. None had been searched for, internally, as a content angle.
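The counting is the one mechanical step in this pass. A minimal sketch of the same tally in Python, assuming the reviews are available as a list of strings: `count_phrases` and `absent_from` are hypothetical names, the sample reviews and marketing line are invented for illustration, and whole-word matching is an assumption about how variants such as "slowly" should be treated. Choosing which words are worth counting remains the editor's work.

```python
import re
from collections import Counter


def _pattern(phrase):
    # \b anchors keep "slow" from also matching "slowly" (an assumed rule).
    return re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)


def count_phrases(reviews, phrases):
    """Tally case-insensitive, whole-word occurrences of each phrase."""
    counts = Counter()
    for phrase in phrases:
        pattern = _pattern(phrase)
        counts[phrase] = sum(len(pattern.findall(review)) for review in reviews)
    return counts


def absent_from(copy_text, phrases):
    """Return the phrases that never appear in the marketing copy."""
    return [p for p in phrases if not _pattern(p).search(copy_text)]


# Invented sample data, not the brand's corpus.
reviews = [
    "Weird smell, but I love it.",
    "Slow to absorb. Expensive but worth it.",
    "The scent is weird in a good way. It slowly grew on me.",
]
candidates = ["weird", "slow", "expensive but", "subtle"]
marketing_copy = "A refined and subtle scent."

counts = count_phrases(reviews, candidates)
novel = absent_from(marketing_copy, candidates)

# Print only the phrases customers use and the marketing copy does not.
for phrase, n in counts.most_common():
    if phrase in novel:
        print(f"{phrase}: {n}")
```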

The "weird" recurrences are particularly instructive. Buyers were calling the scent weird and meaning it as praise. The brand had spent two years describing the scent as "refined" and "subtle." The brand's own customers had a better word for it. The brand was not using their word. The fix is the smallest content change imaginable, costing nothing, and the brand was four years late to it.

This pass is the practical work of treating reviews as language. A stockroom audit asks how many. A language audit asks which words. The two audits look identical from a distance and produce completely different recommendations.

What no software did

At no point in the eleven hours did the editor open a dashboard. There was no sentiment score, no rating distribution, no word cloud. The studio's tools are, deliberately, the cheapest available: three plain text documents and an open mind. A spreadsheet was used twice, to count occurrences of the recurring words, and only because counting is faster than estimating.

The work that the studio sells is not the tools. It is the posture. The tools are what make the posture cheap to repeat across thousands of brands, where today it costs eleven editor-hours and a calendar week. The studio is building the software that makes the first three passes (read in order, surprise file, claims file) feel like a single afternoon.

There is no model for this in the existing review-platform category. Yotpo's analytics suite, Okendo's insights tab, Bazaarvoice's intelligence product. None of them produce a surprise file. They produce a sentiment chart. The sentiment chart is, in our experience, the least useful artifact in the category. It tells the brand what it already knew (most reviews are 5-star, most are short, most are positive) and tells it nothing about what to write next.

The surprise file tells the brand what to write next. That is the asymmetry the studio is built around.

The fourteen-page memo

The deliverable to the brand was a 14-page memo in long-form prose. It opened with the registers (clinical vs tactile, and the brand's marketing addressed only one of them). It contained the seventeen claims, each with three to eleven supporting sentences. It contained the six recurring words and the proposed marketing edits for each. It ended with nine specific product-page changes and a recommendation to rewrite the founder's About page in the clinical register, where the brand's older buyers actually live.

The memo was 4,200 words long. It quoted 81 customer sentences in full. It did not contain a single chart. It read like a research brief. The founder read it on a Sunday and called on Monday with three follow-up questions, all of them about specific sentences. None of them were about aggregate numbers.

This is the load-bearing claim: a corpus of customer writing is a manuscript. The output of reading it well is not a dashboard. It is a memo. The memo will change what the brand writes next. That, in turn, is how the sentences your customers wrote become the brand's actual marketing copy. Which is, finally, the dead text waking up.

The studio note, plainly

We are building software around this exercise. Not software that replaces the editor's reading (it cannot, and we do not want it to). Software that makes the first read cheaper, the surprise file faster to populate, the claims taxonomy reusable across the brand's full history. The editor will still read. The output will still be a memo. We will have made it possible to produce that memo for a brand of any size in a few hours instead of a week.

The work was always reading. We are making reading cheap.

If any of this reads like something your store could use, write to us.

We will write back.

Corrections

corrections@better-reviews.com

Mistakes are listed at the foot of the page when found.