Most people who own a phone in North America have at some point received a text from a stranger who seems to have the wrong number. The sender usually introduces herself by a first name, apologizes for the mixup, and then asks a mildly personal question that presumes some prior acquaintance. The area code looks like it might be somewhere familiar. If the recipient writes back to say the sender has the wrong person, the sender keeps the conversation going anyway.
What feels like a misdirected text is, in practice, the first step of a very long funnel. On the other end of it sits a workflow tool, a database of target numbers, a bank of disposable VoIP accounts, and a human operator who is probably managing several conversations at the same time. The first text is not actually intended to catch the recipient that day. Its job is to identify the one person in several thousand who will reply and then stay in the conversation long enough to eventually be walked onto a fake trading platform and lose a significant amount of money over the following weeks.
In the security literature the pattern has a name: pig butchering, a rough translation of the Chinese term shā zhū pán. The metaphor refers to the slow process of fattening the victim up with affection and small talk before the slaughter. Different agencies produce different estimates, but the consensus figure for global losses now sits somewhere in the tens of billions of dollars a year. The United Nations has also gone on record describing the labor side of the industry as a humanitarian crisis in its own right, referring to the walled compounds in Southeast Asia where trafficked workers are forced to sit and type these conversations under duress.
Public coverage of pig butchering generally focuses on either the individual victims or on those compounds. The layer in the middle, meaning the actual infrastructure that sits between the person who sends the first text and the person who eventually sends the last wire, rarely gets written about in any depth. I was able to compromise the server of the people running one of these operations and pull their database. What follows is what that infrastructure looks like from inside.
All identifying information, including IP addresses, domains, internal data store names, vendor names, and code-level specifics, has been removed from this writeup. The point here is to describe the shape of the operation, not to hand out a map back to it. If you are a journalist, researcher, or investigator who has a legitimate reason to look at the underlying data, I am happy to share what I have, and you are welcome to reach out through the contact information on this site.

What the data actually is
Two snapshots of the operator-side server were captured, one in January and one in April. Between them they contain about 137,000 sender accounts. Each of those accounts is a disposable U.S. or Canadian phone number registered with one of the well-known "free texting" apps that many people have installed on their phones, including TextFree, TextPlus, and TextNow. The April snapshot alone holds roughly 1.56 million messages, sent to 888,116 distinct target phones. Of those, 38,909 wrote back to the scammer at least once.
Those four figures already outline most of what the rest of the data confirms later on. The operation is producing roughly one reply for every twenty-three phones it contacts, and every other piece of infrastructure described in this article, including the proxies, the fake iPhones, the account-creation pipelines, the templates, and the people behind the templates, exists to hold that ratio steady at scale.
The operation runs in bursts
The first thing that stands out when the message timestamps are laid out across a calendar is that the operation is not really "on" most of the time. Months of message traffic compress into three sharp bursts of activity separated by long quiet stretches.

The first burst ran from late December through the second week of January and pushed out roughly 1.2 million outbound messages in sixteen days. The second one, in late February and early March, was much smaller, with about 47,000 messages over nineteen days, which looks more like a warm-up or a test than a real campaign. The third burst took up the final ten days of March into early April and produced a little over 600,000 messages.
Two things are worth noting about this pattern. The first is that a campaign-driven shape like this tells you something about how the operation is being staffed. You do not spin up tens of thousands of new sender accounts in two weeks and then let the infrastructure go mostly idle for a month unless the team behind it is coming and going, or unless the operators are releasing the product in waves on purpose. The second is that the shape of the activity implies somebody is making a deliberate decision about when to run the machine. A fully automated spam operation would tend to produce flat, continuous background noise in the traffic data, whereas this one produces something closer to a scheduled release pattern, with specific windows of activity followed by long stretches of nothing.
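That release pattern is easy to recover from raw timestamps: bucket messages by day, keep the days above a volume floor, and merge consecutive active days into bursts. A minimal sketch with synthetic data; the threshold and the traffic shape are illustrative, not the operation's actual figures:

```python
from collections import Counter
from datetime import date, timedelta

def find_bursts(timestamps, threshold):
    """Bucket message timestamps by day, keep days at or above a
    volume threshold, and merge consecutive active days into bursts.
    Returns (start_day, end_day, total_messages) per burst."""
    daily = Counter(timestamps)
    active = sorted(day for day, n in daily.items() if n >= threshold)
    bursts = []
    for day in active:
        if bursts and day - bursts[-1][-1] == timedelta(days=1):
            bursts[-1].append(day)
        else:
            bursts.append([day])
    return [(b[0], b[-1], sum(daily[d] for d in b)) for b in bursts]

# Synthetic traffic: two active windows separated by a quiet week.
days = []
for offset, volume in [(0, 5000), (1, 6000), (2, 4000), (10, 3000), (11, 2500)]:
    days += [date(2024, 1, 1) + timedelta(days=offset)] * volume

bursts = find_bursts(days, threshold=1000)
# Two bursts: Jan 1-3 (15,000 messages) and Jan 11-12 (5,500).
```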
One day, the targeting moved to Canada
The single most interesting moment in the whole dataset is January 2. Up until that day, somewhere around 98 percent of the outbound traffic was going to U.S. area codes, and the Canadian numbers were getting only a few hundred messages a day. On January 2, Canadian outbound volume jumps from 1,349 messages the day before to over 35,000 in the same twenty-four-hour window, while U.S. volume barely moves. By the following day, Canada is getting almost four times the volume of the U.S., and it stays that way for the rest of the burst.

A twenty-six-fold overnight jump is not organic drift. It is the signature of an operator loading a fresh list of target numbers and pointing the system at them. The operator-side campaign scheduler backs this up. There are effectively no scheduled campaigns in the system before January 9, and several hundred of them appear on January 9 itself, which suggests that the early January traffic was pushed through a simpler, more manual send path, and that the scheduler came online midway through the burst once the volume kept climbing.
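A jump like that surfaces automatically under a simple day-over-day screen. A sketch, with the ratio and volume thresholds and the (day, region) layout as assumptions rather than anything taken from the operator's tooling:

```python
def flag_volume_jumps(daily_counts, min_ratio=5.0, min_volume=1000):
    """Flag (day, region) cells whose outbound volume jumps by at
    least `min_ratio` over the previous day. `daily_counts` maps
    (day, region) -> message count; the layout is illustrative."""
    days = sorted({d for d, _ in daily_counts})
    regions = {r for _, r in daily_counts}
    flags = []
    for prev, cur in zip(days, days[1:]):
        for region in sorted(regions):
            before = daily_counts.get((prev, region), 0)
            after = daily_counts.get((cur, region), 0)
            if before > 0 and after >= min_volume and after / before >= min_ratio:
                flags.append((cur, region, after / before))
    return flags

# Figures echoing the January 2 shift: Canada jumps ~26x, U.S. flat.
counts = {
    (1, "US"): 60000, (2, "US"): 58000,
    (1, "CA"): 1349,  (2, "CA"): 35000,
}
jumps = flag_volume_jumps(counts)
```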
The more interesting question is why the operators chose Canada for the second half of the burst in the first place. The answer sits in the language breakdown.
The Chinese-language messages were aimed at specific regions of Canada
The outbound messages to the U.S. are overwhelmingly English, at around 96 percent. The outbound messages to Canada are only 60 percent English. Another 34 percent are written in Chinese.
That Chinese share is not evenly distributed across Canadian area codes. It is heavily concentrated in two provinces in particular. In British Columbia area codes, 87 percent of the outbound message text contains Chinese characters. In Ontario, the figure is 72 percent. In Quebec it drops to 29 percent, and in Alberta to 27 percent.

Numbers that lopsided are not the product of random dialing. The operators, or whoever sold them the target lists, clearly knew which Canadian area codes were densely populated by the communities they were trying to reach. The English-language "need a job" blasts went out broadly across the country, while the Chinese-language conversational openers were directed at the specific provinces where a Chinese-speaking recipient was most likely to pick up the phone.
That pre-sorting is a large part of what "industrial scale" actually means here. The volume is a factor, but the more important feature is that the messages themselves have already been categorized by language and region before the operator on the other end ever opens the workflow tool.
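The per-province language split reduces to a single question per message: does the text contain any character in the main CJK Unicode block? A minimal sketch; the (area_code, text) layout is an illustrative assumption about how the data is shaped:

```python
def contains_chinese(text):
    """True if any character falls in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)

def chinese_share_by_area(messages):
    """messages: iterable of (area_code, text) pairs. Returns the
    share of messages per area code that contain Chinese characters."""
    totals, hits = {}, {}
    for area, text in messages:
        totals[area] = totals.get(area, 0) + 1
        if contains_chinese(text):
            hits[area] = hits.get(area, 0) + 1
    return {area: hits.get(area, 0) / n for area, n in totals.items()}

sample = [
    ("604", "好久不见，最近怎么样？"),  # BC number, Chinese opener
                                        # ("long time no see, how are you?")
    ("604", "are you busy today?"),
    ("780", "do u need a job?"),        # Alberta number, English job blast
]
shares = chinese_share_by_area(sample)
```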
The opener matters more than almost anything else
The operation maintains a small library of message templates. Each campaign picks one, sprays it against a batch of target numbers, and then watches which targets reply. Joining the template library to the resulting conversations produces a reply rate by template, and the gap between templates turns out to be enormous.

The "i miss you" opener has a reply rate of 17.8 percent. This is not the share of repliers who eventually converted; it is the share of recipients who wrote something back at all. The next several templates cluster between six and eight percent, including "can i ask you a question," "do you have time to eat tonight," "i heard you went on a trip," and "are you busy, why haven't you been contacting me." The explicit job pitch at the bottom of the table sits at five.
There is a specific psychological reflex being exploited at the top of this chart. If a recipient believes that somebody has texted them by mistake, and if that somebody seems to think the recipient is a person they care about, the recipient tends to feel a small social obligation to correct the sender. "I miss you" is the maximally effective opener because it makes the recipient feel a little guilty about not replying. The templates near the top of the chart all share that same structure. They sound like a real, slightly anxious message from a real person, sent to the wrong person.

The explicit job pitches like "do u need a job?" and its cousins are doing different work. They are not trying to produce social obligation. They are blunt cold offers aimed at people who are already in a vulnerable enough spot to click through on a text about easy money. Those people exist, and the operators hit them too, but the yield from that approach is about a third of the yield from the "wrong number" flow.
Almost no one who replies actually gets pig-butchered
The engagement funnel is where the real scale of the operation becomes visible, and where it becomes clear how thin the per-message economics really are.

Of the roughly 1.08 million conversations the operation initiated in the April snapshot, only 38,909 produced even one reply. That is 3.6 percent. Only 21,700 made it past five messages exchanged, about 2 percent of the top of the funnel. Conversations that reached ten messages totaled 6,580. Twenty messages, 1,703. Fifty or more messages, which is roughly the point where a pig-butchering conversation starts to look like a relationship, came out to 219.
Only 219 conversations out of more than a million reached that final stage, which is what the entire apparatus actually exists to produce. The economics only work because the average loss per successful engagement runs into the tens or hundreds of thousands of dollars. You can collapse the funnel from a million down to two hundred and still be running a very profitable business.
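The funnel itself is nothing more than threshold counts over conversation depth. A minimal sketch over a tiny synthetic distribution, with the cut points mirroring the ones used above:

```python
def funnel_counts(depths, thresholds=(1, 5, 10, 20, 50)):
    """Count conversations whose message depth reaches each
    threshold. `depths` holds messages-exchanged per conversation."""
    return {t: sum(1 for d in depths if d >= t) for t in thresholds}

# Tiny synthetic distribution: most conversations never get a reply.
depths = [0] * 95 + [1, 3, 7, 25, 60]
stages = funnel_counts(depths)
```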
Inside that last bucket, the conversations look very different from the rest of the dataset. They tend to be long, often running across multiple days, and they move naturally through small talk, holidays, family, work, and, over and over, the target volunteering financial or health details without being directly asked. The scammer maintains a consistent persona throughout each of these longer threads, usually presenting as a woman of some specific background running a specific kind of business in a specific city. The persona stays stable across the length of the conversation, which means that either a single human operator is managing the entire thread themselves, or the operators are using some kind of shared CRM-style notes system to keep the story straight across shift changes. The data does not let you distinguish between those two possibilities, but either answer is interesting in its own right.
How they manufacture phone numbers by the thousand
The operation ran against the free texting apps because those apps let anyone create a working U.S. or Canadian phone number without a SIM card, and do it from an app install alone. The free texting apps are, understandably, aware of being abused this way, and they aggressively block accounts that look wrong, which includes data center IPs, emulators, unusual device fingerprints, and accounts created too fast from the same place.
The operation's answer is a three-part pipeline, assembled out of off-the-shelf pieces.

The first piece is a residential proxy provider. Residential proxies are IP addresses that appear to come from home broadband connections, which means they look, from the platform's perspective, indistinguishable from a real customer sitting on their couch. The operator-side inventory contains 2,252 configured proxy sessions. Of those, 2,002, or almost 89 percent, route through a single residential-proxy account, and 1,984 of them share the same username base, which means they are sub-sessions of one account at one vendor.

That concentration is probably the most investigatively relevant finding in the whole dataset. If the goal is to disrupt the operation's ability to appear to be thousands of different American and Canadian home internet connections, the required cooperation comes from a single company, rather than from a long list of proxy providers.
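Measuring that concentration is a single grouping over session usernames. The base-plus-suffix naming rule below is a hypothetical format, not the vendor's actual convention:

```python
from collections import Counter

def top_account_share(session_users):
    """Group proxy sessions by username base (the part before the
    session suffix) and return the largest account's share."""
    bases = Counter(user.split("-session")[0] for user in session_users)
    base, count = bases.most_common(1)[0]
    return base, count / len(session_users)

sessions = ["acctA-session1", "acctA-session2", "acctA-session3",
            "acctB-session1"]
base, share = top_account_share(sessions)
```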
The second piece is fake iOS device metadata. When the account-creation script calls the free-texting API, it pretends to be an iPhone by reporting an iOS version, a device model, and an advertising ID. Looking at the distribution of reported iOS versions across the account fleet reveals something strange. The top twenty-five reported iOS versions each carry between roughly 6,800 and 7,900 accounts, which works out to a band of less than 16 percent across two dozen different version strings. Real iPhone populations do not distribute that way, because they tend to cluster on a small number of recent versions as most users update their devices. A near-flat distribution across two dozen iOS strings is what you would expect when a script picks a version at random from a lookup table to make the fleet look organically diverse, which strongly suggests the metadata is synthetic.
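The flatness argument can be made concrete by comparing the spread of the top version buckets. A sketch with two synthetic fleets, one scripted and one shaped like a real update-clustered population; the version strings and counts are invented for illustration:

```python
def version_band_spread(version_counts, top_n=25):
    """Relative spread, (max - min) / max, across the top-N version
    buckets. Organic fleets cluster on recent versions, producing a
    large spread; a near-flat band suggests randomized metadata."""
    top = sorted(version_counts.values(), reverse=True)[:top_n]
    return (top[0] - top[-1]) / top[0]

# Synthetic fleets: a scripted one with near-uniform buckets, and an
# organic-shaped one dominated by the latest release.
scripted = {f"17.{i}": 7000 + i * 40 for i in range(25)}
organic = {"17.4": 60000, "17.3": 25000, "16.7": 4000, "15.8": 600}

flat_spread = version_band_spread(scripted)
organic_spread = version_band_spread(organic)
```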
The third piece is account-creation automation. This is where most of the disposable phone numbers actually come from. A proxy session is selected, a synthetic device fingerprint is generated, a new account is registered against the free-texting app, the assigned phone number is captured and written to the operator's database, and the account is pushed into the rotation of available sender numbers. The operator never handles any of this by hand.
The combination of all three pieces is what makes the pipeline work. None of the individual components are particularly unusual on their own. Residential proxies are a commodity product, mobile device spoofing has been documented in essentially every mobile-fraud writeup of the last five years, and account-creation automation has existed for as long as consumer APIs have. What separates a serious operation from a hobbyist project is how well those three components have been integrated with each other, and how reliably the integration holds up once it is running at scale.
The account fleet is designed to die
Most of the sender accounts in the data barely last a single day.

Roughly 72.6 percent of U.S.-targeting accounts and 79.2 percent of Canada-targeting accounts have a total lifespan, measured from first outbound message to last outbound message, of less than twenty-four hours. Only a tiny fraction of accounts survive past a week. The Canadian side burned out even faster than the U.S. side, presumably because Canadian carriers and the free-texting platform's anti-spam systems caught on more quickly to the pattern.
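The lifespan figures come from a straightforward first-to-last-message window per account. A minimal sketch; the (first_ts, last_ts) layout is an illustrative assumption about how the data is stored:

```python
from datetime import datetime, timedelta

def share_under_24h(account_windows):
    """Share of accounts whose first-to-last outbound window is
    under 24 hours. `account_windows` maps account id to a
    (first_ts, last_ts) pair."""
    short = sum(1 for first, last in account_windows.values()
                if last - first < timedelta(hours=24))
    return short / len(account_windows)

t0 = datetime(2024, 1, 5, 9, 0)
windows = {
    "a1": (t0, t0 + timedelta(hours=3)),   # burns out the same morning
    "a2": (t0, t0 + timedelta(hours=20)),
    "a3": (t0, t0 + timedelta(days=6)),    # rare week-long survivor
    "a4": (t0, t0 + timedelta(hours=1)),
}
short_share = share_under_24h(windows)
```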
The operators absorb that loss rate by creating new accounts faster than the platform can suspend them. Across the December to January burst, the operation created roughly 47,000 new sender accounts while only holding 1,000 to 3,600 of them active on any given day. The platform was suspending accounts at something close to the rate the operation was creating them, but the operation effectively did not care, because if an account manages to send its payload and then dies within the same day, the operators have already gotten the only value they wanted from it. The accounts are designed to be disposable from the beginning.
This is the part of the system that is hardest to disrupt from the defender's side. Account-level suspension, the standard response, is a race the platform cannot win at this volume. To actually slow the operation down, the defender would have to close the pipeline instead of individual accounts, which would mean choking off the proxy infrastructure, poisoning the device-fingerprint lookup, or pushing the free-texting platform to require a real SIM-verified phone before letting an account send at volume. None of those are easy. The residential proxy angle is the only one that targets a concentrated chokepoint visible in the data.
The operator accounts are not the operators
The server lists 25,450 "operator accounts," meaning logins that a human is theoretically using to push the thing around. It is tempting to read that as "25,000 scammers," but that reading is almost certainly wrong, and it is worth being honest about what the data can and cannot tell you here.

The operator accounts cluster into 1,194 distinct team identifiers. Within 1,079 of those teams, about 90 percent, every sub-account shares exactly the same password. That is not the pattern you get from 25,000 different humans each picking a password for themselves. It looks much more like a small number of humans registering batches of accounts under a single shared credential, in the same way a team might create role logins for a software seat they share. Only 115 teams show any password diversity across their sub-accounts at all. That is the pattern you might expect to see if multiple humans were using those sub-accounts independently, although even that is not proof, because a single person can use multiple passwords and a team can rotate credentials over time.
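That password clustering is directly measurable as the share of teams with exactly one credential across their sub-accounts. A sketch over synthetic (team, credential) pairs; real data would hold hashes rather than plaintext, and the field names are illustrative:

```python
def uniform_credential_share(accounts):
    """accounts: iterable of (team_id, credential) pairs. Returns
    the share of teams in which every sub-account uses exactly the
    same credential."""
    teams = {}
    for team, cred in accounts:
        teams.setdefault(team, set()).add(cred)
    uniform = sum(1 for creds in teams.values() if len(creds) == 1)
    return uniform / len(teams)

accounts = [
    ("team01", "h1"), ("team01", "h1"), ("team01", "h1"),  # shared login
    ("team02", "h2"), ("team02", "h3"),                    # diverse
]
uniform_share = uniform_credential_share(accounts)
```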
What the data actually supports is a weaker but more defensible claim, which is that the ratio of accounts to humans is nothing like one-to-one. The 25,450 figure is a fleet size rather than a headcount, and the actual number of people behind the operation is not directly observable from the server itself. It would be irresponsible to try to put a specific number on the human side of the operation from this data alone. The more useful takeaway is that the operation is deliberately engineered to make one person appear to be thousands of different senders at once, which is the part that actually matters for understanding how it scales.
Linguistic breadcrumbs
One of the quieter things in the data is that the operators did not especially try to hide the environment they built the system in. File and variable names inside their environment are in two languages. Chinese shows up throughout the customer-facing templates, which is expected. On the system-configuration side, meaning the parts built for internal use and the dashboards and JSON exports the operators pass around between themselves, some filenames are in romanized Vietnamese. Labels translating roughly to "still alive," "already dead," "create new," and "account recovery storage" all appear as names on internal data buckets, rendered phonetically in Latin characters.
This does not mean that the people sending the texts are Vietnamese, since the workers typing the conversations could be almost anywhere. What it does suggest is that the environment the operators use to manage the system was built, or at least configured, by somebody working in Vietnamese, which is a separate question from what language the customer-facing messages are written in. In effect, the software layer and the content layer appear to have been developed in two different linguistic contexts. That split is not proof of anything on its own, since operations get sold, forked, and moved around over time, but it is the kind of artifact that could matter if investigators ever tried to chase the infrastructure upstream.
What this all says about enforcement
The thing that kept striking me as I worked through the data is how tractable the choke points actually are, and how little any of them get touched in practice.
One residential proxy provider moves 89 percent of this operation's internet traffic. A single subpoena, or even a serious abuse notification that resulted in that account being closed and the billing details frozen for review, would disrupt the operation immediately. The traffic is not being spread across dozens of alternative providers for redundancy. Almost all of it passes through a single vendor account.
The operation runs on a small handful of free-texting platforms that all share the same general pattern. They allow bulk automated account creation from a non-mobile device as long as the device appears to be a real iPhone on real home broadband. Requiring a live SIM-verified phone before an account is allowed to send at volume would not meaningfully affect any real user, and it would end this specific pipeline.
The operation sustains itself because the cost of creating a new sender account is close to zero and the cost of having one suspended is also close to zero. The economics only hold as long as that balance does. Raising the creation cost even modestly breaks them.
A striking point the dataset makes is that the operation is not continuous in the first place. The months of data compress down to three discrete bursts of activity. Defenders would not have to match the operation's uptime across the whole year in order to cripple it; they would only have to identify the rhythm the operators run on and be present during those specific windows.
The shape, not the map
The standard story told about pig butchering operations tends to focus on individual villains, meaning a specific cartel, a specific set of compounds, or a specific person who can eventually be arrested. The data argues for a different framing. What this operation looks like in practice is much closer to a platform than to a criminal organization in the traditional sense. It is a pipeline built out of bought or stolen phone lists, a set of standard templates, a residential proxy account, a device-spoofing script, a free-texting API, and a shared workflow tool for the humans who sit at the other end of those systems. Any of those individual pieces could be swapped out for an equivalent piece from a different vendor within a week, and the operation as a whole would barely register the change. The dollars being extracted from victims are enormous, but the infrastructure producing them is ordinary commodity tooling, and the number of humans required to run it is whatever the operators need it to be at any given time.
That is also why the wrong-number texts have not stopped, and why they are not going to stop on their own. The reason they keep arriving has less to do with the people running this particular operation being unusually skilled, and more to do with the fact that the plumbing they rely on is cheap, anonymous, and entirely standard, and that not enough pressure is being applied to the companies selling that plumbing to make them care about who the buyers actually are.
The 888,116 targeted phones in the April snapshot are one part of the evidence for all of this. The 219 long-form conversations buried inside the funnel are the other. Every one of those 219 conversations represents a real person who believed, for at least some period of time, that somebody had genuinely been trying to reach them.
This article is based on a review of two server snapshots drawn from a live operation's own infrastructure. All operator-identifying information, including network addresses, vendor account names, internal data store names, hostnames, and project identifiers, has been redacted. Aggregate statistics are drawn directly from the data.