The Federal Communications Commission reportedly will vote sometime in December to repeal the Title II “net neutrality” restrictions imposed on Internet service providers by the Obama-era FCC.
In preparation for that freighted vote – which activists are heralding with protests outside FCC Chairman Ajit Pai’s home, addressing his minor children by name and calling their dad a “murderer” – Web-content analysts have posted competing data summaries on the comments submitted to the FCC for and against repeal.
I have a friend that lives near @AjitPaiFCC. Net neutrality "activists" posted these signs, featuring his children's names, outside his house. Pizzas also reportedly sent to his house every half-hour last night. pic.twitter.com/jWI4gV6Hvc
— Brendan Bordelon (@BrendanBordelon) November 25, 2017
In one corner, we have Jeff Kao at Hackernoon, who reports that 1.3 million pro-repeal comments (out of some 22 million total, for and against) were “likely faked.”
Kao summarizes his thesis thus (footnotes map to entries at the end of his original post):
My research found at least 1.3 million fake pro-repeal comments, with suspicions about many more. In fact, the sum of fake pro-repeal comments in the proceeding may number in the millions. In this post, I will point out one particularly egregious spambot submission, make the case that there are likely many more pro-repeal spambots yet to be confirmed, and estimate the public position on net neutrality in the “organic” public submissions.¹
Using language analytics, he examines numerous similar pro-repeal comments and concludes that their stilted syntax and repetitive sentiments mark them as auto-generated.
Each sentence in the faked comments looks like it was generated by a computer program. A mail merge swapped in a synonym for each term to generate unique-sounding comments.¹⁰ It was like mad-libs, except for astroturf.
When laying just five of these side-by-side with highlighting, as above, it’s clear that there’s something fishy going on. But when the comments are scattered among 22+ million, often with vastly different wordings between comment pairs, I can see how it’s hard to catch. Semantic clustering techniques, and not typical string-matching techniques, did a great job at nabbing these.
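To see why exact string matching struggles against this kind of mail merge, consider a minimal Python sketch. The template and its synonym lists are invented for illustration and are not drawn from the FCC docket. The template check is a toy stand-in that assumes the template is already known; it is not the semantic clustering Kao describes.

```python
import itertools
import re

# Hypothetical template: each slot lists interchangeable phrases.
# These phrases are made up for illustration, not taken from real comments.
TEMPLATE = [
    ["Americans", "Citizens", "People like me"],
    ["should", "ought to"],
    ["be free to", "be able to"],
    ["choose", "pick", "select"],
    ["their own internet plans."],
]


def mail_merge(template):
    """Return every comment formed by picking one phrase per slot."""
    return [" ".join(choice) for choice in itertools.product(*template)]


def template_pattern(template):
    """Build a regex that matches any comment the template can produce."""
    slots = ["(?:" + "|".join(re.escape(p) for p in slot) + ")" for slot in template]
    return re.compile(r"\s+".join(slots))


comments = mail_merge(TEMPLATE)
pattern = template_pattern(TEMPLATE)

# Exact string matching treats every variant as a different comment...
distinct_strings = len(set(comments))
# ...while a check that knows the template places them all in one group.
same_template = sum(1 for c in comments if pattern.fullmatch(c))

print(distinct_strings, same_template)
```

Five short slots already produce dozens of distinct strings. In a real docket the template is unknown, which is why grouping comments by meaning rather than by exact text matters.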
This next paragraph is arresting, although not for the reason Kao propounds:
Finally, it was particularly chilling to see these spam comments all in one place, as they are exactly the type of policy arguments and language you expect to see in industry comments on the proposed repeal¹¹, or, these days, in the FCC Commissioner’s own statements lauding the repeal.¹²
On the face of it, Kao’s analysis of the individual submissions seems to have some merit. The language is stilted and the sentiments repetitive.
But it is actually the opposite of chilling to see a lot of comments “in one place” reflecting a particular point of view on the very topic people were asked to comment on. It is, rather, a sign that a normal process is working the way we would expect.
Kao doesn’t bolster his argument by pointing out what feels chilling to him. He instead seems to clue us in that he started with a prior bias.
That said, I wouldn’t dispute that the sample he gives us has a bot-like feel to it. I do think his follow-on case that 99% of “organic” comments favor keeping the net neutrality restrictions is substantially weaker. Feel free to investigate his case for yourself.
In the other corner, we have the data-analytics company Emprata, which posted an analytical package finding that both sides of the net neutrality question appear to have used bot comments, but that the great preponderance of comments from “fake mail generator” addresses came in against repeal of net neutrality.
To put it simply, up front, Jeff Kao focused on language clues and evidence of mail-merging. Emprata, while it did judge comments based on language clues as a means of determining whether comments were unique and personally generated, focused on the extent to which comments came from originators with verifiable and trackable identities.
Emprata found that comments from the temporary, disposable addresses produced by fake-mail generator sources overwhelmingly urged against repeal.
Data Completeness: More than 81% of the total docket contained complete (i.e. usable) street address, city, state, ZIP code, and email information. 98% of comments in favor of the repeal of Title II contained usable data versus 70% of comments against the repeal of Title II. In addition, based on a 65% sampling of addresses, 84% of addresses for repeal of Title II were found to be valid versus 68% against repeal.
Artificial Email Domains: More than 7.75 million comments – the largest percentage of any set of comments (36% of the total comments) – appear to have been generated by self-described “temporary” and “disposable” email domains attributed to FakeMailGenerator.com and with nearly identical language. Virtually all of those comments oppose repealing Title II. Assuming that comments submitted from these email domains are illegitimate, [overall] sentiment favors repeal of Title II (61% for, 38% against).
Emprata found that over 99% of comments from international addresses were against repeal:
International Comments: An unusually large volume of comments (1.72 million [again, out of about 22 million]) are attributed to international addresses, which we did not verify. The vast majority of those comments (99.4%) oppose repealing Title II.
Emprata’s analysis of duplicative comments also found that duplicates skewed against repeal of the net neutrality restrictions.
Duplicative Comments: 9.93 million comments were filed from submissions listing the same physical address and email, indicating that many entities filed multiple comments. This was more prevalent in comments against repeal of Title II (accounting for 82% of the total duplicates), with a majority of duplicate comments associated with email domains from FakeMailGenerator.com.
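A duplicate count of this kind can be sketched in a few lines of Python. The records below are invented, and the normalization step (lowercasing and trimming) is an assumption; Emprata's summary does not say how it matched addresses and emails.

```python
from collections import Counter

# Hypothetical records: (street address, email, stance). All values are invented.
comments = [
    ("1 Main St", "a@example.com", "against"),
    ("1 Main St", "a@example.com", "against"),
    ("1 Main St", "A@Example.com ", "against"),
    ("9 Oak Ave", "b@example.org", "for"),
    ("9 Oak Ave", "b@example.org", "for"),
    ("5 Elm Rd", "c@example.net", "for"),
]


def identity_key(address, email):
    """Normalize case and surrounding whitespace so trivial variations still match."""
    return (address.strip().lower(), email.strip().lower())


key_counts = Counter(identity_key(addr, email) for addr, email, _ in comments)

# A comment counts as duplicative when its address/email pair appears more than once.
duplicates = [c for c in comments if key_counts[identity_key(c[0], c[1])] > 1]
by_stance = Counter(stance for _, _, stance in duplicates)

share_against = by_stance["against"] / len(duplicates)
print(len(duplicates), dict(by_stance), round(share_against, 2))
```

Note that a shared address and email shows only that one identity filed more than once; it does not by itself show whether the filings were automated.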
The bottom line finding from the Emprata survey is shown in this graphic. If the “fake mail generator” comments are removed from the data set, comments favor repeal by about 61.3% to 38.1%.
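As a back-of-the-envelope check, the quoted percentages can be turned into rough comment counts. The sketch below uses only the figures cited above; the derived counts are implied by those rounded figures and are not numbers Emprata reported.

```python
# Figures quoted from Emprata's summary. "More than 7.75 million" and "36%" are
# rounded, so everything derived from them is approximate.
FAKE_MAIL_COMMENTS_M = 7.75   # millions of comments from disposable email domains
FAKE_MAIL_SHARE = 0.36        # their share of the whole docket
PCT_FOR_REPEAL = 0.613        # sentiment once those comments are removed
PCT_AGAINST_REPEAL = 0.381

# Docket size implied by "7.75 million is 36% of the total".
implied_total_m = FAKE_MAIL_COMMENTS_M / FAKE_MAIL_SHARE
# Comments left after dropping the disposable-domain submissions.
remaining_m = implied_total_m - FAKE_MAIL_COMMENTS_M
for_m = remaining_m * PCT_FOR_REPEAL
against_m = remaining_m * PCT_AGAINST_REPEAL

print(f"implied docket size: {implied_total_m:.1f}M")
print(f"remaining comments:  {remaining_m:.1f}M")
print(f"for repeal:          {for_m:.1f}M")
print(f"against repeal:      {against_m:.1f}M")
```

The implied docket size of roughly 21.5 million is consistent with the "about 22 million" total cited earlier, which suggests the quoted figures hang together.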
Kao and Emprata agreed on some findings: for example, that many more of the comments for repeal were made using form templates (of the kind that activist sites provide for their subscribers).
Taken together, the analyses by Kao and Emprata suggest an overall picture in which those who favor repeal were more likely to be single, trackable individuals who expressed their opinion through a form submission, while those who are against repeal were more likely to submit personally written comments via disposable, non-trackable addresses; to submit duplicate comments; and to make comments from international addresses.
Your choice how to interpret this information. Welcome to the brave new world of battle-bots in the public-comments process for federal regulations.