Who Fact Checks the Fact Checkers?

Patrick Chang
Marketing in the Age of Digital
3 min read · Nov 15, 2020


The election is over, and President-Elect Biden is poised to pull the U.S. out of the quagmire of the past four years. Hopefully, this also means the office of the Presidency will no longer be used to spread fake news the way Trump has done so often (and is still doing). But the era of fake news isn't over, and there's still a lot of work to be done to ensure that the information we consume every day is actually accurate.

And while a real solution may still be a ways off, some companies are taking first steps to mitigate the impact of misinformation and disinformation. In May, Twitter began adding warning labels to Trump's posts that incited violence or spread fake news, and just this year Facebook began funding university research on fake news.

How Do We Determine What's True?

At the risk of sounding like a shill for big tech, I honestly believe that Facebook and Twitter are trying to develop technologies to combat fake news, even without financial incentives or government regulation demanding it. In fact, when I interviewed for a data science position at Facebook back in mid-2019, the role was explicitly focused on misinformation identification. The algorithm we discussed wasn't anything new, and it isn't dramatically different from how this Princeton paper classifies misinformation, or how Main Street One does it with social listening.

You use NLP to parse reputable news sites to build an index of news trustworthiness (with more weight attached to newswires like Reuters or the AP), and then compare shared content against that index. The model also takes factors like writing style, author analysis, and metadata into account. All in all, with the wealth of information available, it's relatively easy to build a model that can detect misinformation. And given the legion of content moderators already labeling content, there's plenty of data to train the models on. So what's the issue with implementation?
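To make this concrete, here's a minimal sketch in Python with scikit-learn, assuming labeled training examples and a hand-assigned trust index per source. All headlines, domains, and scores are hypothetical toy data; a real system would fold in the writing-style, author, and metadata features mentioned above.

```python
# Minimal sketch of a trust-weighted misinformation classifier.
# All data, domains, and trust scores below are hypothetical.
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# (headline, source, label) triples; 1 = misinformation, 0 = reliable
train = [
    ("Senate passes infrastructure bill after long debate", "reuters.com", 0),
    ("Fed holds interest rates steady, citing new data", "apnews.com", 0),
    ("Miracle cure hidden by doctors, insiders claim", "dailybuzz.example", 1),
    ("Celebrity secretly controls election results", "truthwire.example", 1),
]

# Hand-assigned trustworthiness index, weighted toward newswires.
trust_index = {
    "reuters.com": 0.95,
    "apnews.com": 0.95,
    "dailybuzz.example": 0.20,
    "truthwire.example": 0.10,
}

texts = [text for text, _, _ in train]
labels = [label for _, _, label in train]

# TF-IDF stands in for the NLP parsing/feature-extraction step.
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(texts)

# Append each article's source trust score as one extra feature column.
trust_col = csr_matrix([[trust_index[source]] for _, source, _ in train])
X = hstack([X_text, trust_col])

model = LogisticRegression().fit(X, labels)

# Score a new headline from an unknown source (default trust 0.5).
new_text = ["Anonymous whistleblower confirms secret vaccine microchips"]
X_new = hstack([vectorizer.transform(new_text), csr_matrix([[0.5]])])
print(model.predict_proba(X_new)[0, 1])  # estimated misinformation probability
```

Note that the source trust score is just one feature column here; how heavily the model leans on it relative to the content features is exactly where the bias discussed below creeps in.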

Well, That Depends on Our “Source of Truth”

The trouble, as with most things in analytics and model building, is "garbage in, garbage out": how do we determine the veracity of the data sources feeding the model? The model is only as good as the information we give it, and by creating weights that favor reputable news sources, we're already starting to bias the model. At first glance, this bias isn't really an issue. Journalistic sites earn high reliability scores because they employ more journalistic rigor; they suffer reputational harm if they push misinformation, and are therefore incentivized to keep their version of the truth, well, close to the truth.

But the issue here is overreliance on the existing structure. If news sites lose that level of journalistic integrity, the data feeding the model is bad, and the model's fake-news detection is bad as a result. Because our models all rely on news sites to be the objective source of truth, we're placing blind trust in the assumption that they're getting it right.
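To make that circularity concrete, here's a toy illustration (sources and numbers are hypothetical): if the labels a model learns from are derived from the same trust index it consumes, the model can only echo the index back, and a trusted source that starts publishing falsehoods is never caught.

```python
# Toy illustration of "garbage in, garbage out" with circular labels.
# Sources and scores are hypothetical.
trust_index = {"wire.example": 0.95, "tabloid.example": 0.15}

def label_from_trust(source, threshold=0.5):
    # Circular labeling: an article is "true" iff its source is trusted.
    return trust_index[source] >= threshold

articles = [
    ("accurate wire report", "wire.example"),
    ("fabricated wire report", "wire.example"),     # bad story, trusted source
    ("accurate tabloid scoop", "tabloid.example"),  # good story, distrusted source
]

for text, source in articles:
    verdict = "true" if label_from_trust(source) else "fake"
    print(f"{text!r} from {source} -> labeled {verdict}")
```

The fabricated wire report sails through and the accurate tabloid scoop gets flagged, purely on the strength of the index.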

There's no easy answer to this part. We could keep adjusting the weight we assign to each news source according to its objectivity and truthfulness, but assigning those weights is subject to bias as well. We could crowdsource ratings from a representative sample using Amazon Mechanical Turk or something similar, but that suffers from selection bias (sketched below); if the results of the election are anything to go by, fake news sites could end up rated relatively highly. Ultimately, to combat the spread of misinformation and disinformation, we first need an objective source of truth. And until we can verify that our truth is grounded in reality and not in bias, this will remain an open-ended problem.
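As a rough sketch of how crowdsourced weights inherit the crowd's bias (all ratings below are hypothetical): if a source's weight is just the mean of its raters' scores, then who shows up to rate it determines the weight.

```python
# Sketch of crowdsourced trust weights and selection bias.
# Raters score a source from 0 (untrustworthy) to 1 (trustworthy);
# the source's weight is the mean rating. All numbers are hypothetical.
from statistics import mean

ratings = {
    # A representative pool might rate a partisan fake-news site low...
    "partisan-site.example (representative sample)": [0.2, 0.3, 0.1, 0.25, 0.2],
    # ...while a self-selected pool of its own readers rates it highly.
    "partisan-site.example (self-selected sample)": [0.9, 0.85, 0.95, 0.8, 0.9],
}

for pool, scores in ratings.items():
    print(f"{pool}: weight = {mean(scores):.2f}")
```

Same site, same method, very different weight, which is exactly the grounding problem described above.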
