
What exactly is the Fact-Check Insights dataset?

Since its launch in December, the Fact-Check Insights dataset has been downloaded hundreds of times by researchers who are studying misinformation and developing technologies to boost fact-checking.

But what should you expect if you want to use the dataset for your work?

First, you will need to register. The Duke Reporters’ Lab, which maintains the dataset with support from the Google News Initiative, generally approves applications within a week. The dataset is intended for academics, researchers, journalists and fact-checkers.

Once you are approved, you will be able to download the dataset in either CSV or JSON format.
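As a minimal sketch of reading the CSV download with Python's standard library: the file name in the comment is hypothetical, the two-row sample stands in for the real file, and the column names are taken from the field tables later in this post.

```python
import csv
import io
from collections import Counter

# Tiny stand-in for the real download; the actual file has many more columns.
SAMPLE_CSV = (
    "id,@type,claimReviewed,author.name\n"
    "row-1,ClaimReview,First example claim,PolitiFact\n"
    "row-2,ClaimReview,Second example claim,PolitiFact\n"
)

def load_rows(fileobj):
    """Read every CSV row into a dict keyed by the header names."""
    return list(csv.DictReader(fileobj))

# With a real download you might instead write (file name is hypothetical):
#   with open("fact_check_insights.csv", newline="", encoding="utf-8") as f:
#       rows = load_rows(f)
rows = load_rows(io.StringIO(SAMPLE_CSV))

# Count fact-checks per submitting organization.
per_org = Counter(row["author.name"] for row in rows)
print(per_org)
```

Reading rows as dictionaries keeps the dotted column names (such as `author.name`) intact, so they can be looked up exactly as they are spelled in the tables.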

Those files include the metadata for more than 200,000 fact-checks that have been tagged with ClaimReview and/or MediaReview markup.

The two tagging systems — ClaimReview for text-based claims, MediaReview for images and videos — are used by fact-checking organizations across the globe. ClaimReview summarizes a fact-check, noting the person and claim being checked and a conclusion about its accuracy. MediaReview allows fact-checkers to share their assessment of whether a given image, video, meme or other piece of media has been manipulated.

The Reporters’ Lab collects ClaimReview and MediaReview data when it is submitted by fact-checkers. We filter the data to include only reputable fact-checking organizations that have qualified to be listed in our database, which we have been publishing and updating for a decade. We also work to reduce duplicate entries and standardize the names of fact-checking organizations. However, for the most part, the data is presented in its original form as submitted by fact-checking organizations.

Here are the fields that you can expect to be included in the dataset, along with examples:

ClaimReview

| CSV Key | Description | Example Value |
| --- | --- | --- |
| id | Unique ID for each ClaimReview entry | 6c4f3a30-2ec1-4e2e-9b57-41ad876223e5 |
| @context | Link to schema.org, the home of ClaimReview | https://schema.org |
| @type | Type of schema being used | ClaimReview |
| claimReviewed | The claim/statement that was assessed by the fact-checker | Marsha Blackburn “voted against the Reauthorization of the Violence Against Women Act, which attempts to protect women from domestic violence, stalking, and date rape.” |
| datePublished | The date the fact-check article was published | 10/9/18 |
| url | The URL of the fact-check article | https://www.politifact.com/truth-o-meter/statements/2018/oct/09/taylor-swift/taylor-swift-marsha-blackburn-voted-against-reauth/ |
| author.@type | Type of author | Organization |
| author.name | The name of the fact-checking organization that submitted the fact-check | PolitiFact |
| author.url | The main URL of the fact-checking organization | http://www.politifact.com |
| itemReviewed.@type | Type of item reviewed | Claim |
| itemReviewed.author.name | The person or group that made the claim that was assessed by the fact-checker | Taylor Swift |
| itemReviewed.author.@type | Type of speaker | Person |
| itemReviewed.author.sameAs | URLs that help establish the identity of the person or group that made the claim, such as a Wikipedia page (rarely used) | https://www.taylorswift.com/ |
| reviewRating.@type | Type of review | Rating |
| reviewRating.ratingValue | An optional numerical value assigned to a fact-checker’s rating. Not standardized. Note: (1) The ClaimReview schema specifies the use of an integer for the ratingValue, worstRating and bestRating fields. (2) For organizations that use rating scales (such as PolitiFact), if the rating chosen falls on the scale, the numerical rating will appear in the ratingValue field. (3) If the rating isn’t on the scale (ratings that use custom text, or special categories like Flip Flops), the ratingValue field will be empty, but worstRating and bestRating will still appear. (4) For organizations that don’t use ratings that fall on a numerical scale, all three fields will be blank. | 8 |
| reviewRating.alternateName | The fact-checker’s conclusion about the accuracy of the claim in text form: either a rating, like “Half True,” or a short summary, like “No evidence” | Mostly True |
| author.image | The logo of the fact-checking organization | https://d10r9aj6omusou.cloudfront.net/factstream-logo-image-61554e34-b525-4723-b7ae-d1860eaa2296.png |
| itemReviewed.name | The location where the claim was made | in an Instagram post |
| itemReviewed.datePublished | The date the claim was made | 10/7/18 |
| itemReviewed.firstAppearance.url | The URL of the first known appearance of the claim | https://www.instagram.com/p/BopoXpYnCes/?hl=en |
| itemReviewed.firstAppearance.type | Type of content being referenced | Creative Work |
| itemReviewed.author.image | An image of the person or group that made the claim | https://static.politifact.com/CACHE/images/politifact/mugs/taylor_swift_mug/03dfe1b483ec8a57b6fe18297ce7f9fd.jpg |
| reviewRating.ratingExplanation | One to two short sentences providing context and information that led to the fact-checker’s conclusion | Blackburn voted in favor of a Republican alternative that lacked discrimination protections based on sexual orientation and gender identity. But Blackburn did vote no on the final version that became law. |
| itemReviewed.author.jobTitle | A title or description of the person or group that made the claim | Mega pop star |
| reviewRating.bestRating | An optional numerical value representing what rating a fact-checker would assign to the most accurate content it assesses. See note on “reviewRating.ratingValue” field above. | 10 |
| reviewRating.worstRating | An optional numerical value representing what rating a fact-checker would assign to the least accurate content it assesses. See note on “reviewRating.ratingValue” field above. | 0 |
| reviewRating.image | An image representing the fact-checker’s rating, such as the Truth-O-Meter | https://static.politifact.com/politifact/rulings/meter-mostly-true.jpg |
| itemReviewed.appearance.1.url to itemReviewed.appearance.15.url | A URL where the claim appeared. This field has been limited to the first 15 URLs submitted for the stability of the CSV. See the JSON download for complete “appearance” data. | https://www.instagram.com/p/BopoXpYnCes/?hl=en |
| itemReviewed.appearance.1.@type to itemReviewed.appearance.15.@type | Type of content being referenced | CreativeWork |
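The note on reviewRating.ratingValue means the three rating fields only make sense together, and any of them may be blank. One possible way to handle that, sketched here in Python, is to rescale ratingValue onto a 0-to-1 range using worstRating and bestRating. The function names are hypothetical, and because the values are not standardized, the result is best treated as comparable only within a single organization's scale.

```python
def to_number(raw):
    """Return raw as a float, or None when it is blank or not numeric."""
    if raw is None:
        return None
    try:
        return float(str(raw).strip())
    except ValueError:
        return None

def normalized_rating(row):
    """Scale ratingValue to 0.0 (worstRating) through 1.0 (bestRating).

    Returns None when any of the three fields is missing, which covers
    off-scale ratings and organizations without a numerical scale.
    """
    value = to_number(row.get("reviewRating.ratingValue"))
    worst = to_number(row.get("reviewRating.worstRating"))
    best = to_number(row.get("reviewRating.bestRating"))
    if value is None or worst is None or best is None or best == worst:
        return None
    return (value - worst) / (best - worst)

# Values from the example column of the table above.
on_scale = {
    "reviewRating.ratingValue": "8",
    "reviewRating.worstRating": "0",
    "reviewRating.bestRating": "10",
}
# An off-scale rating: ratingValue is empty, the bounds still appear.
off_scale = {
    "reviewRating.ratingValue": "",
    "reviewRating.worstRating": "0",
    "reviewRating.bestRating": "10",
}
print(normalized_rating(on_scale), normalized_rating(off_scale))
```

For off-scale or unrated entries, the text in reviewRating.alternateName is usually the more reliable field to work from.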

MediaReview

| CSV Key | Description | Example Value |
| --- | --- | --- |
| id | Unique ID for each MediaReview entry | 2bfe531d-ff53-40f5-8114-a819db22ca8b |
| @context | Link to schema.org, the home of MediaReview | https://schema.org |
| @type | Type of schema being used | MediaReview |
| datePublished | The date the fact-check article was published | 2020-07-02 |
| mediaAuthenticityCategory | The fact-checker’s conclusion about whether the media was manipulated, ranging from “Original” to “Transformed” | Transformed |
| originalMediaContextDescription | A short sentence explaining the original context if media is used out of context | In this case, there was no original context. But this is a text field. |
| originalMediaLink | Link to the original, non-manipulated version of the media (if available) | https://example.com/ |
| url | The URL of the fact-check article that assesses a piece of media | https://www.politifact.com/factchecks/2020/jul/02/facebook-posts/no-taylor-swift-didnt-say-we-should-remove-statue-/ |
| author.@type | Type of author | Organization |
| author.name | The name of the fact-checking organization | PolitiFact |
| author.url | The URL of the fact-checking organization | http://www.politifact.com |
| itemReviewed.contentUrl | The URL of the post containing the media that was fact-checked | https://www.facebook.com/photo.php?fbid=10223714143346243&set=a.3020234149519&type=3&theater |
| itemReviewed.startTime | Timestamp of video edit (in HH:MM:SS format) | 0:01:00 |
| itemReviewed.endTime | Ending timestamp of video edit, if applicable (in HH:MM:SS format) | 0:02:00 |
| itemReviewed.@type | Type of media being reviewed | ImageObject / VideoObject / AudioObject |
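The startTime and endTime fields arrive as text, so they need converting before any arithmetic. The following is a rough sketch, assuming the values follow the HH:MM:SS pattern shown in the table; the helper names are hypothetical, and anything blank or differently formatted is returned as None rather than guessed at.

```python
def timestamp_to_seconds(stamp):
    """Convert 'H:MM:SS' or 'HH:MM:SS' to whole seconds.

    Returns None for blank or malformed values.
    """
    parts = (stamp or "").strip().split(":")
    if len(parts) != 3:
        return None
    try:
        hours, minutes, seconds = (int(part) for part in parts)
    except ValueError:
        return None
    return hours * 3600 + minutes * 60 + seconds

def edit_duration(row):
    """Length of the flagged video segment in seconds, or None if unknown."""
    start = timestamp_to_seconds(row.get("itemReviewed.startTime"))
    end = timestamp_to_seconds(row.get("itemReviewed.endTime"))
    if start is None or end is None:
        return None
    return end - start

# Values from the example column of the table above.
example = {
    "itemReviewed.startTime": "0:01:00",
    "itemReviewed.endTime": "0:02:00",
}
print(edit_duration(example))  # 60
```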

Please note that not every fact-check will contain data for every field.
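Before relying on a field, it can help to check how often it is actually filled in. This is one simple way to do that in Python, assuming rows have been read into dictionaries keyed by the CSV column names; the function name and the two sample rows are illustrative only.

```python
from collections import Counter

def field_coverage(rows):
    """Count, per field, how many rows carry a non-blank value."""
    counts = Counter()
    for row in rows:
        for key, value in row.items():
            if value is not None and str(value).strip():
                counts[key] += 1
    return counts

# Illustrative rows: the second one leaves two fields blank.
sample_rows = [
    {"id": "a", "itemReviewed.author.name": "Taylor Swift",
     "reviewRating.ratingValue": "8"},
    {"id": "b", "itemReviewed.author.name": "",
     "reviewRating.ratingValue": ""},
]
coverage = field_coverage(sample_rows)
for field, filled in coverage.items():
    print(f"{field}: {filled} of {len(sample_rows)} rows")
```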

For the JSON version of the table above, please see the “What you can expect when you download the data” section of the Guide on the Fact-Check Insights website. The Guide page also contains tips for working with the ClaimReview and MediaReview data.

If you continue to have questions about the Fact-Check Insights dataset, please reach out to hello@factcheckinsights.org.
