Since its launch in December, the Fact-Check Insights dataset has been downloaded hundreds of times by researchers who are studying misinformation and developing technologies to boost fact-checking.
But what should you expect if you want to use the dataset for your work?
First, you will need to register. The Duke Reporters’ Lab, which maintains the dataset with support from the Google News Initiative, generally approves applications within a week. The dataset is intended for academics, researchers, journalists and/or fact-checkers.
Once you are approved, you will be able to download the dataset in either CSV or JSON format.
Those files include the metadata for more than 200,000 fact-checks that have been tagged with ClaimReview and/or MediaReview markup.
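As a minimal sketch of reading the CSV download with Python's standard library, the example below parses a tiny in-memory excerpt. The sample rows are placeholders invented for illustration, not real dataset entries, and the file name mentioned in the comment is hypothetical. The column names are taken from the field table in this post.

```python
import csv
import io

# Placeholder two-row excerpt standing in for the downloaded CSV.
# The values are invented for illustration; only the column names
# come from the field table in this post.
SAMPLE_CSV = """id,@type,claimReviewed,author.name,reviewRating.alternateName
example-id-1,ClaimReview,"Example claim text, with a comma",PolitiFact,Mostly True
example-id-2,ClaimReview,Another example claim,Example Fact-Checker,No evidence
"""

def load_rows(csv_file):
    """Read every row of an open CSV file into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))

# For a real download you would open the file instead, for example:
#   with open("claimreview.csv", newline="", encoding="utf-8") as f:  # hypothetical file name
#       rows = load_rows(f)
rows = load_rows(io.StringIO(SAMPLE_CSV))

for row in rows:
    print(row["author.name"], "|", row["reviewRating.alternateName"])
```

Because every value arrives as text, numeric and date fields need to be converted after loading.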
The two tagging systems — ClaimReview for text-based claims, MediaReview for images and videos — are used by fact-checking organizations across the globe. ClaimReview summarizes a fact-check, noting the person and claim being checked and a conclusion about its accuracy. MediaReview allows fact-checkers to share their assessment of whether a given image, video, meme or other piece of media has been manipulated.
The Reporters’ Lab collects ClaimReview and MediaReview data when it is submitted by fact-checkers. We filter the data to include only reputable fact-checking organizations that have qualified to be listed in our database, which we have been publishing and updating for a decade. We also work to reduce duplicate entries and standardize the names of fact-checking organizations. However, for the most part, the data is presented in its original form as submitted by fact-checking organizations.
Here are the fields that you can expect to be included in the dataset, along with examples:
ClaimReview
CSV Key | Description | Example Value |
---|---|---|
id | Unique ID for each ClaimReview entry | 6c4f3a30-2ec1-4e2e-9b57-41ad876223e5 |
@context | Link to schema.org, the home of ClaimReview | https://schema.org |
@type | Type of schema being used | ClaimReview |
claimReviewed | The claim/statement that was assessed by the fact-checker | Marsha Blackburn “voted against the Reauthorization of the Violence Against Women Act, which attempts to protect women from domestic violence, stalking, and date rape.” |
datePublished | The date the fact-check article was published | 10/9/18 |
url | The URL of the fact-check article | https://www.politifact.com/truth-o-meter/statements/2018/oct/09/taylor-swift/taylor-swift-marsha-blackburn-voted-against-reauth/ |
author.@type | Type of author | Organization |
author.name | The name of the fact-checking organization that submitted the fact-check | PolitiFact |
author.url | The main URL of the fact-checking organization | http://www.politifact.com |
itemReviewed.@type | Type of item reviewed | Claim |
itemReviewed.author.name | The person or group that made the claim that was assessed by the fact-checker | Taylor Swift |
itemReviewed.author.@type | Type of speaker | Person |
itemReviewed.author.sameAs | URLs that help establish the identity of the person or group that made the claim, such as a Wikipedia page (rarely used) | https://www.taylorswift.com/ |
reviewRating.@type | Type of review | Rating |
reviewRating.ratingValue | An optional numerical value assigned to a fact-checker’s rating. Not standardized. Notes: (1) The ClaimReview schema specifies the use of an integer for the ratingValue, worstRating and bestRating fields. (2) For organizations that use rating scales (such as PolitiFact), if the rating chosen falls on the scale, the numerical rating will appear in the ratingValue field. (3) If the rating isn’t on the scale (ratings that use custom text, or special categories like Flip Flops), the ratingValue field will be empty, but worstRating and bestRating will still appear. (4) For organizations that don’t use ratings that fall on a numerical scale, all three fields will be blank. | 8 |
reviewRating.alternateName | The fact-checker’s conclusion about the accuracy of the claim in text form — either a rating, like “Half True,” or a short summary, like “No evidence” | Mostly True |
author.image | The logo of the fact-checking organization | https://d10r9aj6omusou.cloudfront.net/factstream-logo-image-61554e34-b525-4723-b7ae-d1860eaa2296.png |
itemReviewed.name | The location where the claim was made | in an Instagram post |
itemReviewed.datePublished | The date the claim was made | 10/7/18 |
itemReviewed.firstAppearance.url | The URL of the first known appearance of the claim | https://www.instagram.com/p/BopoXpYnCes/?hl=en |
itemReviewed.firstAppearance.type | Type of content being referenced | CreativeWork |
itemReviewed.author.image | An image of the person or group that made the claim | https://static.politifact.com/CACHE/images/politifact/mugs/taylor_swift_mug/03dfe1b483ec8a57b6fe18297ce7f9fd.jpg |
reviewRating.ratingExplanation | One to two short sentences providing context and information that led to the fact-checker’s conclusion | Blackburn voted in favor of a Republican alternative that lacked discrimination protections based on sexual orientation and gender identity. But Blackburn did vote no on the final version that became law. |
itemReviewed.author.jobTitle | A title or description of the person or group that made the claim | Mega pop star |
reviewRating.bestRating | An optional numerical value representing what rating a fact-checker would assign to the most accurate content it assesses. See note on “reviewRating.ratingValue” field above. | 10 |
reviewRating.worstRating | An optional numerical value representing what rating a fact-checker would assign to the least accurate content it assesses. See note on “reviewRating.ratingValue” field above. | 0 |
reviewRating.image | An image representing the fact-checker’s rating, such as the Truth-O-Meter | https://static.politifact.com/politifact/rulings/meter-mostly-true.jpg |
itemReviewed.appearance.1.url to itemReviewed.appearance.15.url | A URL where the claim appeared. This field has been limited to the first 15 URLs submitted for the stability of the CSV. See the JSON download for complete “appearance” data. | https://www.instagram.com/p/BopoXpYnCes/?hl=en |
itemReviewed.appearance.1.@type to itemReviewed.appearance.15.@type | Type of content being referenced | CreativeWork |
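Two of the ClaimReview fields above tend to need a little handling in practice: the optional rating numbers and the numbered appearance columns. The sketch below assumes each row is a dict of strings keyed by the CSV column names (as `csv.DictReader` would produce). The helper names `to_number`, `normalized_rating` and `appearance_urls` are made up for this illustration and are not part of the dataset or any library.

```python
def to_number(text):
    """Return text as a float, or None if it is blank, missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

def normalized_rating(row):
    """Scale ratingValue to the 0..1 range using the row's own worstRating and bestRating.

    Returns None when any of the three fields is blank or the scale has zero width.
    """
    value = to_number(row.get("reviewRating.ratingValue"))
    worst = to_number(row.get("reviewRating.worstRating"))
    best = to_number(row.get("reviewRating.bestRating"))
    if value is None or worst is None or best is None or best == worst:
        return None
    return (value - worst) / (best - worst)

def appearance_urls(row, limit=15):
    """Collect the non-blank itemReviewed.appearance.N.url values, N = 1..limit."""
    urls = []
    for i in range(1, limit + 1):
        url = (row.get(f"itemReviewed.appearance.{i}.url") or "").strip()
        if url:
            urls.append(url)
    return urls

# Row built from the example values in the table above.
example_row = {
    "reviewRating.ratingValue": "8",
    "reviewRating.bestRating": "10",
    "reviewRating.worstRating": "0",
    "itemReviewed.appearance.1.url": "https://www.instagram.com/p/BopoXpYnCes/?hl=en",
    "itemReviewed.appearance.2.url": "",
}

print(normalized_rating(example_row))
print(appearance_urls(example_row))
```

Since the rating values are not standardized, a normalized score is only meaningful within a single organization's scale; comparing scores across organizations would need extra care.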
MediaReview
CSV Key | Description | Example Value |
---|---|---|
id | Unique ID for each MediaReview entry | 2bfe531d-ff53-40f5-8114-a819db22ca8b |
@context | Link to schema.org, the home of MediaReview | https://schema.org |
@type | Type of schema being used | MediaReview |
datePublished | The date the fact-check article was published | 2020-07-02 |
mediaAuthenticityCategory | The fact-checker’s conclusion about whether the media was manipulated, ranging from “Original” to “Transformed” | Transformed |
originalMediaContextDescription | A short sentence explaining the original context if media is used out of context | In this case, there was no original context. But this is a text field. |
originalMediaLink | Link to the original, non-manipulated version of the media (if available) | https://example.com/ |
url | The URL of the fact-check article that assesses a piece of media | https://www.politifact.com/factchecks/2020/jul/02/facebook-posts/no-taylor-swift-didnt-say-we-should-remove-statue-/ |
author.@type | Type of author | Organization |
author.name | The name of the fact-checking organization | PolitiFact |
author.url | The URL of the fact-checking organization | http://www.politifact.com |
itemReviewed.contentUrl | The URL of the post containing the media that was fact-checked | https://www.facebook.com/photo.php?fbid=10223714143346243&set=a.3020234149519&type=3&theater |
itemReviewed.startTime | Starting timestamp of video edit (in HH:MM:SS format) | 0:01:00 |
itemReviewed.endTime | Ending timestamp of video edit, if applicable (in HH:MM:SS format) | 0:02:00 |
itemReviewed.@type | Type of media being reviewed | ImageObject / VideoObject / AudioObject |
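The startTime and endTime fields arrive as text, so computing the length of an edited segment means converting them first. This is a rough sketch that assumes the values have three colon-separated whole-number parts, as in the examples above; anything else (blank values, fractional seconds) is treated as missing. The function names are hypothetical.

```python
def timestamp_to_seconds(text):
    """Convert 'H:MM:SS' or 'HH:MM:SS' text to seconds; return None if blank or malformed."""
    parts = (text or "").strip().split(":")
    if len(parts) != 3:
        return None
    try:
        hours, minutes, seconds = (int(part) for part in parts)
    except ValueError:
        return None
    return hours * 3600 + minutes * 60 + seconds

def edit_duration(row):
    """Return the length in seconds of the edited segment, or None if it cannot be computed."""
    start = timestamp_to_seconds(row.get("itemReviewed.startTime"))
    end = timestamp_to_seconds(row.get("itemReviewed.endTime"))
    if start is None or end is None or end < start:
        return None
    return end - start

# Row built from the example values in the table above.
media_row = {"itemReviewed.startTime": "0:01:00", "itemReviewed.endTime": "0:02:00"}
print(edit_duration(media_row))  # prints 60
```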
Please note that not every fact-check will contain data for every field.
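One way to see how sparse a field is before relying on it is to count the non-blank values per column. The sketch below assumes rows are dicts of strings keyed by column name; `field_coverage` is a name invented for this example, and the sample rows are placeholders.

```python
from collections import Counter

def field_coverage(rows):
    """Count, for each column, how many rows carry a non-blank value."""
    counts = Counter()
    for row in rows:
        for key, value in row.items():
            if value is not None and str(value).strip():
                counts[key] += 1
    return counts

# Placeholder rows for illustration; the second has no speaker or job title.
sample_rows = [
    {"author.name": "PolitiFact", "itemReviewed.author.name": "Taylor Swift", "itemReviewed.author.jobTitle": "Mega pop star"},
    {"author.name": "PolitiFact", "itemReviewed.author.name": "", "itemReviewed.author.jobTitle": None},
]

coverage = field_coverage(sample_rows)
for column in sample_rows[0]:
    print(f"{column}: {coverage[column]} of {len(sample_rows)} rows filled")
```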
For the JSON version of the table above, please see the “What you can expect when you download the data” section of the Guide on the Fact-Check Insights website. The Guide page also contains tips for working with the ClaimReview and MediaReview data.
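If you work from the JSON download but want the dotted column names used in the CSV tables above, a small flattening helper can bridge the two. This sketch assumes the JSON records are nested objects whose keys mirror the dotted CSV keys and that lists such as "appearance" are numbered from 1, as the CSV column names suggest. Check the Guide for the actual structure before relying on it. The record below is a hand-built illustration, not a real entry.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into a single dict with dotted keys.

    List items are numbered from 1 to match keys like itemReviewed.appearance.1.url.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value, start=1):
            flat.update(flatten(child, f"{prefix}.{index}" if prefix else str(index)))
    else:
        flat[prefix] = value
    return flat

# Hand-built record assuming a nested layout; values reuse examples from the tables above.
record = {
    "@type": "ClaimReview",
    "author": {"@type": "Organization", "name": "PolitiFact"},
    "itemReviewed": {
        "@type": "Claim",
        "appearance": [
            {"@type": "CreativeWork", "url": "https://www.instagram.com/p/BopoXpYnCes/?hl=en"}
        ],
    },
}

flat = flatten(record)
for key, value in flat.items():
    print(key, "=", value)
```

Unlike the CSV, which stops at 15 appearance URLs, flattening the JSON this way keeps every item in the list.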
If you continue to have questions about the Fact-Check Insights dataset, please reach out to hello@factcheckinsights.org.