What Should I Look for in a Third-Party Data Set?

A recommendation engine is only as good as the data you use to train it. That means not just finding data specific to your purposes (in this case, preference data that can help you to understand consumer taste), but finding high-quality data that can enable your algorithm to deliver reliable results to end users.

How do you know if a third-party data set is high-quality? When using data that you didn’t collect on your own, it’s important that you understand as much as you can about how it was collected, why it was collected, and how much of it exists to ensure that it’s compatible with your business needs.

Volume: For a recommendation engine, you need a statistically meaningful amount of preference data on every product or service (in our case, every TV show) that you could conceivably recommend to your users. That’s a lot of data.

Intent: High-intent psychographic data comes from people who don’t have external motivations for volunteering their opinion. Data that’s procured through focus groups, for example, isn’t high-intent because participants are usually paid to be there. An opinion given to you by a focus group participant isn’t necessarily going to be honest or informed because the participant’s intent is to get paid, not to voice their genuine opinion.

Context: Context helps to ensure that your data is relevant to the intent of your end-user. For example, if you’re looking for recommendations for delicious food on a tight budget, you’re not going to find a list of Michelin-rated restaurants in your area all that relevant to your needs. Whenever possible, preference data should tell you not just what consumers like, but why they like it.
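
To make the volume criterion concrete, here is a minimal Python sketch of one way to check whether a show has enough votes to trust. It uses a Wilson score interval on the share of positive votes, which is a standard statistical technique and not necessarily how Ranker or Watchworthy measure volume. The function names, the 0.10 width threshold, and the sample vote counts are all assumptions made up for illustration.

```python
import math


def wilson_interval(upvotes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for the share of positive votes.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if total == 0:
        # With no votes at all, the true share could be anything.
        return (0.0, 1.0)
    p = upvotes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def has_enough_votes(upvotes: int, total: int, max_width: float = 0.10) -> bool:
    """Treat a show as well covered when its interval is narrow enough.

    max_width is a hypothetical threshold; tune it to your own needs.
    """
    low, high = wilson_interval(upvotes, total)
    return (high - low) <= max_width


# Made-up vote counts: (positive votes, total votes) per show.
sample_votes = {"Show A": (8, 10), "Show B": (4200, 5000)}

for show, (up, total) in sample_votes.items():
    low, high = wilson_interval(up, total)
    status = "ok" if has_enough_votes(up, total) else "needs more data"
    print(f"{show}: {low:.3f} to {high:.3f} ({status})")
```

Both sample shows have a similar share of positive votes, but only the one with thousands of votes yields an interval narrow enough to act on. That gap is why volume has to be checked per show rather than for the data set as a whole.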


Why does Watchworthy use Ranker Insights data?

When building the recommendation engine that powers our Watchworthy app, we found that the voting data from Ranker met all three of these criteria.

As the graphs we’ve featured here demonstrate, we’ve collected a tremendous volume of data about TV shows through the hundreds of TV lists on Ranker.com. That data spans the entire history of television and covers every genre from soap operas to sci-fi.

Ranker voting data is also high-intent because it’s coming from opinionated TV fans. People vote on our lists to influence the rankings of their favorite (and least favorite) TV shows — that means there is nothing motivating their votes besides the desire to make their opinions known.

Ranker lists also provide context for every vote cast on them. That means that a vote for The Good Place on our list of The Best Sit-Coms on the Air Right Now or on our list of The Greatest Sitcoms in Television History can be weighted differently than a vote for the same show on our list of The TV Shows Most Loved by Hipsters. Ranker data doesn’t just tell you what shows people like, but why they like them.
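
As a rough illustration of weighting votes by context, the Python sketch below scores shows by multiplying each vote by a weight tied to the list it was cast on. It is not the Watchworthy implementation. The list identifiers, the weights, the placeholder "Show X", and the sample votes are all hypothetical, chosen to represent a user who is looking for sitcom recommendations.

```python
from collections import defaultdict

# Hypothetical relevance of each list to a user who wants sitcom recommendations.
CONTEXT_WEIGHTS = {
    "best_current_sitcoms": 1.0,
    "greatest_sitcoms_ever": 0.8,
    "loved_by_hipsters": 0.3,
}


def weighted_scores(votes, weights, default_weight=0.0):
    """Sum votes per show, scaling each vote by the weight of its list.

    votes is an iterable of (show, list_id, value) tuples, where value is
    +1 for an upvote and -1 for a downvote. Votes cast on lists without a
    configured weight fall back to default_weight.
    """
    scores = defaultdict(float)
    for show, list_id, value in votes:
        scores[show] += weights.get(list_id, default_weight) * value
    return dict(scores)


# Made-up votes for illustration only.
sample_votes = [
    ("The Good Place", "best_current_sitcoms", +1),
    ("The Good Place", "loved_by_hipsters", +1),
    ("Show X", "loved_by_hipsters", +1),
    ("Show X", "greatest_sitcoms_ever", -1),
]

scores = weighted_scores(sample_votes, CONTEXT_WEIGHTS)
for show, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{show}: {score:+.2f}")
```

Keeping the weights in a plain dictionary means the same votes can be rescored for a different user intent by swapping in another set of weights, without touching the vote data itself.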

Ranker data encompasses not just TV, but movies, celebrities, sports, technology, and virtually everything else. That means that Watchworthy represents just one kind of recommendation engine that can be built from the psychographic profiling data in Ranker Insights. If you’re also looking to build a recommendation app that avoids the “cold start” problem, Ranker Insights should be the first place you look.


Want to learn more about how we built a TV recommendation engine using Ranker Insights data? We tell the whole story in our Watchworthy white paper, which you can download here for free.