A recommendation engine is only as good as the data you use to train it. That means finding data that is not just specific to your purposes (in this case, preference data that can help you to understand consumer taste), but also high-quality enough that your algorithm can deliver reliable results to end-users.
How do you know if a third-party data set is high-quality? When using data that you didn’t collect on your own, it’s important that you understand as much as you can about how it was collected, why it was collected, and how much of it exists to ensure that it’s compatible with your business needs.
Volume: For a recommendation engine, you need a statistically significant amount of preference data on every product or service (in our case, every TV show) that you could conceivably recommend to your users. That’s a lot of data.
Intent: High-intent psychographic data comes from people who don’t have external motivations for volunteering their opinion. Data that’s procured through focus groups, for example, isn’t high-intent because participants are usually paid to be there. An opinion given to you by a focus group participant isn’t necessarily going to be honest or informed because the participant’s intent is to get paid, not to voice their genuine opinion.
Context: Context helps to ensure that your data is relevant to the intent of your end-user. For example, if you’re looking for recommendations for delicious food on a tight budget, you’re not going to find a list of Michelin-rated restaurants in your area all that relevant to your needs. Whenever possible, preference data should tell you not just what consumers like, but why they like it.
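Of the three criteria above, volume is the easiest to check programmatically. The following is a minimal sketch in Python, assuming the preference data arrives as (user, show, score) records and that you have a separate list of every show you might recommend. The function name, the record layout, and the threshold of 30 are all illustrative assumptions, not a rule for what counts as statistically significant; pick a threshold that suits your own data.

```python
from collections import Counter

def shows_below_min_volume(ratings, catalog, min_ratings=30):
    """Return {show: count} for catalog shows with too few preference records.

    ratings     -- iterable of (user, show, score) tuples (assumed layout)
    catalog     -- every show you could conceivably recommend
    min_ratings -- illustrative threshold, not a statistical guarantee
    """
    # Count how many preference records exist for each show.
    counts = Counter(show for _user, show, _score in ratings)
    # A show missing from the data counts as zero records.
    return {show: counts[show] for show in catalog if counts[show] < min_ratings}

# Tiny made-up sample, for illustration only.
ratings = [
    ("u1", "Show A", 5),
    ("u2", "Show A", 4),
    ("u3", "Show A", 3),
    ("u1", "Show B", 2),
]
catalog = ["Show A", "Show B", "Show C"]

sparse = shows_below_min_volume(ratings, catalog, min_ratings=3)
print(sparse)  # {'Show B': 1, 'Show C': 0}
```

Checking against the full catalog, rather than only the shows that appear in the data, matters here: a show with no preference data at all is the clearest volume gap, and it would be invisible if you only counted what the data set already contains.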