
Does the world want recommendations?

New Scientist reported on April 30th that Futureful, a Finnish start-up, is building a predictive, iPad-based search engine that will use a recommender system. By harvesting information from social feeds such as Facebook and Twitter, its algorithm takes the topics that are trending, analyses the users’ interests and behaviour, and recommends new topics that might interest them.

Eric Schmidt is also quoted as having said “The ability to tell me things I didn’t know but am probably very interested in is the next great stage of search”.

I am possibly cynical about this topic and have blogged extensively (Who wants to be recommended?, May 2009) about the problem of appropriate recommendations and the ability of such systems to surprise.

The problem I see is how you are supposed to evaluate a system whose task is to generate surprising recommendations. Especially in academic research, the success of a recommendation engine is traditionally evaluated using a very simple metric: take a list of users’ choices in the given domain, hide a number of entries, and check whether the recommender system returns them after analysing the remaining ones. This is straightforward, although several other metrics have been proposed.
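To make the hide-and-check procedure concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function names, the toy popularity-based recommender, and the choice of "fraction of hidden entries recovered" as the score. It is not the method of any particular paper or product.

```python
import random
from collections import Counter


def popularity_recommender(histories, known_items, k):
    """Toy stand-in recommender (hypothetical): suggest the k most
    popular items overall that this user has not already chosen."""
    counts = Counter(item for items in histories.values() for item in items)
    ranked = [item for item, _ in counts.most_common() if item not in known_items]
    return ranked[:k]


def holdout_hit_rate(histories, recommend, n_hidden=1, k=3, seed=0):
    """Hide n_hidden choices per user, ask the recommender for k
    suggestions, and return the fraction of hidden choices recovered."""
    rng = random.Random(seed)
    hits = 0
    total = 0
    for user, items in histories.items():
        items = sorted(items)
        if len(items) <= n_hidden:
            continue  # not enough choices left to learn from
        hidden = set(rng.sample(items, n_hidden))
        visible = set(items) - hidden
        # The recommender only sees the visible choices of this user.
        training = dict(histories)
        training[user] = visible
        recommended = recommend(training, visible, k)
        hits += len(hidden & set(recommended))
        total += len(hidden)
    return hits / total if total else 0.0


if __name__ == "__main__":
    histories = {
        "ann": {"jazz", "blues", "folk"},
        "bob": {"jazz", "blues"},
        "cat": {"blues", "folk", "punk"},
    }
    print(holdout_hit_rate(histories, popularity_recommender))
```

Note what the score rewards: returning items the user had already chosen. A genuinely surprising recommendation is, by construction, not in the hidden list, so it counts as a miss.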

Now, how are you supposed to evaluate a system that doesn’t have a reference list? We can surely think of many metrics, some of them quantitative, some of them qualitative (or even social-based):

  • the probability a user follows the suggested link
  • the strength of the trust feeling towards the recommender
  • the fact that a user suggests the recommender system to other users …
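The first of these metrics is the easiest to quantify. As a rough sketch, assuming a hypothetical log with one boolean per displayed suggestion (True if the user followed the link), the probability can be estimated as a click-through rate, with a Wilson score interval to show how uncertain the estimate is when there are few observations. The function names and log format are invented for this example.

```python
import math


def click_through_rate(followed):
    """followed: list of booleans, one per displayed suggestion,
    True if the user followed the suggested link."""
    if not followed:
        raise ValueError("no suggestions were displayed")
    return sum(followed) / len(followed)


def wilson_interval(clicks, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = clicks / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    log = [True, False, False, True, False, False, False, True, False, False]
    print(click_through_rate(log))
    print(wilson_interval(sum(log), len(log)))
```

With only ten suggestions the interval is very wide, which is part of the difficulty: a number like this only becomes comparable across systems once a lot of behaviour has been logged.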

However, a metric needs to be meaningful, and qualitative metrics often are not. If I’m a user and I want to be surprised, I will probably follow any random link. I often do that in what I call my serendipitous Wikipedia crawls. My favourite recommender system is, above all, Twitter: I only follow people who teach me something interesting. Not one of the people that Twitter’s “Who to follow” system recommended to me was relevant to me.

So I am a bit confused: what exactly is a predictive search engine really trying to achieve?

4 replies on “Does the world want recommendations?”

Evaluating recommender systems is one of the big unsolved mysteries of this so-called “science.” In fact, the really interesting aspect of building a recommender system is that there is no fully correct answer for it to give: it is a creative endeavor as much as it is a scientific one.

But does not being able to empirically measure something mean we don’t want it? (e.g., do you like being loved?)

Absolutely spot on, Neal 🙂

When I say “the world” I mean “the business world” or “the academic world”. Given the assumption that businesses want to make money and academics want to deepen their knowledge, the question can be reduced to “if I can’t evaluate how good my RS is, who is going to pay for it?” or “if I can’t tell whether my knowledge is really getting deeper by using an RS, to what can I compare my results?”.

Of course, I’m happy if something weird and unrelated is recommended to me. But that happiness cannot be evaluated scientifically.
Don’t get me wrong: I love the creative aspects of an RS when I use it in my daily life. It’s just that I’m not convinced it’s viable as a business proposition, or academically interesting once it becomes too generalist.

It all depends on what part of “the system” you are trying to evaluate. If you want to evaluate the effectiveness of the algorithm, it can be hard.

If you want to evaluate the effect “the system” has on its users (its influence), you have years of HCI and social psychology research to go by. For example, if you know the probability of a user following a link, how about comparing that with actual link-following behaviour?

I agree on that in theory.
What I find very difficult in practice is that metrics based on social psychology and HCI risk being subjective, and giving different results depending on how the evaluator works.
But let’s give it the OK for a moment and assume these metrics work fine. I still suspect that, especially in a business environment, they do not produce a “strong enough” feeling that the recommendation is “good”, with obvious consequences for the likelihood of RS-based profits. Surely this is less of a problem in research, but I would still not feel comfortable with metrics that are too qualitative and difficult to compare.
