recommender systems Web 2.0

Who wants to be recommended?

There’s a lot of ongoing research on recommender systems, fostered by the Netflix Prize.

Recommender systems are basically pieces of software that offer users suggestions within a given domain. Usually they are specialised: Amazon’s recommender system recommends books,’s recommends songs, and the like.

Recommendation can rely on different aspects. I may be suggested things similar to things I previously chose, or things my friends like. There’s a whole theory behind this, so I won’t bore you. To know more, use this site as a starting point.
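To make the two ideas concrete, here is a minimal sketch, not how any real service works. It assumes every item is described by a hypothetical set of tags, and it scores each unseen item by its tag overlap (Jaccard similarity) with things I already chose, plus the share of my friends who like it. The function names, the tags, and the equal weighting of the two signals are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Overlap between two tag sets: 0.0 (nothing shared) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(chosen, friends_likes, item_tags, top_n=3):
    """Rank items not yet chosen (hypothetical scoring, for illustration only).

    chosen:        set of items the user already picked
    friends_likes: list of sets, one per friend, of items that friend likes
    item_tags:     dict mapping each item to a set of descriptive tags
    """
    scores = {}
    for item, tags in item_tags.items():
        if item in chosen:
            continue
        # "Things similar to things I previously chose"
        similarity = max(
            (jaccard(tags, item_tags[c]) for c in chosen if c in item_tags),
            default=0.0,
        )
        # "Things my friends like": fraction of friends who like this item
        votes = sum(1 for liked in friends_likes if item in liked)
        friend_share = votes / len(friends_likes) if friends_likes else 0.0
        scores[item] = similarity + friend_share
    # Highest score first; ties broken alphabetically to keep output stable
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Made-up sample data
item_tags = {
    "Star Trek": {"sci-fi", "space", "series"},
    "Star Wars": {"sci-fi", "space", "film"},
    "Robocop": {"sci-fi", "action", "film"},
    "Amelie": {"comedy", "romance", "film"},
}
chosen = {"Star Trek"}
friends_likes = [{"Amelie"}, {"Amelie", "Robocop"}]

print(recommend(chosen, friends_likes, item_tags))
print(recommend(chosen, [], item_tags, top_n=1))
```

With no friends to consult, the similarity signal alone puts Star Wars on top for a Star Trek fan, which is precisely the kind of suggestion that may or may not be welcome.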

My problem with RS is that of this post’s title: who wants or needs recommendations? Is it always true that I like the same kind of things? Surely I’m a good counterexample to this. I love Star Trek. I have watched every single episode and would happily watch them all again. Nonetheless, I hate Star Wars. I find it boring. I don’t like sci-fi in general. No Terminator, no Robocop. I can’t even name other non-Trek sci-fi. So my hypothetical RS should know that I don’t like every kind of sci-fi film, but only Star Trek. Maybe my friends share this view (but as far as I know, no one really does), so it could try checking my friends’ profiles first.
If you take a look at my music library (or simply explore my profile), you could call it eclectic at the very least. Some would say it’s schizoid.
Moreover, sometimes I might want to do different things from those of my friends. Negative recommendation could be part of the solution, but the underlying algorithm would just be the same.

So what would be a good recommendation for me? Well, usually what is important to me is surprise. I like many different things. The parameters that explain what I like may be originality, quality, …, but maybe they are simply unknown. Some people have suggested a “Surprise me” button to accomplish this task. But it’s not that easy, even if I know what I don’t like.
Hence, the final questions: how can I represent the tastes of a user? How can I represent his or her reactions (or feelings) towards something he or she expects or does not? How can I represent what I would like recommendations on, and what I wouldn’t?

Stay tuned to the RecSys conferences to see if someone comes up with an answer; my guess is that we’ll be seeing lots and lots of new recommender systems in the next few years, and each one will be confronted with these issues.

HCI Web 2.0

Wolfram Alpha and user experience

There are a lot of ongoing discussions about the power of Wolfram Alpha. I think that most of these conversations are flawed because of the argument that Wolfram Alpha does not find you enough information.

I believe that the mistake here lies in the way the press has introduced the service. Wolfram himself has not been clear enough, and when he has, the press has of course misinterpreted him. Wolfram Alpha is not a search engine.

Many articles and blog posts have been written on the question: will Wolfram Alpha be the end of Google?
The problem here is that the two services are actually very different. Wolfram Alpha is a self-defined computational knowledge engine, not a search engine like Google. Google is able to return millions of results for a single search, whilst Alpha returns a single, often aggregated, result about some topic.

Alpha is basically an aggregator of information. It selects information from different data sources and presents it to the user in a nice and understandable way. Google is more like searching in the phone directory. So you’re supposed to ask the two services different questions.

Of course, Alpha makes mistakes. A curious example I’ve found is the search for the keyword “Bologna”. Bologna is primarily the name of a town in Northern Italy (the one in which I attended university); it is also the name of a kind of ham, commonly known as “Mortadella”, especially outside Bologna itself. In Milan, for example, Mortadella is commonly called Bologna.

Well, search for Bologna on Google, and compare it with results on Alpha.

Google will mostly return pages about the town of Bologna and its football team, whereas Alpha will give you nutritional information about Mortadella.

Is this a ‘mistake’? I think that the only mistake is in the expectations users have about Alpha. It yields results from a structured knowledge base, hence its index is not as general as Google’s. Nonetheless, I believe that there’s at least one problem in the user interface that should be corrected: the search box. It’s exactly the same as Google’s: same shape, same height, same width. But is there any alternative way of presenting an answering engine on the Internet?

What I think is that more HCI research is needed to help users understand what the goals and capabilities of a service like Alpha are. If users think of it as a search engine, it will never succeed.

Just to have a hint of what Alpha should be about, try this search.

Work in IT

The hunt for a Google job

The first time I got in touch with a Google recruiter was more or less a week after I’d decided to enrol for a PhD. Apparently this recruiter (very kind, I must say) was browsing uni pages and found my profile. At the time, apart from telling her that I was due to start a PhD in a few months, I was very interested in Systems Administration. She did her best to convince me to apply as a Developer. Weird, but there are probably some rules here. Of course, after a couple of interviews in which I told them that I was not interested in moving to Switzerland and that I wanted to do a PhD, they decided not to pursue my profile any further.

Now, again, a recruiter has contacted me, a week after I started a new job. This time, I’ve moved on to being a developer. Guess what? She wants me to apply for a Systems Administration position.

Google, you have knowledge of everything on the network. What about trying to tune your timing and select me for something I’m actually skilled for? 🙂

Web 2.0

Twitter and the future of RSS

I read some interesting thoughts on the Mashable blog about the relationship between RSS and microblogging. If you think about the two technologies, there are certainly some evident similarities, i.e. they both deliver a stream of short items with high semantical concentration(*).

In RSS you usually also get a bigger amount of text, but to keep things simple, we don’t lose generality if we see RSS as just a list of links about some topic.

Microblogging is in fact more general: it allows personal communication and link sharing. The content of a message must necessarily be compressed into 140 characters. That’s why I think that every message can be seen as a set of keywords – of course leaving out common words, articles, prepositions, and the like.
What you can do with these keywords, for twits containing URLs, is use them as tags for the URL. Hence, you can basically build – over Twitter (**) – an RSS feed for whatever topic you like; moreover, you can build a folksonomy of tags for it. Not bad, what do you think?
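As a rough illustration of the idea, the sketch below treats each message as a set of keywords, attaches those keywords as tags to every URL found in the message, and then lists the URLs tagged with a given topic. Everything here is an assumption made for the example: the tiny stop-word list, the regular expressions, the function names, and the sample messages. It does not call any Twitter API.

```python
import re
from collections import Counter, defaultdict

# A deliberately tiny stop-word list (an assumption; a real one would be longer)
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "and",
             "or", "is", "are", "from", "with", "about", "this", "that"}
URL_RE = re.compile(r"https?://\S+")


def keywords(text):
    """Return the set of meaningful words in a message, ignoring URLs."""
    without_urls = URL_RE.sub(" ", text.lower())
    words = re.findall(r"[a-z0-9']+", without_urls)
    return {w for w in words if w not in STOPWORDS and len(w) > 1}


def tag_urls(messages):
    """Map each URL to a Counter of the keywords it was shared with."""
    tags = defaultdict(Counter)
    for text in messages:
        for url in URL_RE.findall(text):
            tags[url].update(keywords(text))
    return tags


def feed_for(topic, tags):
    """A topic 'feed': every URL that has been tagged with the topic."""
    return sorted(url for url, counts in tags.items() if topic in counts)


def folksonomy(tags):
    """Overall tag popularity across all URLs."""
    return sum(tags.values(), Counter())


# Made-up sample messages
messages = [
    "Arsenal scored again! http://example.com/a #football",
    "The best football analysis of the week http://example.com/b",
    "Lovely bread from the bakery http://example.com/c",
    "no link here about football",
]
tags = tag_urls(messages)
print(feed_for("football", tags))
print(folksonomy(tags).most_common(3))
```

Messages without a URL contribute nothing in this sketch; one could also count their keywords towards the folksonomy, but that is a design choice the post leaves open.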

The question here is what the future of RSS will be, given the increasing diffusion of Twitter, Twitter-like sites, and services built over Twitter (yet again, take a look at @footytweets or @bakertweet and you will see the potential here).

Many news outlets already use Twitter as a means of broadcasting updates and news alerts (two important examples here: @bbcnews and @cnn). Thousands of users are already using these twits as a replacement for their RSS aggregators. The success of Twitter as a news alert broadcaster relies on its higher versatility with respect to its RSS counterpart: you can use keywords, hashtags, and comments together with URLs. You may object that all of these features are more or less already present in RSS. Nonetheless, their usage is not as immediate as in Twitter, and there is no single point of aggregation like the one Twitter offers.

Naturally, there’s always a dark side 🙂 Finding relevant content on Twitter is not an easy task. There are plenty of services claiming to be able to recommend users you might be interested in (see Mr. Tweet for the most popular example and an interesting application of a recommender system). However, a killer application has yet to appear, and no single recommender is able to get you all the interesting twits you would like to see (partly due to serious limitations in the search APIs that Twitter makes available).

This is what I would call filtering good content; we also need to mention that filtering out bad content from twits is an issue that hasn’t been solved. It’s very easy to manage bad users: spammers usually get identified and blocked quickly, because twitterers care a great deal about the content of what they read (in other words, as soon as they realize it’s a spammer, they report it to Twitter). But what about filtering out content that is simply not interesting? A user you follow may write something irrelevant to the reasons for which you usually read his or her twits, and you may want to read only those in which you are interested. This is still an open issue.

Addressing the bad points is not an easy task. Nonetheless, I must say that I already see microblogging as a good replacement for RSS. Many users are starting to use Twitter this way. And I’m realizing, as I write this post, that I’m slowly doing the same, removing RSS feeds I don’t read anymore because I follow their updates on Twitter.

(*) this is a definition I coined for a research proposal I still like a lot 🙂
(**) ok, I’m using the terms Twitter and microblog interchangeably, but if you think about the expression Google it you’ll realize that the winner takes it all – including the right of naming the appropriate service.

Web 2.0

Old media censorship on new media?

I took part in that nice event called the “Twitter Developer Nest”, which is basically a meeting for people interested in developing applications over Twitter.

There were nice talks and presentations of old and new Twitter apps, including an amazingly funny presentation by @aszolty on his BakerTweet system (which quite interestingly merges three things I’ve been looking at lately: Twitter, Arduino, and the idea of bringing pieces of technology to uncommon areas).

What I found very thought-provoking was the talk by Ollie Parsley (@ollieparsley) about his FootyTweets service. Basically, this service sent out twits with live match updates, using accounts related to football teams.

Once the service had become hugely successful (>4K followers for the Manchester United account), he received a “Cease and Desist” notice from Football DataCo (read here and here for some coverage), who are the “owners” of football fixture updates.

Apart from a couple of naive issues (e.g. Ollie used copyrighted club logos to represent the teams, which he promptly replaced with self-designed images), the C&D notice focused on the live update service itself. This raises several interesting points.

I see Twitter (as many other people do) as a shout-from-your-window-and-see-who-listens service. That is, Ollie was basically telling everyone “look, Arsenal scored one minute ago”. Needless to say, he didn’t charge any money (but from the point of view of Football DataCo this does not matter, as it’s lost profit anyway).

Legally speaking, it’s an interesting issue as some questions can be raised:
– what if I text a friend whilst I’m at the stadium, telling him live that our team scored a goal?
– what if I do the same using Twitter?
– what if Ollie retweets me rather than running the service on his own? who’s infringing data ownership, me or him?
– are we sure that letting people know live fixtures actually represents a threat to Football DataCo’s profits? what if it instead drives *more* people to be interested in getting live fixtures, with better service levels, turning into more profits?

No answers so far, and Ollie had to stop the match update service.

What is evident to me, though, is that this is the quintessential legal case for the Web 2.0, which is by nature social/collaborative, real-time, and involving mash-ups. I honestly think that old media law about data ownership and copyright can be applied to Web 2.0 only by blocking all services tout court. In fact, if you focus on the earlier example, retweeting from other people – that is exactly the form of crowdsourcing that Ollie is thinking of – you will soon realize that you cannot identify who is infringing which law. It’s the crowd, in a certain sense, but your acts alone are not against the law.
Given that one of the fundamental principles of law is that responsibility for an unlawful act is personal, who should be sued?

I don’t have much hope here, but I think that we need some form of Law 2.0, which does not mean – as web-opponents usually claim – that the Web doesn’t want rules.