Categories
geo Web 2.0

The past (and future?) of location

I must say – without making it too emotional – that I feel somewhat attached to geo-events at the BCS, as my first contact with the London geo-crowd was there over a year ago, at a GeoMob that included a talk by the same Gary Gale who spoke last night. That was, at least for him, one company and one whole continent ago – for the rest of us, the "agos" include new or matured geo-technologies: Foursquare, Gowalla, Latitude, Facebook and Twitter Places, plus our very own London-based Rummble, and minus some near-casualties (FireEagle).

Some highlights/thoughts from his talk:

The sad story of early and big players
– early players are not always winners: this can happen in a spectacular way (Dodgeball) or more quietly (Orkut, for example, has never really been a commercial success) – but also
– big players are not always winners: it's all just a little bit of history repeating, isn't it? Remember the software revolution? The giant IBM didn't understand it, and a small, agile company called Microsoft became the de facto monopolist. OS/2 is still remembered as one of the epic fails in software. Remember the Internet revolution? The giant Microsoft had its very own epic fail, called Microsoft Network. It took them ages to create a search engine, and in the meantime an agile young company with a big G became the search giant. Some years later, the aforementioned Orkut, started by Google as a side project, didn't have the agility or the motivation to resist Facebook. The same might happen with location services.

Power to the people
The problem with big players is that they take the quality of their databases for granted. Foursquare et al. found a way to motivate users to keep the POI database constantly updated by using a form of psychological reward – something Google hasn't quite done.

Now monetize, please
OK, we can motivate users by awarding mayorships and medals. Having a frequently refreshed database is a step ahead. But how do you make money out of it? "Let's get in touch with the companies and ask for a share of the profit" can work for some brave early adopters, but it won't take long for companies to realize they can use the data – for free – for business analysis without even contacting Foursquare. "Become mayor and get a 10% discount." What other kinds of data analysis would motivate them to pay? Knowing where a customer goes next? Where they've been before? Maybe getting a higher profile in searches, as in Google search results? In the ocean of possibilities, the one certainty is that there isn't yet an idea that works well. "Even Facebook lacks the time to contact the big players to negotiate discounts". And for the small players it's even more difficult (but if Monmouth offers me a free espresso I'll work hard to become their mayor!).
The way many companies are trying to sell it is still very much old economy: sell the check-in database to a big marketing company, and so on. Cf. the next point.

Dig out the meaningful data
OK, we have motivated users to keep our POIs fresh. But they want to be mayor, so they exploit the APIs. Their favourite bar already has a mayor? They create another instance of the same place. They create their own home. I've seen a "my bed". Is there an algorithmic way to filter out the meaningless data? Surely not in the general case. Moreover, as Gary stressed, simply "selling your database starts eroding its value", because the buyer needs to find a use for that mountain of data. For now, such a use is not evident, because most of the data is not meaningful at all.
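Part of the problem can be attacked with heuristics, though. Here is a minimal sketch of my own (names and thresholds are made up, and this is not anything Foursquare actually does as far as I know): flag two venues as probable duplicates when their names are nearly identical and they sit within a few tens of metres of each other. It catches the duplicated bar; it says nothing about "my bed", which is exactly the general-case problem.

```python
# Illustrative heuristic only: flag probable duplicate POIs when two venues
# have very similar names and lie within ~50 m of each other.
import math
from difflib import SequenceMatcher

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def likely_duplicates(a, b, max_dist_m=50, min_name_sim=0.8):
    """a and b are dicts with 'name', 'lat', 'lon' (an illustrative schema)."""
    sim = SequenceMatcher(None, a["name"].lower().strip(),
                          b["name"].lower().strip()).ratio()
    dist = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    return sim >= min_name_sim and dist <= max_dist_m

bar1 = {"name": "Monmouth Coffee", "lat": 51.5143, "lon": -0.1240}
bar2 = {"name": "monmouth coffee ", "lat": 51.5144, "lon": -0.1241}
bed  = {"name": "My bed", "lat": 51.5500, "lon": -0.1000}
print(likely_duplicates(bar1, bar2))  # True: the duplicated venue is caught
print(likely_duplicates(bar1, bed))   # False: nothing flags "My bed" as meaningless
```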

“If Augmented Reality is Layar, I’m disappointed”
Some time ago I noticed a strange lack of overlap between the geo-crowd and the AR-crowd. The latter presents as a "revolution" ideas that the former has been discussing for years. One problem is that maybe we have augmented reality but not realistic augmentation, mostly because of the limited processing power of mobile devices. Ideally you would like to walk down the street, see a Super Mario-like green mushroom that gives you an extra shot of espresso (to me that's like getting an extra life), catch it, and claim the coffee in the shop around the corner. Unfortunately, GPS is not accurate enough (Galileo might solve this problem soon), and walking around pointing your phone camera at the road all the time will only drain your battery (and probably get you killed before you manage to catch the mushroom). It's not just an issue of processing power and battery life, though. Even with those, there's a serious user-interaction issue. AR glasses might partially solve it, but I can't really believe that augmenting reality is *just* that, and not something that empowers a user's imagination. Geo-AR sits on the boundary between novelty ("oh look, it correctly puts a label on St Paul's Cathedral!") and utility. And currently on the wrong side of it.

The director’s cut will (not) include recommendations
"I'm sure we'll make it to the director's cut", Alex Housley complained, in the typical flamboyant way of the Rummble crowd, about being left out of the presentation. "We believe trust networks are the future". Yes and no. I agree with Alex in the sense that providing appropriate recommendations is an interesting research problem (see also here) and the key to monetizing any such service. Strictly speaking it's not the future, though: Amazon has been using recommendations for years, and I've made purchases myself prompted by them. Trust networks have been used extensively in services like Netflix. What Rummble is trying to do is exploit trust networks more directly to enrich recommendations, bringing them to the heart of the application. I'm sure that recommendations will play a role in monetizing the geo-thing, and even trust networks may, too. What I'm not sure about is whether recommendations will look the way they do now. Without a revolution in the way users perceive local recommendations – that is, a user-interaction revolution – they're not gonna make it. Users need a seamless way of specifying the trust network, and a similarly seamless way of receiving the recommendations.

Categories
security Web 2.0

Luttazzi, bad faith, and reason

[This post was originally written in Italian]

UPDATE 2 (14/6/2010):

A page called "Caccia al tesoro" ("treasure hunt") already appears on web archive in January 2006: http://web.archive.org/web/20060112195056/http://www.danieleluttazzi.it/?q=node/144. So it existed in January 2006, and was indexed about two months after its creation.

Note one detail: node=144 instead of node=285, with a fundamentally different URL format. In other words, there was a change of CMS at some point.

This clearly doesn't settle any of the arguments about plagiarism, copying, and so on, but it at least defuses the conspiracy accusation, which as far as I'm concerned was annoying (and unhelpful to the "moral" side of the discussion, namely establishing whether, and to what extent, it is legitimate to "copy"/"quote", with or without attribution). There was no backdating, at least for this post: it already existed in 2005.

UPDATE (14/6/2010):

– For fairness, the owner of the ntvox blog, the first one to write about this affair, asked me to point out that the web.archive.org issue is not the key argument of his blog, which is more interested in the general discussion of whether copying jokes is legitimate, and in the sheer number of jokes apparently copied by Luttazzi. Although the topic is mentioned on his blog, it's true that it is not its central point.

– Just to repeat to the point of boredom: I am forming my own opinion on the whole affair, and that opinion is obviously personal. This is, however, a technical blog, and this post deals only with the technical aspects of a piece of evidence that has been used, in my opinion, in a technically incorrect way. It is not meant as a comment on other evidence, real or alleged. One could debate what counts as a clue and what counts as irrefutable proof, as well as what technical requirements something must meet to be admitted as "evidence". In this post I focus on why this specific item cannot be admitted as evidence, because it lacks those technical requirements. Full stop.

(end of updates)

I won't hide it: until yesterday morning I "was" a fan of Daniele Luttazzi.
After reading the news about the alleged "plagiarism", I became a disappointed ex-fan.

Yet something pushed me to verify the reported claims, in particular the one considered the "crushing" proof of the bad faith of the comedian from Romagna.
I believe there are purely technical reasons that, on the contrary, defend his good faith, or at least show that the evidence brought against him is, at best, inconclusive.

A premise: I am an IT professional, I work on the Internet and networking, and I have a fair amount of personal experience in running websites.

The accusation: Luttazzi allegedly copied jokes from famous satirical authors and, in order to avoid being exposed as a plagiarist, wrote two posts on his blog inviting readers to a "treasure hunt of quotations", backdating those two posts so as not to raise "suspicion".

The prosecution's exhibits: the two posts in question can be retrieved from Luttazzi's blog:
http://www.danieleluttazzi.it/node/285, dated 9 June 2005
http://www.danieleluttazzi.it/node/324, dated 10 January 2006

The prosecution's evidence: the website http://web.archive.org. That site lets you retrieve previous versions of a web page. Looking up the two pages in question on web.archive.org, these alleged "creation dates" are reported:
– for post 285, that date would be 9 October 2007 (over 2 years after the date reported by Luttazzi)
– for post 324, that date would be 13 December 2007 (a little less than 2 years after the date reported by Luttazzi)

From a technical point of view, however, those two dates are misleading.
What the accusers miss is a small technical detail: the date reported by web.archive.org is NOT the creation date of the page. It is, instead, the date on which the page was first reached by web.archive.org's crawlers. If a web page is created today, it will take some time, shorter or longer, to be "found" by web.archive.org. That time can indeed amount to years.

One might wonder, then, whether two years is a reasonable indexing delay for a site as popular as Luttazzi's. Strictly speaking, there is clearly no way to get a certain answer. Statistically speaking, though, we have rather strong hints that the posts were not backdated by Luttazzi. Just take a few pages at random from the blog and compare the date they report with the date on web archive:

http://www.danieleluttazzi.it/node/277, blog date: 3 April 2007, NEVER archived on web archive (would that prove the page doesn't exist at all?)
http://www.danieleluttazzi.it/node/286, blog date: 10 January 2006, first web archive date: 9 October 2007
http://www.danieleluttazzi.it/node/289, blog date: 1 November 2006, first web archive date: 9 October 2007
http://www.danieleluttazzi.it/node/291, blog date: 14 March 2007, first web archive date: 9 October 2007 (for this page a later snapshot of 2 August 2008 is also reported, proof that from 2007 onwards Luttazzi's site was consistently followed by web archive)

Note that many of these dates fall in October 2007 – in fact, on the very same October day, the 9th: the same date as the allegedly incriminating post. Why? Because the whole site was indexed starting from October 2007. Before then, it was not present on web.archive.org.
As further confirmation, just look at http://web.archive.org/web/*/danieleluttazzi.it/* – this page lists ALL the pages of danieleluttazzi.it present on web.archive.org. It is easy to verify that up to 9 October 2007 the site was NOT indexed; indeed, on that date literally hundreds of its pages were added to web.archive.org.

The same holds for other blogs.

Take, for example, another very well-known comedian, Beppe Grillo:
http://www.beppegrillo.it/2005/01/il_papa_e_infal.html, blog date: 31 January 2005, first web archive date: 7 February 2006 (over a year later)

or the blog of the "hoax hunter" Paolo Attivissimo:
http://attivissimo.blogspot.com/2005/12/come-sta-valentin-bene-grazie-e-ha.html, blog date: 31 December 2005, first web archive date: 16 January 2006

It also happens to the well-known online newspaper repubblica.it, albeit with a shorter wait:
http://www.repubblica.it/ambiente/2010/04/27/news/marea-_nera-3646349/index.html?ref=search, published on 27 April 2010 and not yet on web archive; incidentally, the archive reports a wait of about 6 months before a page enters the archives (as of 2010; it may have been longer in 2007).
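These first-capture dates can also be retrieved programmatically. Below is a minimal sketch, assuming the Wayback Machine CDX API and the Python requests library – a present-day interface, offered only to make the spot check above reproducible, not something that was necessarily available in 2010:

```python
# Query the Wayback Machine CDX API for the earliest snapshot of a URL.
# Assumes the current CDX endpoint and the `requests` package.
import requests

def first_capture(url):
    """Return the timestamp (YYYYMMDDhhmmss) of the earliest snapshot, or None."""
    resp = requests.get(
        "http://web.archive.org/cdx/search/cdx",
        params={"url": url, "output": "json", "fl": "timestamp", "limit": "1"},
        timeout=30,
    )
    resp.raise_for_status()
    if not resp.text.strip():
        return None  # no captures at all
    rows = resp.json()  # first row is the header (["timestamp"]), then the data rows
    return rows[1][0] if len(rows) > 1 else None

if __name__ == "__main__":
    for page in ("danieleluttazzi.it/node/285", "danieleluttazzi.it/node/286"):
        print(page, first_capture(page))
```

Running this over a handful of danieleluttazzi.it URLs is the same spot check done by hand above.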

Even if this does not prove that the two posts in question were really written in 2005 and 2006, it is at the very least a rather strong clue that the dates were not manually altered. In any case, it clearly shows that web.archive.org cannot be used, as it has been, as evidence to accuse Luttazzi of defending himself in bad faith, because its indexing simply starts too late.

Whether or not it is legitimate to use other people's jokes is not for me to judge from a technical standpoint; that is up to Daniele's audience. An audience I am back among as a "fan", admiring first of all his performing style, now that this little technical verification detour has restored my confidence in the good faith of his defence.

I would like bloggers, journalists, and other accusers to check how a technical tool works before waving it around as proof of bad faith, given that they have evidently understood very little about it.

Categories
geo gov Web 2.0

Free data: utility, risks, opportunities

Some random thoughts after "The possibilities of real-time data" event at City Hall.

Free your location: you’re already being photographed
I was not surprised to hear the typical objection (or rant, if you don't mind) of institutions' representatives when asked to release data: "We must comply with the Data Protection Act!". Although this is technically true, I'd like to remind these bureaucrats that in the UK it is legal to be photographed in a public place. In other words, if I'm in Piccadilly Circus and someone wants to take a portrait of me, and possibly use it for profit, they are legally allowed to do so without my authorization.
Hence, if we're talking about releasing Oyster data, I can't really see bigger problems than those related to photographs: whereas Oyster data makes public where you are and, possibly, when, a photograph can reveal where you are and what you are doing. I think that where+what is intrinsically more dangerous (and misleading, in most cases) than where+when, so what's the fuss about?

Free our data: you will benefit from it!
Bryan Sivak, Chief Technology Officer of Washington DC (yes, they have a CTO!), showed it clearly in an impressive talk: freeing public data improves service levels and saves public money. This is a powerful concept: if an institution releases data, developers and businesses will start building ventures and applications on top of it. More importantly, the institution itself will benefit from better accessibility, data standards, and fresh policies. That's why the OCTO has released data and fostered competition by offering cash prizes to developers: the government gets expertise and new ways of looking at data in return for technological free speech. It's something the UK (local) government should seriously consider.

Free your comments: the case for partnerships between companies and users
Jonathan Raper, Twitter's @MadProf, is sure that partnerships between companies and users will become more and more popular. Companies, in his view, will let the crowd generate and manage a flow of information about their services, and possibly integrate it into their reputation management strategy.
I wouldn't be too optimistic, though. Although it's true that many far-sighted companies have started engaging with the crowd and welcome autonomous, independently run Twitter service updates, most of them will try to dismiss any reference to bad service. There are also issues with data covered by licences (see the case of FootyTweets).
I don't know why I keep using trains as an example, but would you really think that, say, Thameslink would welcome the crowd tweeting about constant delays on its Luton services? Not to mention that National Rail forced a developer to stop offering a free iPhone application with train schedules – in order to start selling their own, non-free one (yes, charging £4.99 for data you can get from their own mobile website for free, with the same ease of use, is indeed a stupid commercial strategy).

Ain’t it beautiful, that thing?
We've seen many fascinating visualizations of free data, both real-time and not. Some of them require a lot of work to develop. But are they useful? What I wonder is not just whether they carry any commercial utility, but whether they can actually be useful to people by improving their everyday lives. I have no doubt, for example, that itoworld's visualizations of transport data, and especially those about Congestion Charging, are a great tool for letting people understand policies and for helping authorities plan better. But I'm not sure that MIT SenseLab's graphs of phone calls during the World Cup Final, despite being beautiful to look at, fun to think about, and technically accurate, bring any improvement to the user experience. (This may just be the general difference between commercial and academic initiatives, but I believe the point applies more broadly in the area of data visualization.)

Unorthodox uses of locative technologies
MIT Senselab's Carlo Ratti used GSM cell association data to approximate the density of people in the streets. This is an interesting use of the technology. Nonetheless, unorthodox uses of technologies, especially locative technologies, must be handled carefully. Think about using the same technique to estimate road traffic density: you would have to account for single- and multiple-occupancy vehicles, which can mean very different things on city roads and on motorways. Using technology in unusual ways is fascinating and potentially useful, but matching the right technique to the right problem has to be carefully gauged.

Risks of not-so-deep research
This is generally true in research, but I would say it’s getting more evident in location-based services research and commercial activities: targeting marginally interesting areas of knowledge and enterprise. Ratti’s words: “One PhD student is currently looking at the correlations between Britons and parties in Barcelona… no results yet“. Of course, this was told as a half-joke. But in many contexts, it’s still a half-truth.

Categories
geo geomob mobile Web 2.0

A bunch of nerds with maps

…I think I can define GeoMob this way and I fit this definition perfectly 🙂

A nice London Geo/Mobile Developers Meetup Group meeting yesterday at City University. The standard of the talks was high: they provided vision, reported experiences, and showed technologies and nice uses of them. Here's a short summary.

Andrew Eland – Mobile Team Lead for Google UK

A very Google-like talk, showing off pieces of technology together with the company's vision. Disappointing, of course, if you were expecting in-depth market analysis, novel ideas, or anything beyond currently public work. But we're used to that, and it was not a bad talk at all 🙂
Best quote: "Tokyo is a vertical city". That's absolutely true, and it has a direct impact on geo-apps: with shops, clubs, and bars spread vertically across different levels of the buildings (this is a pic I took of the Keio Sky Garden, for example, and there are hundreds of beer gardens up on the roofs of skyscrapers!), there's a real need for accurate altitude information and 3D mapping, or at least altitude-enabled maps. The interesting question for me is how to show multi-floor information on the 2D maps currently in use.

Julianne Pearce, Blast Theory
An artists' collective's perspective on geo-development. Absolutely intriguing, and not the average techie talk you would expect at a GeoMob. I found it personally interesting, as I played the Can You See Me Now? game and even created a modified version of it at the UbiComp Spring School at the Mixed Reality Lab, University of Nottingham, in April 2009, during a workshop dealing with Locative Game Authoring.

PublicEarth
They introduced their concept of a Web 2.0 site for creating a personal atlas. Basically, it's about putting photographs and commercial activities of interest on a personal map. They seem to be developing APIs and the possibility of creating widgets, and they deal directly with small businesses (hotels, B&Bs, restaurants, bars) to get them into their database. The idea is that users will be able to tell the (possibly intelligent) system what categories of data they're most interested in, leading to some kind of customised Michelin guide.
On monetization, they have a three-fold strategy:
– contextual advertisement, empowered by the fact that users are genuinely interested in what they put in their atlas
– share of profit on direct bookings
– [long-term] user base providing more content, improving quantity and quality of contextual data in a positive feedback loop, possibly making it interesting to other companies

Laurence Penney, SnapMap
My favourite talk of the night. Laurence has been longing for a way of placing photographs precisely on a map for more than 10 years.
I was astonished to see him doing many of the things I would have liked to see in websites like Flickr, and that I've been discussing for ages with my friends and colleagues! Using GPS data, a compass, waypoints, directions, focal length, and all the other data associated with a photograph, Laurence is developing a website that lets users navigate those pictures, even building 3D views of them like the University of Washington team did with Building Rome in a Day. Funnily enough, he started all of this before GPS/compass-enabled devices were available, writing all of his data down in a notebook, and he even had the police inquiring why he was taking pictures of Parliament (unfortunately, I have to say he's not alone -_-).
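To give a sense of how that metadata turns into map geometry, here is a small sketch of my own (not SnapMap's code): from focal length and sensor width you get the horizontal angle of view, and from the compass heading you get the bearings of the frame's left and right edges – the "view cone" you could draw on a map for each photograph.

```python
# Illustrative only: derive a photo's horizontal field of view from focal length
# and sensor width, then the compass bearings of the frame edges, so the shot
# can be drawn as a view cone on a map.
import math

def horizontal_fov_deg(focal_length_mm, sensor_width_mm=36.0):
    """Angle of view across the frame width (36 mm = full-frame sensor)."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

def view_cone(heading_deg, focal_length_mm, sensor_width_mm=36.0):
    """Return (left_bearing, right_bearing) of the frame edges, in degrees."""
    half = horizontal_fov_deg(focal_length_mm, sensor_width_mm) / 2
    return (heading_deg - half) % 360, (heading_deg + half) % 360

# A 50 mm lens pointed due east (heading 90 degrees) covers roughly 70..110 degrees
print(round(horizontal_fov_deg(50), 1))               # ~39.6
print(tuple(round(b, 1) for b in view_cone(90, 50)))  # (70.2, 109.8)
```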

Mikel Maron – Haiti Earthquake OpenStreetMap Response
Mikel explained what OpenStreetMap did to help in Haiti. Disaster response relies heavily on up-to-date maps of buildings, streets, and resources, and OSM quickly managed to get that done. Many thanks to him and to all the OSM folks for showing the world that mapping can help people even when profit considerations are left aside.

Categories
Web 2.0

HootMonitor: a Twitter app with a strategy

Ollie Parsley is a developer from Dorset I've been following with much interest since his first appearance at the London Twitter Devnest last May (you might remember I blogged about it), as his work often points at mind-boggling problems in a developer's everyday life (read about his Cease&Desist experience, for example).

HootMonitor is his latest Twitter application, though I would say it's reductive to call it a "Twitter application". As introduced at the last Devnest, HootMonitor is, simply put, a website monitoring tool that uses Twitter as a communication channel (a bare-bones sketch of the check loop follows the list below). That is:

  • you get an account on HootMonitor linked to your Twitter account
  • you add a website you want to be monitored
  • HootMonitor periodically checks the website for you
  • the service sends you a Twitter direct message/e-mail/SMS if the website goes down
  • you also get aggregate status reports (uptime and downtime, average response time, and so on).
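The core check is conceptually tiny. Here is a bare-bones sketch, purely illustrative and certainly not Ollie's code; the notification step is stubbed out because the Twitter API details are outside the scope of this post:

```python
# A bare-bones illustration of the kind of loop a monitoring service runs:
# fetch each watched site, and raise an alert when it looks down.
import time
import requests

SITES = ["http://example.com", "http://example.org"]  # hypothetical watchlist

def check(url, timeout=10):
    """Return (is_up, response_time_in_seconds)."""
    start = time.time()
    try:
        resp = requests.get(url, timeout=timeout)
        return resp.status_code < 500, time.time() - start
    except requests.RequestException:
        return False, time.time() - start

def notify(url, elapsed):
    # Stub: a real service would send a Twitter DM / e-mail / SMS here.
    print(f"ALERT: {url} appears to be down (check took {elapsed:.1f}s)")

if __name__ == "__main__":
    while True:
        for site in SITES:
            up, elapsed = check(site)
            if not up:
                notify(site, elapsed)
        time.sleep(300)  # poll every five minutes
```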

As there has been much interest lately in the use of Twitter as a corporate tool, and a never-ending discussion about a business model that would allow Twitter to monetize its success, it looks like Ollie has once again touched on live issues, and has handled the whole process of bringing this service to users in a way that resembles a classic textbook case study. I believe HootMonitor is going to be an interesting and possibly successful experiment for the following reasons:

  • Mashup use of Web 2.0 technologies: HootMonitor is not the first attempt at building an application on top of Twitter, and there have been many mashups that received extensive press coverage. Nonetheless, HootMonitor is, as I'm going to explain, the very first application to deliver a service over Twitter that brings together intrinsic usefulness, a business model, and a good "marketing" strategy.
  • Useful service: HootMonitor adds value to the user experience by solving a real problem without disrupting users' lives. There are plenty of monitoring tools out there, but not many generate reports in a way that integrates seamlessly into people's lives and jobs.
  • Freemium model: this is the most interesting aspect of HootMonitor. It can be used for free, but there are premium features you can get by paying a (reasonably priced) subscription. As far as I'm aware, this is the first application with such a business model to have emerged on top of the Twitter API. There are plenty of ways to try the service for free: you can experience all its usefulness without paying a single penny. The features you pay for, though, are worth the price (for example, personalised statistics or mobile text messages). Many other successful Twitter applications have no business model at all, and it's hard to imagine how they will ever generate profit (unless they're used as an advertising tool for other products/services).
  • Marketing strategy: Ollie has been developing HootMonitor for some months, letting the users of his other apps and his Twitter followers know about the idea. The steps here were a "corporate" HootMonitor blog, a Twitter account to engage with potential users, and a small company to operate under (HootWare). Moreover, HootMonitor was launched exactly the night after its presentation at the Devnest. I believe this was a smart marketing move that gave the service the highest level of exposure possible.

Naturally, I can't forecast whether or not HootMonitor will be a successful venture, but I'm optimistic about it and of course I wish Ollie every success. As I'm finding it very useful for my own websites, and I'm aware of many other people trying it, given its strategy and model it's likely we'll be hearing more about it in the short (and maybe longer) term.

Categories
recommender systems Web 2.0

The impossibility of appropriate recommendations

I've recently finished reading Hofstadter's "Gödel, Escher, Bach", after three years and a number of failed attempts and restarts. Of the main topics it touches, I found its approach to the problem of automatic natural language understanding and generation particularly interesting. And I feel that this problem is intrinsically related to that of generating recommendations for users (OK, this is not a great discovery, I must admit).

The problem can be put simply as follows. Imagine we have a language generator we can ask to create sentences. We could:

  • ask it to create correct sentences (i.e. grammatically correct sentences – this is somewhat possible)
  • ask it to create meaningful sentences
  • ask it to create funny sentences

The three requests above involve different attributes, whose exact meaning is itself open to discussion. As you can imagine, funny implies meaningful and correct, and meaningful implies correct, which means that generating such sentences gets increasingly hard. Moreover, while almost anyone can, within limits, produce a correct sentence, it is much less clear what makes a sentence meaningful (what is meaningful to me might not be meaningful to you), and a funny sentence needs its real, underlying meaning to differ from its apparent one. Note also that attributing these qualities to a correct sentence becomes increasingly personal. Attributing meaning is an intrinsically human activity, as programming-language designers and logicians who deal with syntax and semantics know well.
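To make the first request concrete: producing grammatically correct sentences is largely mechanical. A toy context-free grammar of my own makes the point – the output below is well-formed but usually meaningless, which is precisely the gap the other two requests have to bridge.

```python
# A toy context-free grammar: generating *correct* sentences is mechanical,
# but nothing here knows (or cares) whether the result is meaningful or funny.
import random

GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "Adj", "N"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "Adj": [["green"], ["recursive"], ["strange"]],
    "N":   [["mushroom"], ["espresso"], ["loop"], ["mayor"]],
    "V":   [["drinks"], ["contains"], ["augments"]],
}

def generate(symbol="S"):
    """Expand a symbol by recursively picking random productions."""
    if symbol not in GRAMMAR:          # terminal: an actual word
        return [symbol]
    words = []
    for sym in random.choice(GRAMMAR[symbol]):
        words.extend(generate(sym))
    return words

for _ in range(3):
    print(" ".join(generate()).capitalize() + ".")
    # e.g. "A recursive espresso augments the mayor." – correct, hardly meaningful
```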

How all of this relates to the field of recommender systems should be obvious by now. An RS is a tool that, more or less, tries to understand what is meaningful to a user in order to provide him or her with suggestions. What a general-purpose RS should do is understand the meaning of objects and find similar objects. The thing is, the meaning of objects, especially when expressed in natural language, is not easy to establish, and in general cannot be established at all.

I recently reviewed a paper for a friend doing research in RS that reported an example similar to this: "I'm at home, and would like a restaurant in Angel, Islington for tonight". Contextual information (and the subsequent activity and intent inference) is the interesting part of this request for a recommendation: what matters is not where I am now, but where I would like to go. This particular case is simple enough to deal with, but what about all those situations in which context is implicit?

You will object that a general-purpose RS cannot exist and wouldn't be that useful. The truth is, however, that even a limited-domain RS, such as one for books or DVDs, may encounter similar problems. I've been discussing the possibility of a "surprise me" button, proposed by Daniele Quercia. The idea is that sometimes, as a user, I would like to be suggested something new rather than something similar to what I've done in the past or to what my friends like. But this concept raises a deep question about how far a surprise should go. In other words: it is not possible to understand what kind of recommendation the user would like to receive. What an RS can do is detect users' habits or activities, and always provide a similarity-based suggestion.

So here's my view of the limitation of current RS: they cannot – as of today – provide a recommendation to a user who likes to try something new. RS are for habitués.

A stupid example: I've read four books in a row by the English author Jonathan Coe. After that, Amazon kept recommending other books by Coe to me, whilst of course I wanted a break from them.

Any objections? E.g.:
meaning in current RS is not expressed in natural language: true, but this is nonetheless a limitation of the systems themselves. In practice it means they can only give suggestions based on those values. For example, "rate how much you liked the book from 1 to 5" will never express whether the user would actually read it again, whether they would recommend it to others, and so on. Structured representations do not capture real meaning, and they restrict the range of representable information about the user.
no RS is general purpose: I think even limited-domain RS suffer from the same problem, as no RS can infer a user's feelings.

I'm not proposing silver bullets here, and of course not all research and applications in RS are to be trashed. Some possible research and development directions may be:
– use direct social suggestions: to whom would you suggest it? (similar to direct invitations on Facebook – where all the limitations of this approach are nonetheless evident)
– deal with changes in user tastes and try to predict them
– use more contextual information
– try inference from natural language, for example inferring a user's tastes from his or her longer reviews
– better user profiling based on psychological notions and time variance: TweetPsych, for example, has tried profiling users based on their tweets, which are short and scattered over time.

Categories
geo geomob mobile Web 2.0

At the #GeoMob

Hey folks, it's been a long time since I last blogged – I've been very busy at work and at home! Let me get back to techie stuff by summarising some of my thoughts after the #GeoMob night at the British Computer Society on 30 July.
The #GeoMob is the London Geo/Mobile Developers Meetup Group, and it organises meetings of developers interested in the geo/social/mobile field, usually with participation from industry leaders (Yahoo!, Google), businesses, and startups.

These are my thoughts about the night, grouped by talk:

Wes Biggs, CTO Adfonic

  • Adfonic is a mobile advertising provider that launched on 1/7/09 (their home page doesn't work, though – you need to go to http://adfonic.com/home)
  • what about user interaction and privacy? If I'm not completely mistaken (reading here, it seems I'm not), the actual user experience is some kind of advertisement bar in your mobile application. If it's just that, it's simply the porting of an old desktop idea to the mobile environment – and that idea was not hugely successful. Here the user is rewarded even less than with the desktop bars (I guess by getting the app for free?). I'm not sure this can be a really successful venture unless the ads are smartly disguised as "useful information" – but, hey, I'm here to be refuted 😛
  • getting contextual information is difficult: even if you know the user's location, you don't know what he or she is doing. Good motto from the talk: "advertisers are not interested in where you are, but in where you're at". How to get and use this contextual information was not really clear from the talk, though. From their website's FAQ, I read:
    • You can target by country or region.
    • You can target by mobile operator.
    • You can define the days of the week and the time of day you wish your ad to be displayed in the local market.
    • You can choose to target by demographics by selecting gender and age range profiles.
    • You can choose devices by platform, brand, features and individual models.
    • You can also choose to assign descriptive words for your campaign using tags. We compare these tags to sites and apps in the Adfonic network where your ad could be displayed, improving your ad’s probability of being shown on a contextually relevant site.

    This raises a couple of privacy concerns, as well as technical ones 😉

  • I would say this talk raised more questions than it answered – nonetheless it was, at least for me, good for brainstorming about mobile targeting
  • some of the issues with this service – which I'm really interested in watching to see where it heads – are, interestingly, the same as those in a paper about leisure mobile recommender systems that I reviewed for MobBlog

Henry Erskine Crum, @henryec, Co-founder of Spoonfed

  • Spoonfed is a London-based web startup (Sep. 2008) that focuses on location-based event listings
  • 12 people work there – which makes it interestingly big for a startup
  • very similar to an old idea of mine (geo-events, but with more of a social networking flavour) – which reminds me that I need to act fast when I have such ideas 🙂
  • I would have liked the talk to dig deeper into details about the user base, mobile apps, and HCI issues, but it was not a bad talk, and it provided a very operational and yet open-minded view of how the service works and evolves
  • oh, and Henry was congratulated as the only guy in a suit (:P lolcredits to Christopher Osborne)

Gary Gale, @vicchi, Director of Engineering at Yahoo! Geo Technologies, with a talk about Yahoo! Placemaker

  • get the slides for this talk here
  • Yahoo! Placemaker is a useful service for extracting location data from virtually any document – a task also known as geoparsing. As the website says: Provided with free-form text, the service identifies places mentioned in text, disambiguates those places, and returns unique identifiers for each, as well as information about how many times the place was found in the text, and where in the text it was found.
  • I find it very interesting, especially as it can be used with tweets and blog posts, and it can help create very interesting mashups
  • only issue: its granularity stops at the neighbourhood level – which is perfectly good for some applications, but I'm not sure it is for real-time, location-intensive mobile apps

Steve Coast, @SteveC, founder of OpenStreetMap and CloudMade, with a talk about Ubiquitous GeoContext

  • OpenStreetMap can be seen as the community response to Google Maps: free maps, community-created and maintained, freely usable – CloudMade being a company focused on using that map data to let developers go geo
  • the motto from this talk is "map, please get me to the next penguin in this zoo" – that is, extreme geolocation and contextual information
  • the success factors of a geo app – which in my view also apply to many Internet startups – summarized in 3 points:
    • low cost to start
    • no licensing problems
    • openness / community driven effort
  • it was an absolute delight to listen to this talk, as it was fun but also rich in content – the highly visual presentation was extremely cool, and I hope Steve is going to put it online!

Oh, and many thanks to Christopher Osborne, @osbornec, for organising an amazing night!

Categories
HCI my projects Web 2.0 Work in IT

Aggregated values on a Google Map

UPDATE 27/08/09: the functionality of my version of MarkerClusterer has been included in the official Google Code project; you can find it in gmaps-utility-library-dev. The most interesting part of that library is the so-called MarkerClusterer.

Imagine you need to show thousands of markers on a map. There may be many reasons for doing so – temperature data, unemployment distributions, and the like – and you want a precise view, hence the need for a marker in every town or borough. What Xiaoxi and others developed is a marker able to group all the markers in a certain area: this is the MarkerClusterer. Your map gets split into clusters (whose size you can specify – though hopefully more fine-grained ways of defining areas will be made available), and for every cluster you show a single marker, labelled with the total count of markers in that cluster.

I thought this opened the way to something more precise, something able to support reasoning over map data. Once you have a cluster marker, wouldn't it be wonderful to display some other data on it, rather than the simple count? For example, in the temperature-distribution case, I would be interested in seeing the average temperature of the cluster.

That's why I developed this fork of the original class (and I've applied to get it into the main project – fingers crossed!), which allows you to do the following (the sketch after this list illustrates the idea):

  • create a set of values to tag the locations (so that you technically attach a value to each marker)
  • define a function that returns an aggregate of the values you passed, computed automatically for each cluster
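A framework-independent illustration of mine follows (the real MarkerClusterer is JavaScript on the Google Maps API, so this Python sketch only shows the idea): bucket markers into grid cells and compute a per-cell aggregate instead of a bare count. The figures are made up.

```python
# Bucket (lat, lon, value) markers into fixed-size grid cells and report, for
# each cell, both the count and an aggregate of the attached values.
from collections import defaultdict
from statistics import mean

def cluster_aggregate(markers, cell_deg=1.0, aggregate=mean):
    """markers: iterable of (lat, lon, value). Returns {cell: (count, aggregate)}."""
    cells = defaultdict(list)
    for lat, lon, value in markers:
        key = (int(lat // cell_deg), int(lon // cell_deg))  # grid cell index
        cells[key].append(value)
    return {key: (len(vals), aggregate(vals)) for key, vals in cells.items()}

# Made-up hospital figures, just to show the shape of the output
markers = [
    (51.50, -0.12, 98.2), (51.52, -0.10, 101.5),  # two markers in one London cell
    (53.48, -2.24, 95.0),                          # one marker near Manchester
]
for cell, (count, avg) in cluster_aggregate(markers).items():
    print(cell, "count:", count, "average:", round(avg, 1))
```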

That's all. The result is very simple, but I believe it is a good way to start thinking about how the visualization of distributed data affects the usability of a map and the understanding of the information it carries. Here's a snapshot of the two versions, the old one on the left (bearing just the count) and the new one on the right (with average data). The data here refer to NHS hospital death rates, as published here. If you want to see the full map for this example, click here.

Categories
recommender systems Web 2.0

Who wants to be recommended?

There’s a lot of ongoing research on recommender systems, fostered by the Netflix Prize.

Recommender systems are basically software tools that offer users suggestions within a given domain. Usually they are specialised: Amazon's recommender system recommends books, Last.fm's recommends songs, and the like.

The key to recommendation lies in different aspects. I may be suggested things similar to those I previously chose, or things my friends like. There's a whole theory behind this, so I won't bore you; to know more, use this site as a starting point.
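As a toy illustration of those two flavours, here is a sketch with made-up data: "similar to what I chose" via item co-occurrence, and "what my friends like" via a friends list. Note that it cheerfully suggests Star Wars to me, which is exactly the problem discussed below.

```python
# Two naive recommendation flavours: items co-liked by people who share my
# tastes, and items my friends like that I haven't picked yet. Data made up.
from collections import Counter

ratings = {                      # user -> set of liked items
    "me":    {"Star Trek", "GEB"},
    "alice": {"Star Trek", "Star Wars"},
    "bob":   {"GEB", "Jonathan Coe"},
}
friends = {"me": ["alice", "bob"]}

def similar_items(user):
    """Items liked by users who share at least one liked item with `user`."""
    mine = ratings[user]
    counts = Counter()
    for other, items in ratings.items():
        if other != user and items & mine:
            counts.update(items - mine)
    return counts.most_common()

def friends_like(user):
    """Items the user's friends like that the user hasn't picked yet."""
    mine = ratings[user]
    return {item for f in friends.get(user, []) for item in ratings[f]} - mine

print(similar_items("me"))   # [('Star Wars', 1), ('Jonathan Coe', 1)]
print(friends_like("me"))    # {'Star Wars', 'Jonathan Coe'} (order may vary)
```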

My problem with RS is the one in this post's title: who wants/needs recommendations? Is it always true that I like the same kind of things? I'm surely a good counter-example. I love Star Trek. I have watched, and would happily watch again, every single episode. Nonetheless, I hate Star Wars. I find it boring. I don't like sci-fi in general. No Terminator, no Robocop. I can't even name much other non-Trek sci-fi. So my hypothetical RS should know that I don't like every kind of sci-fi film, only Star Trek. Maybe my friends share this view (as far as I know, no one really does), so it could try checking my friends' profiles first.
If you take a look at my music library (or simply explore my Last.fm profile), you could call it eclectic at best. Some would say it's schizoid.
Moreover, sometimes I might want to do things my friends don't. Negative recommendations could be part of the solution, but the underlying algorithm would be just the same.

So what would a good recommendation for me look like? Well, usually what matters to me is surprise. I like many different things. The parameters behind what I like are perhaps originality, quality, …, but maybe they are simply unknown. Some people have suggested a "surprise me" button to accomplish this task. But it's not that easy, even if I know what I don't like.
Hence, the final questions: how can I represent a user's tastes? How can I represent their reactions (or feelings) towards something they expect, or don't? How can I represent what I would like recommendations on, and what I wouldn't?

Stay tuned to the RecSys conferences to see if someone comes up with an answer; my guess is that we'll be seeing lots and lots of new recommender systems in the coming years, and each one will be confronted with these issues.

Categories
HCI Web 2.0

Wolfram Alpha and user experience

There are a lot of ongoing discussions about the power of Wolfram Alpha. I think most of these conversations are flawed because they rest on the argument that Wolfram Alpha does not find you enough information.

I believe that the mistake here lies in the common way the press have introduced the service. Wolfram himself has not been clear enough, and when he has, the press has of course misinterpreted him. Wolfram Alpha is not a search engine.

Many articles and blog posts have been published on the topic: will Wolfram Alpha be the end of Google?
The problem is that the two services are actually very different. Wolfram Alpha is a self-described computational knowledge engine, not a search engine like Google. Google can return millions of results for a single search, whilst Alpha returns a single, often aggregated, result about a topic.

Alpha is basically an aggregator of information. It selects information from different data sources and presents it to the user in a nice, understandable way. Google is more like searching a phone directory. So you're supposed to ask the two services different questions.

Of course, Alpha makes mistakes. A curious example I’ve found is the search for the keyword “Bologna”. Bologna is primarily the name of a town in Northern Italy (the one in which I attended university); it is also the name of a kind of ham, commonly known as “Mortadella”, especially outside Bologna itself. In Milan, for example, Mortadella is commonly called Bologna.

Well, search for Bologna on Google, and compare it with results on Alpha.

Google will return mostly pages about the town of Bologna and its football team, whereas Alpha will give you the nutritional information of Mortadella.

Is this a 'mistake'? I think the only mistake is in the expectations users have about Alpha. It yields results from a structured knowledge base, hence its index is not as general as Google's. Nonetheless, I believe there's at least one problem in the user interface that should be corrected: the search box. It's exactly the same as Google's – same shape, same height, same width. But is there any alternative way of presenting an answering engine on the Internet?

What I think is that more HCI research is needed to help users understand the goals and capabilities of a service like Alpha. If users think of it as a search engine, it will never succeed.

Just to have a hint of what Alpha should be about, try this search.