Analysing positions and arguments - The Google data prediction API
21 May 2010 – 10:08 by Bengt Feil (TuTech Innovation GmbH)
Two of the major challenges for eParticipation today are scale (what to do when there are 100,000 contributions) and the problem of quantifying positions in qualitative discussions (knowing clearly who supports what, etc.). Automatic analysis and categorization of contributions could be a solution to these problems, or at least a valuable support for human moderators and facilitators. Reliable automatic argument analysis has not been solved yet, and a perfect solution might be out of reach for a long time, but with the announcement of the data prediction API at the Google I/O conference yesterday, a workable solution could be available soon.
The data prediction API is a service that categorizes arbitrary text based on how it has been trained with known, already categorized data. For example: if the service has been trained that “This is an english sentence” is “English” and that “La idioma mas fina” is “Spanish”, it will be able to determine that “Qué Hay De Nuevo” is also “Spanish”. Of course this is a very simple example, but the service is potentially able to categorize complex texts based on the training it has received with known data. Details about the process can be found in the developer's guide (warning: technical content).
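To make the train-then-predict idea concrete, here is a minimal sketch in Python. It is not the Google API and says nothing about how Google's service works internally: it is a small naive Bayes classifier over character trigrams, used purely as a stand-in. The names TinyTextClassifier, train and predict are made up for this illustration, and two of the four training sentences are invented extras, because two sentences alone give a classifier almost nothing to go on.

```python
import math
from collections import Counter, defaultdict


def features(text):
    """Lower-case the text and split it into overlapping character trigrams."""
    padded = f" {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class TinyTextClassifier:
    """Naive Bayes with add-one smoothing. A stand-in, NOT the Google service."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # label -> trigram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # every trigram seen in training

    def train(self, label, text):
        feats = features(text)
        self.counts[label].update(feats)
        self.docs[label] += 1
        self.vocab.update(feats)

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label in self.docs:
            # log prior + sum of smoothed log likelihoods of each trigram
            score = math.log(self.docs[label] / total_docs)
            denom = sum(self.counts[label].values()) + len(self.vocab)
            for feat in features(text):
                score += math.log((self.counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = TinyTextClassifier()
clf.train("English", "This is an english sentence")  # from the example above
clf.train("Spanish", "La idioma mas fina")           # from the example above
clf.train("English", "What is new today")            # invented extra example
clf.train("Spanish", "Que hay de comer")             # invented extra example

print(clf.predict("Qué Hay De Nuevo"))  # Spanish
```

With so little training data the result is fragile; the point is only the shape of the workflow: feed in labelled examples first, then ask for the label of an unseen text.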
In the field of eParticipation this API could be used to categorize arguments and contributions made by participants. The service could, for example, be trained with 1,000 text-based contributions from several participation efforts related to urban planning, each categorized as “positive” or “negative” in its tone and position on the issue at hand. If a new participation effort in urban planning is started, all its contributions can then be automatically categorized as “positive” or “negative” based on the data used for training the API.
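As a rough illustration of how such a categorization turns a qualitative discussion into numbers, the following Python sketch labels a handful of new contributions and tallies the result. Everything in it is an assumption made for the example: the training texts are invented, and the classifier is a deliberately crude keyword matcher standing in for the real service, not a description of it. Contributions on which the matcher cannot decide are marked “undecided” rather than forced into a category.

```python
from collections import Counter

# Invented, already categorized contributions from earlier planning efforts.
TRAINING = [
    ("positive", "I welcome the new cycle lanes and the park"),
    ("positive", "Great plan, the new square will improve the area"),
    ("negative", "I oppose the plan, it brings too much traffic"),
    ("negative", "The tower is too high and will ruin the view"),
]


def words(text):
    """Split on whitespace, strip basic punctuation, lower-case."""
    return [w.strip(".,!?").lower() for w in text.split()]


def train(examples):
    """Keep, per label, only the words that do not occur under every label."""
    seen = {}
    for label, text in examples:
        seen.setdefault(label, set()).update(words(text))
    shared = set.intersection(*seen.values())
    return {label: vocab - shared for label, vocab in seen.items()}


def classify(model, text):
    """Pick the label whose words overlap most with the text; ties stay open."""
    tokens = set(words(text))
    scores = {label: len(tokens & vocab) for label, vocab in model.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if ranked[0][1] == ranked[1][1]:
        return "undecided"
    return ranked[0][0]


model = train(TRAINING)

new_contributions = [
    "The cycle lanes are a great idea",
    "Too much traffic already, I oppose this",
    "I welcome the park",
    "Please publish the meeting minutes",
]

tally = Counter(classify(model, text) for text in new_contributions)
for label, count in tally.most_common():
    print(f"{label}: {count} of {len(new_contributions)}")
```

A real service trained on 1,000 contributions would replace train and classify; the tallying step at the end would stay much the same.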
Of course, this vision is not without its problems: even if the data prediction API works really well (and has been trained with enough well-categorized data), a human is still needed to facilitate the discussion and to decide on edge cases. Nonetheless, this kind of automatic analysis could be a valuable support for moderation.
However good or bad Google's particular attempt at automatic data analysis turns out to be, the need for further progress in this field seems obvious, especially in the age of distributed discussions in social networks and the social web in general. I will keep an eye on the development of this service and give it a try as soon as it is available to a wider audience.
Tags: API, data prediction, Google, inenglish