WebVision 2012: The ECCV Workshop on Computer Vision for the Web
12th and 13th October, 2012, Florence, Italy
http://research.microsoft.com/~manik/events/WebVision2012/index.html

The Internet is increasingly giving rise to challenging computer vision applications that are not only commercially important but also have fundamental research questions embedded in them. These problems lie at the intersection of computer vision, machine learning, data mining, information retrieval, natural language processing, game theory, human-computer interaction and other fields. Our objective in this workshop is to bring together researchers from academia and industry to identify exciting applications of computer vision on the Web, highlight the multi-disciplinary nature of the research problems they spawn, and discuss how computer vision techniques can be brought to bear to improve the state of the art.

Internet computer vision has typically been considered from the perspective of data-driven vision or of applying classical vision techniques at Web scale. We would like to broaden the ambit to include vision applications that necessitate exploiting Web content, Web structure and user information. For instance, keyword-based image search, rich advertising (both rich content and rich ads), visual analysis in social networking and building visually rich narratives all fall within this domain. Embracing such applications is critical, as they have the potential to become disruptive in the future. Furthermore, they open up new frontiers in computer vision research, such as those arising from the joint modeling of users, visual content and non-visual information based on text, links and location.

The Workshop will consist of a series of invited talks surveying the state of the art and presenting novel research on Internet computer vision problems.
Attendees will benefit from the diverse perspectives and expertise that the speakers bring from the fields of computer vision, machine learning, computational advertising, web search and multi-modal user interaction. The talks will not only cover topics in data-driven vision and large-scale learning but also focus on computer vision research opportunities in keyword-based image search, joint image and text modeling, rich advertising, the illustration of textual corpora and the visual aspects of social network analysis. The talks will present research advances in core vision techniques as well as approaches for leveraging non-visual metadata and user information to solve visual tasks on the Web.

Speakers
Rakesh Agrawal (Microsoft Research Silicon Valley)
Serge Belongie (UCSD)
Samy Bengio (Google)
Shih-Fu Chang (Columbia)
Trevor Darrell (Berkeley)
Rob Fergus (NYU)
Brian Kulis (Ohio State)
Gert Lanckriet (UCSD)
Fei-Fei Li (Stanford)
Louis-Philippe Morency (USC)
S. Muthu Muthukrishnan (Rutgers)
Steve Seitz (UW)
Rick Szeliski (Microsoft Research Redmond)
Manik Varma (Microsoft Research India)
Jason Weston (Google)