March 26-28, 2007

We currently have two tutorials scheduled for the afternoon of Saturday, March 25th. The exact time is TBD, but it will be in the early afternoon. You must register for the conference and the tutorials to attend. The two tutorials will run back to back, so you will be able to attend both.

Search and Discovery in the Blogspace

Gilad Mishne and Maarten de Rijke, University of Amsterdam

With the increasing number of blogs available, effective ways of searching and exploring blogs are becoming more and more important. The tutorial gives a survey of current activities in the area of blog search and discovery. It identifies important blog search and discovery tasks, describes blog-specific crawling, and then goes on to present effective blog indexing and retrieval methods for addressing these tasks. We also cover several advanced topics in blog search, such as serving advertisements with retrieved results.

The tutorial is organized into three parts. The first part provides information retrieval background, defining concepts and methods used later in the tutorial, as well as surveying a number of publicly available tools for the task. Additionally, it introduces search scenarios in blogs, deriving the user needs in this domain. The second part focuses on the areas where blog retrieval departs from other retrieval settings: the crawling process, and one search task characteristic of the blogspace, namely identifying opinions and thoughts about a topic. This part also surveys the recent efforts on blog opinion retrieval at TREC 2006. The last part of the tutorial discusses two additional search tasks: blog (rather than blog post) search, and tasks related to the timelined nature of blogs. It also introduces some advanced topics in the area, such as contextual advertising for blogs and search in social-networking sites.

Background of presenters

Gilad Mishne is a graduate student in the Information and Language Processing Systems group at the University of Amsterdam; he recently completed his dissertation on applying text analytics methods in the blogspace. Gilad holds a B.Sc. in computer science from the Technion and an M.Sc. from the University of Amsterdam; prior to his graduate studies, he spent several years in industry as a software engineer. He is a regular contributor to blog research venues, and co-organized the TREC 2006 Blog track.

Maarten de Rijke is a Professor of Web Information Processing and Head of the Information and Language Processing Systems group at the University of Amsterdam. His research focuses on modeling, development and evaluation of intelligent access to web information. In his recent work he has also addressed blog retrieval and discovery. He was a co-organizer of the TREC 2006 Blog track and of several tracks at CLEF. He regularly teaches (web) information retrieval courses at the bachelor's and master's level, and has previously taught tutorials at various international summer schools as well as national and international graduate schools.

Spam in Blogs and Other Social Media

Pranam Kolari, Akshay Java, Tim Finin and Anupam Joshi, University of Maryland, Baltimore County

Spam on the Internet dates back over a decade, with its earliest known appearance as an email about the infamous MAKE.MONEY.FAST. campaign. Spam has co-evolved with Internet applications and is now quite common on the World-Wide Web.

As social media systems such as blogs, wikis and bookmark sharing sites have emerged, spammers have quickly developed techniques to infect them as well. The very characteristics underlying the Web, be it version 1.0, 2.0 or 3.0, also enable new varieties of spam.

This tutorial will detail the problem of spam in social media, with an emphasis on spam in blogs. We will discuss different types of spam, the motivation for spammers and the seriousness and nature of the problem. We will then share some of our discussions with groups facing and tackling this issue. The second half of the tutorial will present an overview of spam detection and elimination efforts, with an extensive survey of blog spam detection. The tutorial will help researchers understand the unique aspects of spam in social media, recognize how it affects analytics and information extraction, and identify current research challenges. For practitioners, it will provide strategies for controlling the problem, and identify areas of collaboration across communities.

Background of presenters

Pranam Kolari is a graduate student in the Department of Computer Science at UMBC. His dissertation has focused on Spam Blog Detection, and the tools he developed are in use in both academia and industry. He also maintains an active research interest in Internal Corporate Blogs, Semantic Web and Blog Analytics. Pranam received a B.E. from Bangalore University and an M.S. from UMBC. He is active in the social media research and blogging community and blogs regularly at Ebiquity.

Tim Finin is a Professor in the Department of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County (UMBC). He has over 30 years of experience in the applications of Artificial Intelligence to problems in information systems, intelligent interfaces and robotics and is currently working on software agents, the semantic web, and mobile computing. He holds degrees from MIT and the University of Illinois and has also held positions at Unisys, the University of Pennsylvania, and the MIT AI Laboratory.

Akshay Java is a graduate student in the Department of Computer Science at UMBC. His Ph.D. dissertation focuses on identifying influence and opinions in social media. His research interests include Blog Analytics, Information Retrieval and Semantic Web. Akshay received a B.E. from Mumbai University and an M.S. from UMBC.

Anupam Joshi is a Professor of Computer Science and Electrical Engineering at UMBC. Earlier, he was an Assistant Professor in the CECS department at the University of Missouri, Columbia. He obtained a B.Tech. degree in Electrical Engineering from IIT Delhi in 1989, and a Master's and a Ph.D. in Computer Science from Purdue University in 1991 and 1993, respectively. His research interests are in the broad area of networked computing and intelligent systems. He currently serves on the editorial board of the International Journal of the Semantic Web and Information Systems.