Three.Fourteen

WSDM 2014 Summary, Opinions and Other Thoughts

Published on February 28, 2014.

It was great attending WSDM this year, right in the heart of NYC! I could not possibly cover everything that went on during the three days of the conference, but I will use this post to highlight sessions and talks that I attended and found particularly interesting.

The conference itself was single-track, with mostly long 20-minute talks and a few shorter ones. The sessions covered Web Search, Advertising, Recommender Systems, Network Analysis, Language Analysis and Crowdsourcing.

The Web Search session was very good, with several papers that took new and interesting perspectives on the “old” problem of search. In [1], Demeester et al. modeled disagreement between users' rankings of the top results and used it to improve the ranking of search results. Another novel perspective came from Hassan et al. in [2], who learned to distinguish long search sessions in which the user is satisfied from those in which the user is struggling and unsatisfied. In [3], Li et al. looked at the phenomenon of click spam, where spammers try to game search engines by automatically generating clicks on the results they prefer.

Personally, I found the Advertising sessions somewhat disappointing because of their (usual) focus on complex models that yield marginal performance improvements and rarely teach us anything about users, ads or the search process. I get it: someone has to pay the bills, and minor improvements in modeling ads may lead to huge gains in revenue. Nevertheless, I would much rather see models that actually teach us something about the process than yet another bigger, stronger hammer.

The Log Analysis session was GREAT! I highly recommend reading all of the papers, but to pick just a few: the best paper winner [4] by Lagun et al. identified common motifs in mouse movements over search results; [5] by Wang et al. jointly learned the assignment of users to clusters and the resulting clicking behavior of each cluster; and [6] by Scaria et al. devised a game in which participants must get from a source Wikipedia entry to a target entry by following links only, and studied the differences between successful and unsuccessful sessions.

The NLP and Topic Modeling sessions were mostly standard LDA papers: we devised this graphical model, inferred it using Gibbs sampling and evaluated it using perplexity on held-out data. There were two exceptions, [7] and [8]. In [7], Yu et al. used topic models to diversify e-commerce search results and evaluated their success in terms of user satisfaction. In [8], Bi et al. found topic-specific experts and verified their results against an external dataset.

Two papers stood out in the Peer Production; Data Analysis session. First, Abisheva et al. [9] presented a thorough analysis of the overlap between YouTube and Twitter, with interesting takes on the demographics of the two platforms and on identifying promotional accounts. Second, Di et al. [10] looked at how image features, such as the number of images or their quality, affect consumers' purchasing decisions on eBay.

Have any questions or comments? Leave a comment below or contact me @grinbergnir.