Ratings, Reputation and Incentive Design
An Interview with Dr. Paul Resnick, University of Michigan School of Information
We take a short, but deep, dive with Dr. Paul Resnick - a leading researcher (papers) in incentive-based design, recommender systems, and reputation systems. For our tip, Scott describes concrete steps for getting the most out of his book Building Successful Online Communities: Evidence-Based Social Design, which is a MUST READ for all online social product designers.
- Dr. Resnick’s Home Page
- Book: Building Successful Online Communities: Evidence-Based Social Design
- Much of the book’s research was done using MovieLens - a movie ratings site.
Transcript
Randy Welcome to Episode 9 of the Social Media Clarity Podcast. I’m Randy Farmer.
Marc I’m Marc Smith.
Scott I’m Scott Moore.
Paul I’m Paul Resnick from the University of Michigan.
Randy For today’s news, we’d like to announce a few of our upcoming guests. In early February, we’ll talk to danah boyd about her new book “It’s Complicated: The Social Lives of Networked Teens.” After that, we’ll speak to Howard Rheingold, critic, writer, and teacher on the social, cultural, and political implications of modern media. Most of all, we’d like to call on our audience for suggestions for topics and guests you would like us to interview. Who in social media do you think has a voice that should be heard on our show?
Marc Today, we’re going to welcome Paul Resnick of the University of Michigan School of Information, co-author of Building Successful Online Communities: Evidence-Based Social Design. Thank you for joining us, Paul.
Paul It’s good to talk to you again.
Randy There’s a lot of confusing terminology in this area. Could you explain the difference between reputation systems, recommender systems, and gamification?
Paul Sure. You will see the term “reputation system” used in ways that don’t match my terminology. Recommender systems are things that recommend items to people, often based on feedback from other people. Think of Netflix, where people rate movies and those ratings are used to recommend movies to other people. A reputation system is a rating system applied to people or organizations, or anybody who might change their behavior in response to the rating. A movie or a book doesn’t change its behavior in response to ratings, but a hotel might, and an eBay seller might. In those cases, I would call them reputation systems.
Many kinds of gamification involve keeping score and giving people badges or prizes. I think of that as a different idea. It’s not about using performance or feedback from some people to influence other people. Incentives are things that motivate people to do things, and they may be internal or external motivations. The most familiar incentive is money: if you do this, we’ll give you some money. But that’s not the only kind of incentive people respond to. People also want love and respect and lots of other things. Incentive basically means motivation. The idea is to design information systems around creating incentives for the people who are using them to use them in socially beneficial ways, ways that make the system work better for everybody.
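To make the recommender-system idea concrete, here is a minimal sketch of user-based collaborative filtering in the spirit of the Netflix example Paul gives. The users, movies, ratings, and the cosine-similarity choice are all illustrative assumptions; production recommenders are far more sophisticated:

```python
# A minimal sketch of user-based collaborative filtering: ratings from
# other people are used to predict what a given user will like.
# Users, movies, ratings, and the similarity measure are all invented.
from math import sqrt

ratings = {  # user -> {movie: stars}
    "alice": {"Alien": 5, "Brazil": 4, "Clue": 1},
    "bob":   {"Alien": 5, "Brazil": 5, "Clue": 2},
    "carol": {"Alien": 1, "Clue": 5},
}

def similarity(a, b):
    """Cosine similarity over the movies both users have rated."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    dot = sum(ratings[a][m] * ratings[b][m] for m in shared)
    norm_a = sqrt(sum(ratings[a][m] ** 2 for m in shared))
    norm_b = sqrt(sum(ratings[b][m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Predict a rating as a similarity-weighted average of others' ratings."""
    weighted = total = 0.0
    for other in ratings:
        if other != user and movie in ratings[other]:
            w = similarity(user, other)
            weighted += w * ratings[other][movie]
            total += w
    return weighted / total if total else None

print(predict("carol", "Brazil"))  # ~4.6, built entirely from others' feedback
```

The point of the sketch is the distinction Paul draws: the prediction for one person is assembled from other people’s feedback about items, not from a score the person earned.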
Marc Paul, can you tell us a little bit more about reputation systems and incentive-centered design?
Paul Well, one classic one would be the feedback forum at eBay. Each seller gets some feedback from buyers after they do a transaction, and other buyers then use that information to decide: do I want to buy from this seller? That creates the right incentives for the seller to do a good job in the first place, because they’re anticipating that they’re going to be rated afterwards.
One of my colleagues here, Yan Chen, has been studying Kiva.org, where people make micro-loans to help small businesses expand, and they’ve been analyzing what kinds of incentives get people to lend more. One of those is being part of a team. They analyze whether, when people join a team and compete against other teams, that gets them to make more loans.
Scott The idea of incentive-centered design is new to me. Where can people learn more about incentive-centered design?
Paul Last year, Bob Kraut and I came out with a book called “Building Successful Online Communities: Evidence-Based Social Design,” where we tried to codify a lot of the research results on running online communities well. Many ideas from incentive-centered design became design claims in that book. If you want to get people to do things in an online community, it helps to ask them in a certain way. It helps to emphasize the unique contributions that those people can make.
In other parts of the book, we looked at, if you’re trying to keep trolls out, what kinds of incentives can be put in place to make it unattractive for them to troll. There’s also lots of academic research that we referenced in that book. For people who want an even more academic take on some of these incentives, there’s an ACM conference on Economics and Computation (EC) that covers things like auction mechanisms and how to design kidney exchanges so that people are most likely to reveal the right information and the most matches get made for kidney transplants, and things like that.
Randy One of the great things about your book is that many of the claims present trade-offs: if we want to encourage certain kinds of contributions, you can do this thing, but it might have the side effect of decreasing another thing. For example, leaderboards encourage certain kinds of behaviors but discourage other kinds. This is particularly interesting when you start to talk about an area you’ve written extensively about: manipulation resistance. When people have online incentive systems, there are external incentives to manipulate the scores. Would you like to first unpack what score manipulation is, and then share some of your thoughts on manipulation resistance?
Paul Whenever we keep score, if scores matter, people do things to try to improve their scores. Some of the things they do are exactly the things we were trying to create incentives for them to do. We wanted them to make more contributions to Wikipedia, to carefully research things and edit the pages, and in order to get a high score, they went and did exactly that.
Unfortunately, there are often ways you can improve your score that are not exactly the behaviors we were trying to encourage. That’s what is sometimes called gaming the system, or manipulation. In the context, say, of TripAdvisor or Yelp, where you have businesses that are getting rated, one way to get a high rating is to give really great service at a really low price. That’s a good outcome of a rating mechanism. But if the way you’re keeping score is taking the average of ratings from all the people who submitted them, you can put in a whole bunch of fake ratings, or pay workers a nickel to put in fake ratings. Or make it a buck and they’ll put in a thousand of them for you. That’s what I mean by manipulation.
Manipulation resistance is when the people trying to run these systems, to keep the scores informative and preserve the right incentive effects, bring in countermeasures to resist attempts to manipulate the score.
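As a toy illustration of the averaging weakness Paul describes (a deliberately naive scorer, not any real site’s algorithm):

```python
# If the score is a plain average of all submitted ratings, a handful of
# fake five-star ratings can swamp a business's genuine score.
# All numbers here are invented for illustration.

def average_score(ratings):
    return sum(ratings) / len(ratings)

genuine = [2, 3, 2, 3, 2]  # honest reviews of a mediocre business
fakes = [5] * 20           # a purchased batch of five-star reviews

print(average_score(genuine))          # 2.4
print(average_score(genuine + fakes))  # 4.48 -- the manipulation worked
```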
Randy Yelp recently rejected completely valid reviews in an attempt to prevent what is arguably not manipulation. In my case, my own dentist asked us to write a review. Because they asked all of their customers at once, apparently we tripped the manipulation resistance: we all wrote reviews of varying quality and varying ratings, and all of them were rejected and put on hold because they all came in at once. They’re using some very simple resistance mechanisms, which are probably discouraging the creation of, and trust in, new users.
Paul One way to think about manipulation is as a cat-and-mouse game between manipulators and system architects. The manipulators get a little more sophisticated; the countermeasures get a little more sophisticated. At each step, you’re going to find the countermeasures are not 100% effective, and they’re going to make both type I and type II errors: they’re going to allow some things through that are fake, and they’re going to throw out some things that are real. The question is how many mistakes of each kind they’re going to make.
You’ve just described a situation where they threw out some information that turned out to be just fine, because they had an algorithm whose countermeasure was: if we get too many ratings all at once for a business that didn’t have very many ratings before, that’s probably a signature of manipulation.
There are two things going on. Even if Yelp understood what actually happened, would they want, as a matter of policy, to not allow those reviews? And even if they would be happy with those reviews after having a person scrutinize everything, they don’t have a way to tell, and their algorithm is, in some sense, making a classification error there.
My concern is mistakes that are not a matter of policy but a matter of an inability to tell the stuff they do want from the stuff they don’t want. There’s always going to be more sophistication from the attackers, followed by more sophistication from the defenders. I’m worried about what the endgame of this is, because each time the attackers get more sophisticated, they look more and more like real users. And the people who are contributing good reviews are not contributing a lot of reviews, so Yelp is not going to have a lot to go on.
When Randy submits a review, if he’s only submitted three of them ever in his life, they’re going to have trouble deciding whether to ever accept a review from someone like that. My concern is that we may be heading to a situation where these services have to throw away a lot of good information, because they can’t tell whether this is just somebody who only ever writes three reviews.
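A hedged sketch of the kind of burst-detection countermeasure described above, with invented thresholds, shows how a legitimate batch of reviews, like the dentist’s patients all posting at once, trips the same signature as purchased fakes:

```python
# A sketch of a simple burst-detection heuristic: hold all new reviews
# for a business that suddenly receives many ratings despite having few
# ratings before. Every threshold here is an assumption for illustration.
from datetime import datetime, timedelta

BURST_WINDOW = timedelta(days=1)
BURST_SIZE = 10   # "too many ratings all at once" (assumed threshold)
ESTABLISHED = 50  # prior ratings needed before a burst looks normal

def looks_like_manipulation(prior_count, new_review_times):
    """Flag a burst of reviews at a business with little rating history.

    A dentist asking every patient at once produces the same signature
    as a purchased batch of fakes -- a type I error in Paul's terms.
    """
    if prior_count >= ESTABLISHED:
        return False
    new_review_times = sorted(new_review_times)
    for i, start in enumerate(new_review_times):
        in_window = [t for t in new_review_times[i:] if t - start <= BURST_WINDOW]
        if len(in_window) >= BURST_SIZE:
            return True
    return False

now = datetime(2014, 2, 1, 12, 0)
burst = [now + timedelta(minutes=5 * i) for i in range(12)]
print(looks_like_manipulation(prior_count=3, new_review_times=burst))  # True
```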
Scott I often find on Yelp and TripAdvisor and these review sites that the opinions very often don’t match mine, for one reason or another. We don’t have the same taste; we don’t have the same desire for ambiance or idea of what a quality hotel is. Are we creating our own problems by aggregating so many reviews together without any additional information?
Paul Well, this is the question of personalization. Is a recommendation that this is a four-and-a-half-star movie good enough, or do we need to know that it’s four and a half stars for people like you? Is this a good restaurant, or is it a good restaurant for people like you? Maybe most of the people who go to it love it, but it turns out they’re a really different demographic. When we find out that there are differences in taste, it may be that we need to introduce personalization, or we just need to have multiple different sites.
The more we fragment the audience, the more people’s information we need in order to make a prediction. Say we needed 100,000 people who review a lot in order to make aggregate scores, but now we have to divide them into 18 different demographic groups; we might need 18 times as many people participating. Then you might start to need the long tail of contributors, and you get back to the problem that we have trouble taking advantage of the long tail.
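The back-of-the-envelope arithmetic behind that fragmentation worry, using Paul’s illustrative numbers:

```python
# Splitting one audience into demographic segments multiplies the number
# of heavy raters needed to keep each segment's scores reliable.
# The figures are Paul's illustrative numbers, not measured values.
raters_needed_overall = 100_000  # enough frequent raters for one global score
segments = 18                    # hypothetical demographic groups

print(raters_needed_overall * segments)  # 1,800,000 -- why the long tail matters
```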
Marc That’s because the long tail is in some ways more subject to fraud and manipulation. Even if 99% of it is perfectly good, a small fraction of it can easily be fabricated, and if we then consume that as part of our information resources, we’re going to end up with really skewed results.
Paul Yeah, I think that’s a great analysis, Marc. These tools are going to keep working in those cases where we have enough signal about individual contributors to be able to assess them. What this calls into question is the ability to make use of a little information from the long tail of contributors.
There are certain kinds of things where we assumed the value was coming from the long tail. You’ve given the argument about Wikipedia: perhaps most of the contribution came from big contributors, so if you really want to make Wikipedia better, you’ve got to do things that make those people happy.
For things like rating systems, five-star ratings of Netflix movies, part of the whole idea was: wow, it’s not just going to be the film critics anymore; we can make use of everybody. If it turns out that actually, no, we can’t make use of everybody, we can only make use of people who rate at least 1,200 movies, that’s going to be bad. I think that would be a disappointment for what we had hoped for.
Randy We’d like to thank our guest, Paul Resnick, for speaking with us today. It was great having you on, Paul.
Paul Thank you. It was great to be on your podcast. Hope to talk to you again.
Marc Thanks, Paul.
Scott Thanks so much, Paul.
Scott Scott here with your actionable tip for this episode: how to get the most from Robert Kraut and Paul Resnick’s book “Building Successful Online Communities.” The book has a consistent structure, which makes it a practical reference for community builders. I use the book when I consult, to brainstorm tactics or to make sure I didn’t overlook something when designing my own solutions for communities.
There are only five chapters, broken into broad categories of challenges: encouraging contributions, encouraging commitment to the community, regulating behavior, the specific challenges of newcomers, and starting new online communities. Each chapter offers a few paragraphs describing a social behavior under certain circumstances, followed by one or two design claims derived from those descriptions; then a new set of paragraphs followed by their related design claims, and so on. At the end of each chapter is a summary of design choices, grouped by type, with the design claims that relate to each type.
The first way to use the book is to scan the chapter that matches the community situation you are facing, paying attention to the design claims. When you find a design claim that resonates or seems like a plausible tactic, go back to the preceding paragraphs to better understand the evidence for, and the pitfalls of, that design claim.
Another way to use the book is to go to the end of the relevant chapter and skim the design choices, then cross-reference the corresponding design claims to see which ones might work for you. Remember, not every design claim applies to every community situation, so you may come away with a different set of ideas depending on your community’s unique needs.

