Another aspect of my conversation dealt with inferred ratings, a problem I've come across before in other areas. There are two primary cases in which this arises: censored data and self-selection bias.
In the first case, censored data, the problem is caused by the ratings system failing to elicit useful responses. An example is a system in which most people rate something a 5, slightly fewer people give it a 4, and fewer still give 3, 2, and 1. This data is "censored" because the true opinion lies somewhere off the scale, and all we have managed to measure is the distribution of people who do not think a 5 too high a score. A proper ratings system will have granularity around the true response, not simply on one side of it.
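To make the censoring concrete, here is a minimal sketch with hypothetical counts shaped like the distribution described above. The observed mean sits near the ceiling of the scale, so it mostly measures how many people do not object to a 5 rather than where opinion actually lies:

```python
# Hypothetical counts for a censored rating distribution: mass piles up
# at the top of the scale instead of around the true opinion.
counts = {5: 600, 4: 250, 3: 100, 2: 35, 1: 15}

total = sum(counts.values())
mean = sum(star * n for star, n in counts.items()) / total
top_share = counts[5] / total

print(f"observed mean: {mean:.2f}")        # pinned near the top of the scale
print(f"share rating 5: {top_share:.0%}")  # the ceiling absorbs the right tail
```

All the granularity is on one side of the mode; if true opinion were "better than a 5," this data cannot say by how much.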
In the second case, self-selection, people are likely to express their opinion when it represents an extreme, since they believe (somewhat correctly, though not necessarily) that their extreme opinion contains more information. Thus, if you collect satisfaction surveys, the ones with the most information will be the ones from people who were completely unsatisfied or happy beyond expectations, since they feel the need to express exactly where those surprising emotions come from. Again, you have granularity in your tails but not around your true result.
The particular example we discussed was teacher ratings, which used a "star + comment" system. In this instance, students had to fill out a survey in order to see their grades, which guaranteed a high participation rate. However, students took the path of least resistance, and while they all entered stars, only a few filled out comments.
We hypothesized that the average rating inferred from the stars would be much higher than that inferred from the comments, since the comments were more likely to be negative opinions which a student felt a need to express. (They could also be positive, but I think the negatives likely outweigh them in this case.) I hope to receive data indicating whether this is so.
The question, finally, is this: is it possible to infer the "correct" rating (as implied by the stars) from the rating implied only by the comments?
Let us make a large assumption and say that we can put comments into 1-5 buckets, via some method of text analysis (perhaps LSA or a similarity metric, or just by counting good/bad words). If my hypothesis is correct, it is not enough to simply average the implied numerical ratings, since the comments are likely skewed negative. We could apply a Bayesian analysis, but what is the correct prior assumption? Certainly not a naive mean.
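As a minimal sketch of the crudest of those bucketing methods, counting good/bad words: the word lists and the mapping from net sentiment to a 1-5 bucket below are entirely hypothetical, and a real system would use something like LSA or a proper similarity metric instead.

```python
# Hypothetical good/bad word lists; a real system would need far richer
# sentiment analysis than this.
GOOD = {"great", "clear", "helpful", "engaging", "excellent"}
BAD = {"boring", "confusing", "unfair", "useless", "terrible"}

def bucket(comment: str) -> int:
    """Map a free-text comment to a 1-5 bucket by net good/bad word count."""
    words = comment.lower().split()
    score = sum(w in GOOD for w in words) - sum(w in BAD for w in words)
    # Clamp the net count into the star range, centered at a neutral 3.
    return max(1, min(5, 3 + score))

print(bucket("great lectures and very clear explanations"))  # → 5
print(bucket("boring and confusing"))                        # → 1
```

However the bucketing is done, the point stands: averaging these implied ratings inherits the negative skew of who chose to comment.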
I think there are two additional factors to account for. On the one hand, students are more likely to express opinions which they infer are different from the average opinion. In an extreme case, getting a few comments ranking a teacher as a 1 (and no other students making comments) might actually be indicative of a very good teacher, since those students thought themselves sufficiently different from the "true" result to write about it, whereas no other students felt that way. In other words, this accounts for the idiosyncratic component of comment writing: the more you write, the more different (or informative, to put it better) you believe your opinion to be.
The second factor, which works in opposition to the first, is that students will probably write more when they believe they are correct. This is a systemic factor, common to all comments conditional on their having been written in the first place.
In conclusion, I don't think this is really such a difficult problem - I have a unique identifier (the implied number), an idiosyncratic or between-groups source of variance, and a systemic or across-groups source of variance. There should be a model (indeed, a linear model) that can account for these biases. The only tricks would be training it and having the confidence to trust its output.
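A hedged sketch of what the simplest such linear correction might look like, in pure Python. The data here is entirely synthetic: pretend each teacher has a star mean (the target we want to recover) and a comment-implied mean that skews low by a roughly constant amount, standing in for the systemic factor. A fuller model would add the fraction of students who commented as a second predictor, as a proxy for the idiosyncratic factor.

```python
import random

# Synthetic training data: (comment-implied mean, star mean) per teacher.
# The -0.8 offset is an assumed systemic negative skew in comments.
random.seed(0)
teachers = []
for _ in range(200):
    star = random.uniform(2.0, 5.0)              # "true" rating from the stars
    comment = star - 0.8 + random.gauss(0, 0.1)  # comments skew negative
    teachers.append((comment, star))

# Ordinary least squares for star ≈ a + b * comment (closed form).
n = len(teachers)
mx = sum(c for c, _ in teachers) / n
my = sum(s for _, s in teachers) / n
b = sum((c - mx) * (s - my) for c, s in teachers) / sum((c - mx) ** 2 for c, _ in teachers)
a = my - b * mx

print(f"star ~ {a:.2f} + {b:.2f} * comment")  # recovers roughly 0.8 + 1.0 * comment
```

With real data, of course, the star means would come from the survey itself, and the fitted intercept would be an estimate of the systemic comment skew rather than a number I planted.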