30 May 2009

Semi-supervised or Semi-unsupervised? (SSL-NLP invited papers)

Kevin Duh not-so-recently asked me to write a "position piece" for the workshop he's co-organizing on semi-supervised learning in NLP. I not-so-recently agreed. And recently I actually wrote said position piece. You can also find a link off the workshop page. I hope people recognize that it's intentionally a bit tongue-in-cheek. If you want to discuss this stuff or related things in general, come to the panel at NAACL from 4:25 to 5:25 on 4 June at the workshop! You can read the paper for more information, but my basic point is that we can typically divide semi-supervised approaches into one lump (semi-supervised) of methods that work reasonably well with only labeled data and are just improved with unlabeled data, and another lump (semi-unsupervised) of methods that work reasonably well with only unlabeled data and are just improved with labeled data. The former typically encode lots of prior information; the latter do not. Let's combine! (Okay, my claim is more nuanced than that, but that's the high-order bit.)
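To make the two lumps concrete, here is a minimal toy sketch (my own illustration, not anything from the position piece): the semi-supervised lump is stood in for by self-training wrapped around an ordinary supervised classifier, and the semi-unsupervised lump by clustering whose clusters are only named afterwards using a handful of labels. The scikit-learn calls, the confidence threshold, and the assumption that labels are small non-negative integers are all just for illustration.

    # Toy contrast between the two lumps (illustrative only, not from the paper).
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    def semi_supervised(X_lab, y_lab, X_unlab, rounds=5, thresh=0.9):
        # Works fine with labeled data alone; unlabeled data only adds
        # self-training on confident predictions.
        X, y = X_lab, y_lab
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        for _ in range(rounds):
            if len(X_unlab) == 0:
                break
            probs = clf.predict_proba(X_unlab)
            conf = probs.max(axis=1) >= thresh   # keep only confident guesses
            if not conf.any():
                break
            X = np.vstack([X, X_unlab[conf]])
            y = np.concatenate([y, clf.classes_[probs[conf].argmax(axis=1)]])
            X_unlab = X_unlab[~conf]
            clf = LogisticRegression(max_iter=1000).fit(X, y)
        return clf

    def semi_unsupervised(X_lab, y_lab, X_unlab, k=2):
        # Works fine with unlabeled data alone (it just clusters); the few
        # labels are only used at the end to name the clusters.
        km = KMeans(n_clusters=k, n_init=10).fit(np.vstack([X_unlab, X_lab]))
        lab_clusters = km.predict(X_lab)
        names = {}
        for c in range(k):
            mask = lab_clusters == c
            names[c] = int(np.bincount(y_lab[mask]).argmax()) if mask.any() else c
        return km, names

The point of the contrast: delete the unlabeled data and the first function degrades gracefully to plain supervised learning; delete the labeled data and the second degrades gracefully to plain clustering.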

29 May 2009

How to reduce reviewing overhead?

It's past most reviewing time (for the year), so somehow conversations I have with folks I visit tend to gravitate toward the awfulness of reviewing. That is, there is too much to review and too much garbage among it (of course, garbage can be slightly subjective). Reviewing plays a very important role but is a very fallible system, as everyone knows, both in terms of precision and recall. Sometimes there even seems to be evidence of abuse.

But this post isn't about good reviewing and bad reviewing. This is about whether it's possible to cut down on the sheer volume of reviewing. The key aspect of cutting down reviewing, to me, is that in order to be effective, the reduction has to be significant. I'll demonstrate by discussing a few ideas that have come up, and some notes about why I think they would or wouldn't work:

  1. Tiered reviewing (this was done at ICML this year). The model at ICML was that everyone was guaranteed two reviews, and a third only if your paper was "good enough." I applaud ICML for trying this, but as a reviewer I found it useless. First, at most 1/3 of reviews get cut (assuming all papers are bad), and in practice it's probably more like 1/6; this means that if a reviewer would on average have gotten six papers to review, he will now get five, which is a very small decrease. Second, it comes with an additional swapping overhead: effectively I now have to review for ICML twice, which makes scheduling a much bigger pain. It's also harder for me to be self-consistent in my evaluations.

  2. Reject without review (this was suggested to me at dinner last night: if you'd like to de-anonymize yourself, feel free in comments). Give area chairs the power that editors of journals have to reject papers out of hand. This gives area chairs much more power (I actually think this is a good thing: area chairs are too often too lazy in my experience, but that's another post), so perhaps there would be a cap on the number of rejects without review. If this number is less than about 20%, then my reviewing load drops in expectation from 5 to 4, which, again, is not a big deal for me.

  3. Cap on submissions (again, a suggestion from dinner last night): authors may submit only one paper on which their name comes first to any given conference. (Yes, I know, this doesn't work in theory land where authorship is alphabetical, but I'm trying to address our issues, not someone else's.) I've only twice in my life had two papers at a conference where my name came first, and maybe there was a third where I submitted two and one was rejected (I really can't remember). At NAACL this year, there are four such papers; at ACL there are two. If you assume these are equally distributed (which is probably a bad assumption, since the people who submit multiple first-author papers to a conference probably submit stronger papers), then this is about 16 submissions to NAACL and 8 submissions to ACL. That's maybe 1-4% of submitted papers: again, something that won't really affect me as a reviewer (even less than the above two).

  4. Strong encouragement of short papers (my idea, finally, but with tweaks from others): right now I think short papers are underutilized, perhaps partially because they're seen (rightly or wrongly) as less significant than "full" papers. I don't think this need be the case. Short papers definitely take less time to review. A great "short paper tweak" that was suggested to me is to allow only 3 pages of text, but essentially arbitrarily many pages of tables/figures (probably not arbitrarily, but at least a few... plus, maybe just put the data online). This would encourage experimental evaluation papers to be submitted as shorts (currently these papers typically get rejected as longs because they don't introduce really new ideas, and rejected as shorts because it's hard to fit lots of experiments in four pages). Many long papers that appear in ACL could easily be short papers (I would guesstimate somewhere around 50%), especially ones that have the flavor of "I took method X and problem Y and solved Y with X (where both are known)" or "I took known system X, did new tweak Y and got better results." One way to start encouraging short papers is to have an option whereby reviewers can say something like "I will accept this paper as a short paper but not a long paper -- please rewrite" and then have it accepted (with some area chair supervision) without another round of reviewing. The understanding would have to be that it would be poor form as an author to pull your paper out just because it got accepted short rather than accepted long, and so authors might be encouraged to just submit short versions. (This is something that would take a few years to have an effect, since it would be partially social.)

  5. Multiple reviewer types (an idea that's been in the ether for a while). The idea would be that you have three reviewers for each paper, but each serves a specific role. For instance, one would exclusively check technical details. The other two could then ignore these. Or maybe one would be tasked with "does this problem/solution make sense." This would enable area chairs (yes, again, more work for area chairs) to assign reviewers to do things that they're good at. You'd still have to review as many papers, but you wouldn't have to do the same detailed level of review for all of them.

  6. Require non-student authors on papers to review 3 times as many papers as they submit to any given conference, no exceptions ("three" because that's how many reviews they will get for any given paper). I know several faculty who follow the model of "if there is a deadline, we will submit." I don't know how widespread this is. The idea is that even half-baked ideas will garner reviews that can help direct the research. I try to avoid offending people here, but that's what colleagues are for: please stop wasting my time as a reviewer by submitting papers like this. If they get rejected, you've wasted my time; if they get accepted, it's embarrassing for you (unless you take the time by camera ready to make them good, which happens only some of the time). Equating "last author" with "senior person", there were two people at NAACL who have three papers and nine who have two. This means that these two people (who in expectation submitted 12 papers -- probably not true, probably more like 4 or 5) should have reviewed 12-15 papers. The nine should probably have reviewed 9-12 papers. I doubt they did. (I hope these two people know that I'm not trying to say they're evil in any way :P.) At ACL, there are four people with three papers (one is a dupe with a three from NAACL -- you know who you are!) and eight people with two. This would have the added benefit of having lots of reviews done by senior people (i.e., no crummy grad student reviews), with the potential downside that these people would gain more control over the community (which could be good or bad -- it's not a priori obvious that being able to do work that leads to many publications is highly correlated with being able to identify good work done by others).

  7. Make the job of the reviewer more clear. Right now, most reviews I read have a schizophrenic feel. On the one hand, the reviewer justifies his rating to the area chair. On the other, he provides (what he sees as) useful feedback to the authors. I know that in my own reviewing, I have cut down on the latter. This is largely in reaction to the "submit anything and everything" model that some people have. I'll typically give (what I hope is) useful feedback to papers I rate highly, largely because I have questions whose answers I am curious about, but for lower ranked papers (1-3), I tend to say things like "You claim X but your experiments only demonstrate Y," rather than "[that] + ... and in order to show Y you should do Z." Perhaps I would revert to my old ways if I had less to review, but this was a choice I made about a year ago.

I'd be interested to hear if others have additional "small changes" ideas. There are "large delta" ideas, such as Fernando's "everything is a journal" model, which I actually like but which is likely to be hard to implement, because it's hard to make sweeping changes to a system (though VLDB -- or was it SIGMOD -- managed to do it).

I actually think that together, some of these ideas could have a significant impact. For instance, I would imagine 2 and 4 together would probably cut a 5-6 paper review load down to a 3-4 paper load, and doing 6 on top of this would probably take the average person's load down maybe one more. Overall, perhaps a 50% reduction in the number of papers to review, unless you're one of the types who submits lots of papers. I'd personally like to see it done!
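Since the claim above is purely back-of-the-envelope, here is the same arithmetic written out as a tiny script. The baseline load, the ~20% reject-without-review cap, the ~50% long-to-short shift, and the one-paper saving from review quotas are the rough numbers from this post; the "a short takes about half the effort of a long" factor is my own assumption.

    # Back-of-the-envelope reviewing-load arithmetic; every number is a rough
    # assumption from the post (or labeled as my own), not a measurement.
    baseline = 5.5                       # a "5-6 paper" load, split the difference

    # Idea 2: reject without review, capped at roughly 20% of submissions.
    after_desk_reject = baseline * (1 - 0.20)

    # Idea 4: roughly 50% of longs become shorts; assume a short takes about
    # half the reviewing effort of a long (my assumption, not the post's).
    short_fraction, short_cost = 0.5, 0.5
    after_shorts = after_desk_reject * (1 - short_fraction * (1 - short_cost))

    # Idea 6: review quotas; the post guesses this saves about one more paper.
    after_quota = after_shorts - 1.0

    print("baseline           : %.1f papers" % baseline)
    print("+ reject w/o review: %.1f" % after_desk_reject)
    print("+ short papers     : %.1f" % after_shorts)
    print("+ review quotas    : %.1f" % after_quota)
    print("overall reduction  : %.0f%%" % (100 * (1 - after_quota / baseline)))
    # Prints a roughly 3.3-paper load after ideas 2 and 4, and about a 58%
    # overall reduction -- the same ballpark as the "perhaps 50%" guess above.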