12 January 2007

Survey of Parser Usage

I know people use parsers a fair amount; I'm curious which parsers people use. Thus, a new poll :). What I'm interested in, in particular, is whether you use a parser essentially as a preprocessing step before doing something "higher level." In other words, if you are Mark Johnson and you use the Charniak parser to build a better parser, this doesn't count (sorry, Mark!). I want to know if you use one to do, e.g., summarization or MT or IE or something other than parsing... (Also, I don't count shallow parsing in this context.) I look forward to responses!



Do you use a parser for your work?
Nope, no parsing in this neck of the woods!
Yes, one of the Brown parsers (Charniak, Johnson, etc.)
Yes, one of the Collins parsers (Collins, Bikel, etc.)
Yes, MiniPar
Yes, one of the CoNLL dependency parsers (Nivre, McDonald, etc.)
Yes, a rule-based parser
Yes, something else entirely
  

14 comments:

Chris Brew said...

What would you like us to do with your survey if we do work that compares multiple parsers? At least on my machine, I can't make multiple selections from your list.

We use Minipar, Charniak, Bikel, and Clark and Curran's CCG parser. Only the first and last are feasible for the scale of corpus we think we want to work with and the computational resources we can assemble. The others are too slow. At least, that's what our very preliminary studies seem to say.

hal said...

i guess i should have learned my lesson and allowed multiple choices ... i tried changing it to checkboxes, but the poll hosting site doesn't like that :(.

i'd say in general if you use more than one, choose the one you use most ... break ties randomly :).

Anonymous said...

Academically, I have used CnC, Minipar, Abney's chunker and RASP.

Commercially, however, it is often most advisable to stick with chunking, i.e., to use little or no syntactic processing at all, and rely only on IE.

Parsing research has made a lot of progress, but I'm worried that the community is still using section 23 of the WSJ corpus for benchmarking. In my humble opinion, it's about time for a fresh shipment of treebanks...

Regards,
Jochen

--
Jochen L Leidner
Linguit Ltd. (www.linguit.com)

Anonymous said...

Great survey! We've used Minipar, Conexor, Collins, Bikel, Charniak, and most recently the Stanford parser, mainly in the context of MT or MT-related research. With (~16-way) parallel parsing on a cluster, we're finding Stanford quite workable for O(1M)-sentence data sets.

Anonymous said...

We use Link Grammar for speed reasons (we only have single- or dual-CPU machines). Can anyone comment on how it fares against the others that have been mentioned?

Min said...

alternately, i guess you can vote more than once. we happen to use lots of minipar and charniak's work here, just because we're most familiar with them. upgrading parsers usually takes time and doesn't happen in the middle of projects for us.

hal said...

i'm actually really surprised how many "other"s were selected! i suppose i should have included link, stanford and c+c, but they totally just slipped my mind.

i'm also shocked that rule-based parsers are beating out any of the family of statistical parsers (though if you lump collins + brown together, since they're very similar, this is no longer the case). you certainly wouldn't predict this based on what is published at .*ACL.

i personally typically use a variant of collins (implementation due to Radu Soricut) because Radu's impl is very fast and outputs a lot of extra stuff i like (like headedness). if that's too slow, i'll use minipar, but always feel like i'm fighting with it.

(although, truth be told, i typically use my own chunker instead of a parser, unless i really need full trees.)

Anonymous said...

It's also impressive how many people read the NLP blog and do not use parsing at all. Maybe parsing is not as popular in NLP as it used to be...

Anonymous said...

i totally agree with you Hal

Anonymous said...

Thanks for the nice post!

taebo training said...

the truth is sometimes I use it but I do not really like it, is something I require, however, sometimes do not use it!

teeter hang ups said...

I would like to know the results of the survey, I think an interesting question. anyway I will find out .. continue to make, they are great ..

teething symptoms said...

clear that I use is dramatically useful, recommend it to those who have not tried it yet ..