This is the home page of the ParsCit project, which performs reference string parsing. It is architected as a supervised machine learning procedure that uses Conditional Random Fields as its learning mechanism. You can download the code below, parse strings online, or send batch jobs to our web service (coming soon!). The code includes the training data, the feature generator, and shell scripts to connect the system to a web service (used here too).
Some definitions (thanks to Robert Dale):
This project deals with two problems: parsing reference strings, and parsing the metadata found on the title page of a document. The first task is handled by the module that gives the project its name, ParsCit, and the second by a separate module, ParsHed. Other projects related to ParsCit (some here in WING, some elsewhere) deal with identifying and linking citations to reference strings.
You can download the open-source code for ParsCit here (coming soon). The source requires you to re-compile the CRFPP source code and assumes that Perl is installed on your system and can be invoked as perl (that is, it must be in your path).
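As a quick sanity check before installing, the following minimal Python sketch reports whether perl can be found on your path. The function name missing_tools is hypothetical and is not part of ParsCit; the check relies only on the standard library.

```python
import shutil


def missing_tools(tools=("perl",)):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("perl is available on PATH")
```

You can pass other executable names to the same function if your setup needs them on the path as well.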
More NLP services are now being made available on the web. Following this trend, you can send your plain text citations to us via our web service. We will parse these for you free of charge (as and when time and processing power allow; these jobs run at lower priority).
N.B. We keep logs of what's parsed in these demos to improve the accuracy and productivity of ParsCit. If you'd like your data to be kept private, or you find you use this service a lot, why not install a local copy of ParsCit for yourself? If you do, please let us know where you are so we can acknowledge you here and re-direct some traffic your way.
./ParsCitClient.rb ~/public_html/samples/E06-1050.txt
You can also run ParsCit directly in your browser. The form below submits your text input (after suitable cleaning) to the ParsCit service to parse the input file or strings.
Demo #1: Parsing the citation contexts and the reference strings from a whole text file
Demo #2: Parsing individual strings only
International Refereed Conference Publications:
Others:
This section lists common problems with ParsCit. If you find a problem, email the lead developer at <kanmy@comp.nus.edu.sg>. Please use the subject "[ParsCit]" to ensure that it reaches our attention. If you have hand-corrected tagged data that you don't mind providing us, we can use it to further improve ParsCit's extraction capabilities. The output is occasionally wrong; below are some common problems people have encountered.
Baltes, Paul, Ursula Staudinger, Ulmann Lindenberger (1999): Lifespan psychology: theory and application of intellectual functioning; in: Annual Review of Psychology, 50, 471-507
ParsCit's post-processing step may not detect and deal with these problems reliably. We're working to fix these too.
$ ../../bin/tr2crfpp.pl tagged_references.txt > parsCit.train.data
$ ../crf_learn parsCit.template parsCit.train.data model
$ mv model ../../resources/parsCit.model
The first command uses the training data to create the input feature file that crfpp expects. The second trains the model with the crf_learn command. The final command moves the model file into the resources/ subdirectory, where it replaces the default model that comes with ParsCit; run it only if you want to replace that default.
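To make the first step more concrete, here is a minimal Python sketch of the kind of conversion that produces a CRF++ input file: one token per line, feature columns separated by spaces, the label in the last column, and a blank line between reference strings. This is not the tr2crfpp.pl implementation. The input tag format (fields wrapped as <label> ... </label>), the function names, and the three toy features are all assumptions chosen for illustration.

```python
import re

# Assumed input format: each reference is one line of fields wrapped in
# <label> ... </label> tags. The real tagged_references.txt may differ.
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def token_features(token):
    """Return the token plus three toy features (a hypothetical feature set)."""
    stripped = token.strip(".,;:()\"'")
    if stripped.isdigit():
        shape = "digits"
    elif stripped[:1].isupper():
        shape = "initcap"
    else:
        shape = "other"
    ends_with_punct = "punct" if token[-1] in ".,;:" else "nopunct"
    return [token, token.lower(), shape, ends_with_punct]


def to_crfpp_rows(tagged_line):
    """Turn one tagged reference into rows of 'features... label'."""
    rows = []
    for label, text in FIELD_RE.findall(tagged_line):
        for token in text.split():
            rows.append(" ".join(token_features(token) + [label]))
    return rows


def to_crfpp(tagged_lines):
    """Join all references, separating sequences with a blank line."""
    blocks = []
    for line in tagged_lines:
        rows = to_crfpp_rows(line)
        if rows:  # skip empty or untagged lines
            blocks.append("\n".join(rows))
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

Every row has the same number of columns, which CRF++ requires, and the column positions are what a template file refers to when it defines features.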
ParsCit owes its continued maintenance and support to its user base. Here we'd like to thank them for their help. Thanks to Artemy Kolchinsky for fixes in Preprocess.pm (v090625). Thanks to Matteo Romanello for the humanities training datasets. Thanks to Dain Kaplan for helping us fix the Preprocess.pm bug. Thanks to Ayeh Bandeh-Ahmadi for correcting the warning in parseRefString.pl. Thanks to Nick Friedrich and Jöran Beel of scienstein.org for all fixes in the v081201 version of ParsCit.
Other, open-source citation parsers:
Other related links. Contact Min below to get your other related software listed here. Thanks!