Stanford CoreNLP

Created: November-22, 2018

Stanford CoreNLP is a popular natural language processing toolkit that supports many core NLP tasks.

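To download and install the program, either download a release package and include the necessary *.jar files in your classpath, or add the dependency from Maven Central. See the download page for details. For example: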

curl -L http://nlp.stanford.edu/software/stanford-corenlp-full-2015-12-09.zip -o corenlp.zip
unzip corenlp.zip
# the archive unpacks into a directory named after the release
cd stanford-corenlp-full-2015-12-09
export CLASSPATH="$CLASSPATH:$(pwd)/*"
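
For the Maven route mentioned above, the dependency declaration might look like the following sketch. Version 3.6.0 is an assumption, chosen to correspond to the 2015-12-09 release used in the download example; substitute the version you actually need. The second entry, with the `models` classifier, pulls in the model files that the annotators load at runtime.

```xml
<!-- Sketch of a pom.xml fragment; the version number is an assumption. -->
<dependencies>
  <dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
  </dependency>
  <dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
    <classifier>models</classifier>
  </dependency>
</dependencies>
```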

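There are three supported ways to run the CoreNLP tools: (1) the base, fully customizable API, (2) the Simple CoreNLP API, or (3) the CoreNLP server. A simple usage example of each is given below. As a motivating use case, each example produces the syntactic parse of a sentence.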

CoreNLP API

import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class CoreNLPDemo {
  public static void main(String[] args) {

    // 1. Set up a CoreNLP pipeline. This should be done once per type of annotation,
    //    as it's fairly slow to initialize.
    // Creates a StanfordCoreNLP object with tokenization, sentence splitting, and parsing.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // 2. Run the pipeline on some text.
    String text = "the quick brown fox jumped over the lazy dog"; // Add your text here!
    // Create an empty Annotation with just the given text
    Annotation document = new Annotation(text);
    // Run all annotators on this text
    pipeline.annotate(document);

    // 3. Read off the result.
    // Get the list of sentences in the document
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
      // Get the parse tree for each sentence
      Tree parseTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      // Do something interesting with the parse tree!
      System.out.println(parseTree);
    }

  }
}

Simple CoreNLP

import edu.stanford.nlp.simple.Document;
import edu.stanford.nlp.simple.Sentence;
import edu.stanford.nlp.trees.Tree;

public class SimpleCoreNLPDemo {
  public static void main(String[] args) {
    String text = "The quick brown fox jumped over the lazy dog";  // your text here!
    Document document = new Document(text);
    // Tokenization and sentence splitting run implicitly when sentences are requested
    for (Sentence sentence : document.sentences()) {
      Tree parseTree = sentence.parse();  // implicitly runs parser
      // Do something with your parse tree!
      System.out.println(parseTree);
    }
  }
}

CoreNLP Server

Start the server with the following command (with the classpath set appropriately):
```
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer [port] [timeout]
```
Get JSON-formatted output for a given set of annotators and print it to standard output:
```
wget --post-data 'The quick brown fox jumped over the lazy dog.' 'localhost:9000/?properties={"annotators":"tokenize,ssplit,parse","outputFormat":"json"}' -O -
```
To get our parse tree from the JSON, we can navigate to sentences[i].parse.
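
The wget request above passes the properties as raw JSON in the query string. When issuing the same request from Java, that JSON generally needs to be URL-encoded first. The following is a minimal sketch using only the standard library. The class name `CoreNLPServerUrl` and the helper `buildUrl` are hypothetical names introduced for illustration; the host, port, and annotator list are taken from the wget example. It only builds the request URL and does not contact the server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not part of CoreNLP): builds the request URL
// for a CoreNLP server, mirroring the wget example.
class CoreNLPServerUrl {

  // Returns the server URL with the properties JSON URL-encoded
  // into the "properties" query parameter.
  static String buildUrl(String host, int port, String annotators) {
    String props = "{\"annotators\":\"" + annotators + "\",\"outputFormat\":\"json\"}";
    try {
      return "http://" + host + ":" + port + "/?properties="
          + URLEncoder.encode(props, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is required to be available on every Java platform
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Prints the encoded URL; POST the text to annotate to this address.
    System.out.println(buildUrl("localhost", 9000, "tokenize,ssplit,parse"));
  }
}
```

The text to annotate would then be sent as the POST body to the resulting URL, and the parse read from sentences[i].parse in the response with a JSON library of your choice.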