Using the NLTK library

You can find more information about the sentence-level tokenizer in the Python Natural Language Toolkit (NLTK) on their wiki.

From your command line:

$ python
>>> import nltk
>>> sent_tokenizer = nltk.tokenize.PunktSentenceTokenizer()
>>> text = "This is a sentence. This is another sentence. More sentences are better!"
>>> sent_tokenizer.tokenize(text)
['This is a sentence.', 'This is another sentence.', 'More sentences are better!']
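
If you would rather run this from a script than from the interactive prompt, here is a minimal sketch of the same idea. The function names split_sentences and sentence_spans are made up for illustration and are not part of NLTK. It assumes NLTK is installed and uses PunktSentenceTokenizer with no training text, exactly as in the session above, so it relies on the tokenizer's default parameters only.

```python
from nltk.tokenize import PunktSentenceTokenizer


def split_sentences(text):
    """Split text into sentences (hypothetical helper, not an NLTK function)."""
    # No training text is passed, so Punkt uses its default parameters.
    tokenizer = PunktSentenceTokenizer()
    return tokenizer.tokenize(text)


def sentence_spans(text):
    """Return (start, end) character offsets for each sentence (hypothetical helper)."""
    tokenizer = PunktSentenceTokenizer()
    # span_tokenize yields offsets into the original string instead of substrings.
    return list(tokenizer.span_tokenize(text))


if __name__ == "__main__":
    sample = "This is a sentence. This is another sentence. More sentences are better!"
    for start, end in sentence_spans(sample):
        print(start, end, sample[start:end])
```

The offsets are handy when you need to map sentences back to positions in the original text. Because the tokenizer is untrained, it may split at abbreviations it has never seen, so treat the results on real-world text with some caution.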