Using the NLTK library

You can find more information about the sentence-level tokenizer in the Python Natural Language Toolkit (NLTK) on the project's wiki.

From your command line:

$ python
>>> import nltk
>>> sent_tokenizer = nltk.tokenize.PunktSentenceTokenizer()
>>> text = "This is a sentence. This is another sentence. More sentences are better!"
>>> sent_tokenizer.tokenize(text)
['This is a sentence.', 'This is another sentence.', 'More sentences are better!']
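
If you would rather run this as a script than type it into the interpreter, the same call can be wrapped in a small function. This is only a minimal sketch: the function name split_sentences is made up for illustration, and it assumes NLTK is already installed. It uses the same untrained PunktSentenceTokenizer as above, so it relies on the tokenizer's default parameters and may split incorrectly around abbreviations.

```python
from nltk.tokenize import PunktSentenceTokenizer


def split_sentences(text):
    """Split text into sentences with an untrained Punkt tokenizer.

    split_sentences is a hypothetical helper name, not part of NLTK.
    """
    # Same constructor call as in the interactive session above;
    # no training text is passed, so default parameters are used.
    tokenizer = PunktSentenceTokenizer()
    return tokenizer.tokenize(text)


if __name__ == "__main__":
    sample = "This is a sentence. This is another sentence. More sentences are better!"
    # Print one sentence per line.
    for sentence in split_sentences(sample):
        print(sentence)
```

Building the tokenizer inside the function keeps the example short; if you call it many times, create the tokenizer once and reuse it.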