Testing Chinese Text Segmentation Performance in jieba
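This test script measures the processing speed of the jieba Chinese text segmentation library. It times the word-cutting operation on an input text file, writes the segmented output to a log file, and reports the elapsed time and the throughput in bytes per second. It does not compare the output against a reference segmentation, so accuracy can only be judged by inspecting the log.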
fxsjy/jieba
test/test_file.py
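import sys
import time

sys.path.append("../")  # allow importing the in-repo jieba package
import jieba

# Load the dictionary up front so the timing below covers only segmentation.
jieba.initialize()

if len(sys.argv) < 2:
    sys.exit("usage: python test_file.py <input_file>")
path = sys.argv[1]

# Read raw bytes; jieba decodes them (UTF-8 first, then GBK) before cutting.
with open(path, "rb") as f:
    content = f.read()

t1 = time.time()
words = "/ ".join(jieba.cut(content))
t2 = time.time()
tm_cost = t2 - t1

# Write the segmented text to a log file for manual inspection.
with open("1.log", "wb") as log_f:
    log_f.write(words.encode("utf-8"))

print("cost " + str(tm_cost))
print("speed %s bytes/second" % (len(content) / tm_cost))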
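
The timing logic in the script above can also be wrapped in a reusable function. The following is a minimal sketch, not part of the jieba repository: the name measure_throughput and its repeats parameter are hypothetical, and it swaps time.time() for time.perf_counter(), which is generally better suited to measuring short intervals. The segmenter is passed in as a callable so that jieba.cut (or any other function returning an iterable of tokens) can be supplied by the caller.

```python
import time


def measure_throughput(cut, data, repeats=3):
    """Time cut(data) several times; return (best_seconds, bytes_per_second).

    `cut` is any callable returning an iterable of tokens, e.g. jieba.cut.
    This helper is a hypothetical illustration, not a jieba API.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Consume the result fully: jieba.cut returns a generator, so the
        # segmentation work only happens while iterating.
        for _token in cut(data):
            pass
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on tiny inputs or coarse clocks.
    speed = len(data) / best if best > 0 else float("inf")
    return best, speed


if __name__ == "__main__":
    # Stand-in segmenter so the sketch runs without jieba installed;
    # replace the lambda with jieba.cut after calling jieba.initialize().
    sample = "我来到北京清华大学".encode("utf-8") * 1000
    seconds, speed = measure_throughput(
        lambda b: b.decode("utf-8").split("北京"), sample
    )
    print("cost %.6f" % seconds)
    print("speed %.0f bytes/second" % speed)
```

Taking the best of several runs reduces the influence of one-off effects such as disk caching or interpreter warm-up, at the cost of a longer total run time.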