python - Find the length of a sentence with English words and Chinese characters

May 15, 2014

The sentence may include non-English characters, e.g. Chinese:

你好,hello world

The expected length is 5 (2 Chinese characters, 2 English words, and 1 comma).

You can use the fact that most Chinese characters are located in the Unicode range 0x4e00 - 0x9fcc.
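To make the range concrete, here is a minimal sketch of a membership test. The helper name `is_chinese_char` and the two constants are hypothetical (they do not come from the original answer), and the sketch assumes Python 3 and inclusive bounds. It covers only this single range, not any extension blocks.

```python
# Hypothetical helper, not part of the original answer.
# Assumption: both ends of the range 0x4e00 - 0x9fcc are inclusive.
CJK_START = 0x4E00
CJK_END = 0x9FCC


def is_chinese_char(ch):
    """Return True if the single character ch lies in 0x4e00 - 0x9fcc."""
    return CJK_START <= ord(ch) <= CJK_END


print(hex(ord('你')), is_chinese_char('你'))        # code point of the first example character
print(is_chinese_char('h'), is_chinese_char(','))  # ASCII letters and commas are outside the range
```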

# -*- coding: utf-8 -*-
import re

s = '你好 hello, world'
s = s.decode('utf-8')  # Python 2: turn the byte string into a unicode string

# First find the 'normal' words and interpunction.
# Without re.UNICODE, \w matches only ASCII word characters in Python 2.
# '[\x21-\x2f]' includes most ASCII interpunction; change it to ',' if you only need to match a comma.
count = len(re.findall(r'\w+|[\x21-\x2f]', s))

for ch in s:
    # See https://stackoverflow.com/a/11415841/1248554 for additional ranges if needed.
    if 0x4e00 <= ord(ch) <= 0x9fcc:
        count += 1

print count
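The snippet above is written for Python 2. On Python 3, `str` is already Unicode and `\w` matches Chinese characters by default, so the same idea needs an explicit ASCII word pattern. The following is a minimal sketch under those assumptions, not the original author's code. The names `TOKEN_RE` and `sentence_length` are hypothetical, and folding the Chinese range into the regular expression is a variation on the loop used above.

```python
import re

# Assumption: a token is either a run of ASCII word characters,
# a single ASCII punctuation mark in \x21-\x2f, or a single
# character in the range 0x4e00 - 0x9fcc.
TOKEN_RE = re.compile(r'[A-Za-z0-9_]+|[\x21-\x2f]|[\u4e00-\u9fcc]')


def sentence_length(s):
    """Count English words, ASCII punctuation marks and Chinese characters."""
    return len(TOKEN_RE.findall(s))


print(sentence_length('你好,hello world'))  # prints 5
```

Full-width punctuation such as the full-width comma (U+FF0C) falls outside both character classes, so this sketch does not count it. Extend the pattern if the input contains such characters.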
