Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others



How do I extract a specific Chinese string from JSON data?

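I'm a beginner trying to write a small program to solve a problem at work, but in two places the actual results differ greatly from what I see while debugging. Searching on Baidu didn't help, so I'm asking here.
The program is simple: it submits courier tracking numbers to the Kuaidi100 (快递100) query site with requests, then uses a regular expression to extract a specific Chinese string from the response. If the string is present it prints "pass"; otherwise it saves the barcode to a txt file. The code is as follows: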

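```python
# -*- coding: utf-8 -*-
import json
import re

import requests

source = "签收"                  # "signed for": the string to look for
temp = source.decode('utf8')     # byte string -> unicode (Python 2)
xx = ur'[^\x00-\xff]'            # any character outside \x00-\xff
pattern = re.compile(xx)


def query_net(barcodes):
    url = "http://www.kuaidi100.com/query?type=shentong&postid="
    for barcode in barcodes:
        new_url = url + barcode
        html = requests.post(new_url).content
        print barcode + " is checking"
        result_html = json.loads(html)
        dic_123 = result_html["data"]
        if not dic_123:              # use the returned data to decide whether this is a data error
            print "data error"
        for key in dic_123:
            print key["context"]
            # in Python 2, str() on non-ASCII unicode raises UnicodeEncodeError
            key_new = str(key["context"])
            results = pattern.findall(temp)   # applies the pattern to temp, not to key_new
            for result in results:
                print result
            else:                             # for-else: runs whenever the loop ends without break
                save_file(barcode)            # save_file() is not shown in the question
```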

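The data Kuaidi100 returns is in JSON format. As I understand it, the code above should match exactly the results I expect, but in practice the barcodes whose records do not contain the specified string are not saved. I suspect the problem is in the regular expression for the Chinese string, but I don't know how to fix it.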
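The behaviour described above (a barcode passes when one of its tracking records mentions 签收, otherwise it is kept for saving) can be sketched in Python 3 as follows. This is a minimal sketch, not the asker's program: it leaves out the network call and assumes, as the snippet above does, that each parsed response has a `data` list of records with a `context` string. The helper names `is_signed`, `find_unsigned` and `save_barcodes` are hypothetical. Unlike the snippet above, it looks for the keyword inside each record's `context` (the original applies the pattern to `temp`, the keyword itself) and makes one pass/fail decision per barcode instead of relying on a `for ... else`.

```python
import json

KEYWORD = "签收"  # "signed for": the word the question looks for


def is_signed(response, keyword=KEYWORD):
    """Return True if any tracking record's "context" text contains keyword.

    Assumes the layout used in the question: a dict whose "data" value is a
    list of records, each with a "context" string.
    """
    records = response.get("data") or []  # missing or empty -> no records
    return any(keyword in record.get("context", "") for record in records)


def find_unsigned(responses):
    """Return the barcodes whose responses never mention the keyword.

    `responses` maps barcode -> already-parsed JSON response; it stands in
    for calling requests and json.loads once per barcode.
    """
    return [barcode for barcode, response in responses.items()
            if not is_signed(response)]


def save_barcodes(barcodes, path):
    """Write one barcode per line as UTF-8 (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for barcode in barcodes:
            f.write(barcode + "\n")


if __name__ == "__main__":
    responses = {
        "111": json.loads('{"data": [{"context": "快件已签收"}]}'),
        "222": json.loads('{"data": [{"context": "快件已发出"}]}'),
        "333": json.loads('{"data": []}'),
    }
    print(find_unsigned(responses))  # ['222', '333']
```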


He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Answer


To match Chinese characters, try this regular expression:

([\u4E00-\u9FA5]|[\uFE30-\uFFA0])+

[\u4E00-\u9FA5] matches Chinese characters, and [\uFE30-\uFFA0] matches full-width characters.
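The pattern above can be tried in Python 3, where `str` patterns accept `\uXXXX` escapes. This sketch (the variable names are illustrative) also shows one pitfall: because the pattern wraps the alternation in a capturing group, `re.findall` returns only the last character that group captured in each run, so a single character class with both ranges is used for extraction instead.

```python
import re

# The answer's pattern: the capturing group (...)+ makes findall() return
# only the LAST character captured in each matching run.
CAPTURING = re.compile(r"([\u4E00-\u9FA5]|[\uFE30-\uFFA0])+")

# Same ranges merged into one character class, so findall() returns whole runs.
HAN_OR_FULLWIDTH = re.compile(r"[\u4E00-\u9FA5\uFE30-\uFFA0]+")

text = "快件已签收 abc 123"

print(CAPTURING.findall(text))         # ['收']
print(HAN_OR_FULLWIDTH.findall(text))  # ['快件已签收']
```

Either pattern matches any run of Chinese (or full-width) text, so on its own it does not tell you whether the specific word 签收 is present.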



...