
Building Your Own Glyph Library to Recognize Chinese Characters (Part 1: Digits)

Date: 04-11

Recognizing the digits 0 to 9 (using SimSun, Regular, size 小五 (Small Five, 9 pt) as the example)

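1. How the digits look on screen.

Open a txt file, set the font to "SimSun, Regular, Small Five", type the digits 0 to 9, press Print Screen, and paste the capture into Paint. At maximum zoom you can see that each digit is drawn on a 5x8 grid of black and white pixels. Cropping the capture down to just the digits gives a 59x8 image.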
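The 59-pixel width is consistent with ten 5-pixel-wide digits separated by nine 1-pixel gaps (10 x 5 + 9 x 1 = 59). Here is a minimal sketch of that layout arithmetic, assuming a uniform 1-pixel gap between neighbouring digits; `strip_width` and `glyph_offsets` are hypothetical helper names, not part of the original scripts:

```python
def strip_width(count, glyph_w=5, gap=1):
    """Total width of `count` glyphs laid side by side with `gap` pixels between them."""
    return count * glyph_w + (count - 1) * gap


def glyph_offsets(count, glyph_w=5, gap=1):
    """x coordinate of the left edge of each glyph in the strip."""
    return [i * (glyph_w + gap) for i in range(count)]


print(strip_width(10))    # 59
print(glyph_offsets(10))  # [0, 6, 12, 18, 24, 30, 36, 42, 48, 54]
```

If the cropped image is not exactly 59 pixels wide, the crop probably includes extra margin, and the offsets above will not line up with the digits.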

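2. Generate the 5x8 digit glyph templates (a dictionary). With only 10 digits you could read the pixels off by eye, but that stops being practical once you move on to letters (26) and Chinese characters (about 3,500 in common use).

So the templates are generated automatically by a program: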

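a. Install the PIL module (distributed today as Pillow: pip install Pillow). It is used to open the png image and read its pixels.

b. A 5x8 glyph has 40 pixels, so it fits in 40 bits: black (0,0,0) is stored as 1 and white (255,255,255) as 0. The 40 bits can be kept as a single integer (writing it in hexadecimal is shorter). A block that is entirely black is 0xFFFFFFFFFF = 1099511627775.

c. The code, ready to copy: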

#!/usr/bin/env python3
# Requires Pillow: pip install Pillow

from PIL import Image

def find_color_rect(img, pos, width, height, color=(255, 255, 255)):
    """Pack the width x height block at pos into an int, one bit per pixel.

    Columns are packed left to right; within a column the top row is the
    highest bit. A bit is 1 where the pixel equals color.
    """
    x, y = pos
    ret_find = 0
    for i in range(width):
        ret_find <<= height
        for j in range(height):
            # Compare RGB only, so an alpha channel (if present) is ignored.
            if tuple(img.getpixel((x + i, y + j))[:3]) == tuple(color):
                ret_find |= 1 << (height - 1 - j)
    return ret_find

def create_dic(img, w, h, ch_color, txt_name):
    """Walk the strip left to right and print a {bit pattern: character} dict."""
    ch_count = 0
    ch_dic_str = "gCH_dic = { \n"
    i, j = 0, 0
    # Stop at the right edge, or when every name in txt_name has been used.
    while i <= img.size[0] - w and ch_count < len(txt_name):
        number = find_color_rect(img, (i, j), w, h, ch_color)
        if number == 0:
            print("end")  # an empty block means there are no more glyphs
            break
        ch_dic_str += '%d:"%s",\n' % (number, txt_name[ch_count])
        i += w + 1  # glyphs are separated by a 1-pixel gap
        ch_count += 1
    ch_dic_str += "}"
    print(ch_dic_str)


if __name__ == "__main__":
    # digit_5x8.png is the cropped 59x8 image.
    # convert("RGB") makes getpixel return (r, g, b) tuples for any png mode.
    img_5x8 = Image.open("digit_5x8.png").convert("RGB")

    create_dic(img_5x8, 5, 8, (0, 0, 0), "0123456789")

d. The result is the 5x8 digit glyph dictionary:

gCH_dic = { 
543338627454:"0",
1107230976:"1",
427442147681:"2",
285641642350:"3",
35236675333:"4",
1041824321934:"5",
543608131854:"6",
826791682176:"7",
474888638830:"8",
483377383806:"9",
}
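A quick way to sanity-check an entry is to unpack it back into a picture. The sketch below assumes the bit layout used by find_color_rect (columns left to right, top row in the highest bit of each column); `encode_glyph` and `decode_glyph` are hypothetical helper names, and the glyph is represented as rows of '#' and '.' rather than as an image:

```python
GLYPH_W, GLYPH_H = 5, 8


def encode_glyph(rows):
    """Pack rows of '#'/'.' into an int, column by column, top row in the high bit."""
    code = 0
    for x in range(GLYPH_W):
        code <<= GLYPH_H
        for y in range(GLYPH_H):
            if rows[y][x] == "#":
                code |= 1 << (GLYPH_H - 1 - y)
    return code


def decode_glyph(code):
    """Unpack an int produced by encode_glyph back into rows of '#'/'.'."""
    column_mask = (1 << GLYPH_H) - 1
    rows = []
    for y in range(GLYPH_H):
        row = ""
        for x in range(GLYPH_W):
            # Column 0 occupies the highest GLYPH_H bits of the code.
            column = (code >> (GLYPH_H * (GLYPH_W - 1 - x))) & column_mask
            row += "#" if (column >> (GLYPH_H - 1 - y)) & 1 else "."
        rows.append(row)
    return rows


for line in decode_glyph(1107230976):  # the dictionary entry for "1"
    print(line)
# ..#..
# .##..
# ..#..
# ..#..
# ..#..
# ..#..
# ..#..
# .###.
```

Encoding the decoded rows again returns the original number, which makes this a convenient check when the dictionary is later extended to letters and Chinese characters.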

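3. With the glyph templates, you can test whether any 5x8 region of an image is a digit.

img = ImageGrab.grab() (ImageGrab is imported from PIL) captures the screen, and the digits can then be parsed straight out of img, which is quite practical.

img can of course also come from a file, for example Image.open("digit_5x8.png").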

#!/usr/bin/env python3
# Requires Pillow: pip install Pillow

import time
from PIL import Image

gCH_dic = {
543338627454:"0",
1107230976:"1",
427442147681:"2",
285641642350:"3",
35236675333:"4",
1041824321934:"5",
543608131854:"6",
826791682176:"7",
474888638830:"8",
483377383806:"9",
}

def find_color_rect(imgdic, pos, chw, chh, color=(255, 255, 0)):
    """Pack the chw x chh block at pos into an int (same layout as the generator)."""
    x, y = pos
    ret_find = 0
    for i in range(chw):
        ret_find <<= chh
        for j in range(chh):
            if tuple(imgdic[(x + i, y + j)][:3]) == tuple(color):
                ret_find |= 1 << (chh - 1 - j)
    return ret_find

def print_img_num(imgdic, img_size, ch_size, color):
    """Slide a glyph-sized window over the image and print every digit found."""
    imgw, imgh = img_size
    chw, chh = ch_size
    str_data = ""
    y = 0
    while y <= imgh - chh:
        x = 0
        sign = False  # True once a digit has been found on this row
        while x <= imgw - chw:
            st = gCH_dic.get(find_color_rect(imgdic, (x, y), chw, chh, color))
            if st is not None:
                str_data += st
                x += chw  # jump past the matched glyph
                sign = True
            else:
                x += 1
        if sign:
            y += chh  # skip the text line that was just read
            str_data += "\n"
        else:
            y += 1
    print(str_data)

if __name__ == "__main__":

    time1 = time.time()
    img = Image.open("digit_1.png").convert("RGB")
    # Cache every pixel in a dict keyed by (x, y) before scanning.
    imgdic = {}
    for j in range(img.size[1]):
        for i in range(img.size[0]):
            imgdic[(i, j)] = img.getpixel((i, j))

    print_img_num(imgdic, img.size, (5, 8), (0, 0, 0))
    print(time.time() - time1)  # elapsed seconds
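The window scan can also be tried without any image file. What follows is a minimal pure-Python sketch, not the author's code: the bitmap is a nested list of 0/1 values instead of a Pillow image, only two dictionary entries are used, and `render`, `window_code`, and `scan_row` are hypothetical names. It assumes digits are separated by a 1-pixel gap, as in the cropped strip.

```python
GLYPH_W, GLYPH_H = 5, 8
DIGITS = {543338627454: "0", 1107230976: "1"}  # two entries from gCH_dic


def render(text, margin=3, gap=1):
    """Build a test bitmap (list of rows of 0/1) by unpacking dictionary entries."""
    codes = {ch: code for code, ch in DIGITS.items()}
    column_mask = (1 << GLYPH_H) - 1
    columns = [0] * margin  # blank columns on the left
    for ch in text:
        for i in range(GLYPH_W):
            columns.append((codes[ch] >> (GLYPH_H * (GLYPH_W - 1 - i))) & column_mask)
        columns.extend([0] * gap)
    return [[(col >> (GLYPH_H - 1 - j)) & 1 for col in columns]
            for j in range(GLYPH_H)]


def window_code(pixels, x, y):
    """Pack the glyph-sized window at (x, y), column by column, top row in the high bit."""
    code = 0
    for i in range(GLYPH_W):
        code <<= GLYPH_H
        for j in range(GLYPH_H):
            if pixels[y + j][x + i]:
                code |= 1 << (GLYPH_H - 1 - j)
    return code


def scan_row(pixels, y=0):
    """Slide the window along one text line; jump a full glyph after each match."""
    found, x = "", 0
    while x <= len(pixels[0]) - GLYPH_W:
        ch = DIGITS.get(window_code(pixels, x, y))
        if ch is None:
            x += 1
        else:
            found += ch
            x += GLYPH_W
    return found


print(scan_row(render("1001")))  # 1001
```

Matching is exact: a window that is off by one pixel, or drawn in a different font, size, or colour, produces a different number and is simply not found in the dictionary.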
