Pythonist: 2010

Friday, October 8, 2010

json

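The json library converts Python objects to JSON text and JSON text back to Python objects.

Several built-in Python types already have ready-made conversions: a dictionary becomes a JSON object, a list becomes a JSON array, and so on. All we need to do is call json.dump() and json.load(). Both accept a cls argument that names the Python class used for encoding or decoding. There are also dumps and loads, which work on strings instead of file objects.

For special Python objects you have to write your own JSON encoder and decoder; json provides the base classes json.JSONEncoder and json.JSONDecoder to inherit from. Usually overriding the default method of the encoder is enough. Here is a simple example: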

#!/usr/bin/python

import json

class ComplexEncoder( json.JSONEncoder ):
    def default( self, obj ):
        if isinstance( obj, complex ):
            return {'real': obj.real, 'imag': obj.imag }
        return super( ComplexEncoder, self ).default( obj )

if __name__ == '__main__':
    print json.dumps( 1 + 2j, cls=ComplexEncoder )

The output is:

$ ./test11.py 
{"real": 1.0, "imag": 2.0}
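The encoder above covers only one direction. Below is a minimal sketch of a full round trip through dumps and loads. It assumes a hypothetical marker key `__complex__` (not part of the json module) so that the decoder can recognise the dictionary, and it uses the `object_hook` argument of loads rather than a json.JSONDecoder subclass. The syntax is kept version-neutral so it should also run on Python 3.

```python
import json

class TaggedComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, complex):
            # '__complex__' is a hypothetical marker key chosen for this sketch
            return {'__complex__': True, 'real': obj.real, 'imag': obj.imag}
        return json.JSONEncoder.default(self, obj)

def as_complex(dct):
    # the decoder calls this hook for every JSON object it parses
    if dct.get('__complex__'):
        return complex(dct['real'], dct['imag'])
    return dct

# dumps/loads work on strings; dump/load take a file object instead
text = json.dumps({'z': 1 + 2j, 'items': [1, 2, 3]}, cls=TaggedComplexEncoder)
data = json.loads(text, object_hook=as_complex)
```

After the round trip, data['z'] is a complex number again and the plain list is untouched.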

Wednesday, October 6, 2010

operator

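Python supports many operators. Besides those familiar from C/C++ there are others, such as exponentiation. The operator module exposes them as functions; here is a brief list:

+-*/% correspond to __add__, __sub__, __mul__, __div__ and __mod__; unary negation is __neg__ and unary + is __pos__;
the bitwise operations are __and__, __or__, __inv__, __xor__, __lshift__ and __rshift__;
the logical operations are truth, not_, is_ and is_not;
abs corresponds to __abs__; pow corresponds to __pow__;
// is floor division and corresponds to __floordiv__;
__contains__(a, b) returns b in a;
__delitem__ is del a[b]; __getitem__ is a[b]; __getslice__ is a[b:c];
the in-place versions follow the naming pattern of __iadd__ for +=;

operator also has some functions that produce functors, such as:

attrgetter returns a functor that reads an attribute: after a = attrgetter('a'), a(f) returns f.a;
itemgetter returns a functor that accesses an element by index;
methodcaller returns a functor that calls the named method on its argument;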
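A short sketch of the functor factories listed above, together with two of the plain operator functions. The Point class and the sample data are hypothetical and exist only for illustration; the code avoids Python 2-only names so it should run on Python 3 as well.

```python
import operator
from operator import attrgetter, itemgetter, methodcaller

class Point(object):  # hypothetical class used only for illustration
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
get_x = attrgetter('x')         # get_x(p) is p.x
second = itemgetter(1)          # second(seq) is seq[1]
upper = methodcaller('upper')   # upper(s) is s.upper()

# itemgetter is handy as a sort key
pairs = [('b', 2), ('a', 3), ('c', 1)]
by_count = sorted(pairs, key=itemgetter(1))

total = operator.add(2, 3)                # same as 2 + 3
found = operator.contains([1, 2, 3], 2)   # same as 2 in [1, 2, 3]
```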

functools

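functools is a library designed to make it convenient to produce functors; in particular, functools provides commonly used functors such as decorators and bind.

functools provides the following tools:

cmp_to_key: the newer sorting functions work with a key rather than the old-style comparison function (which takes two objects and returns a negative number, zero or a positive number to indicate their order); cmp_to_key converts a cmp-style function into a key-style one;
total_ordering is a class decorator: if a class defines __eq__ and one of the ordering comparisons, the remaining comparison methods are generated automatically;
reduce is the same as the built-in reduce;
partial produces bind-style functors. The usage is partial(func, *args, **keywords); for example, after a = partial(int, base=2), calling a('01010101') is the same as calling int('01010101', base=2);
wraps and update_wrapper are used to write wrappers; an example follows.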

#!/usr/bin/python

import functools

def normal_func( a, b ):
    print a + ' says: this is a normal function ' + b

def decorator(f):
    @functools.wraps(f)
    def wrapper( *args, **kwargs ):
        print 'this is the decorator'
        return f( *args, **kwargs )
    return wrapper

if __name__ == '__main__':
    normal_func( 'foo', 'bar' )
    decorated_func = decorator( normal_func )
    decorated_func( 'tom', 'jerry' )

Interestingly, this approach effectively works as a factory that produces wrappers. The output is:

$ ./test10.py
foo says: this is a normal function bar
this is the decorator
tom says: this is a normal function jerry
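The other functools tools listed above can be sketched in the same spirit. The names parse_binary, by_length, Version, passthrough and greet are hypothetical, and the code uses no print statements so that it should run unchanged on Python 3. The last part shows what functools.wraps actually preserves: the name and docstring of the wrapped function.

```python
import functools

# partial: bind base=2 ahead of time
parse_binary = functools.partial(int, base=2)

# cmp_to_key: reuse an old-style comparison function as a sort key
def by_length(a, b):
    # negative, zero or positive, like the old cmp protocol
    return len(a) - len(b)

words = sorted(['ccc', 'a', 'bb'], key=functools.cmp_to_key(by_length))

# total_ordering: define __eq__ and one ordering method, get the rest
@functools.total_ordering
class Version(object):  # hypothetical class for illustration
    def __init__(self, major, minor):
        self.key = (major, minor)
    def __eq__(self, other):
        return self.key == other.key
    def __lt__(self, other):
        return self.key < other.key

# wraps: the wrapper keeps the metadata of the function it wraps
def passthrough(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper

@passthrough
def greet():
    """say hello"""
    return 'hello'
```

Without functools.wraps, greet.__name__ would be 'wrapper' and the docstring would be lost.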

itertools

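itertools provides several functions that produce efficient iterators.

An iterator is an object that implements the next method; such an object can be traversed with for x in it. By convention, calling iter() on an object is equivalent to calling its __iter__ method to obtain the corresponding iterator, whose next method is then called repeatedly to walk through the elements. An iterator's own __iter__ method usually returns itself.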
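A minimal sketch of the protocol just described, using a hypothetical Countdown class. Python 3 spells the method __next__ instead of next, so the class defines both names to stay runnable on either version.

```python
class Countdown(object):  # hypothetical example class
    """Iterator that counts down from start to 1."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # an iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration   # signals the end of the iteration
        value = self.current
        self.current -= 1
        return value

    next = __next__   # Python 2 spelling of the same method

it = iter(Countdown(3))
values = [x for x in it]
```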

Let's look at the count function in itertools:

>>> import itertools
>>> c = itertools.count() 
>>> type(c)
<type 'itertools.count'>
>>> dir(c)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'next']
>>> for i in c:
...     print i
...     if i > 10:
...             break
... 
0
1
2
3
4
5
6
7
8
9
10
11

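Below are some of the other functions in itertools that create iterators. One group is similar to count and returns iterators that can go on forever:

count returns an iterator that counts upward from a given start value by a given step;
cycle returns an iterator that loops over its input endlessly;
repeat returns an iterator that repeats a single value;

Another group can be regarded as wrappers around some of the built-in functions:

chain returns an iterator that walks through every element of each of its arguments in turn;
compress takes two arguments a and b and yields a[i] for each i where b[i] is true;
dropwhile takes two arguments, a predicate (functor) and the sequence to traverse; it skips the leading elements for which the predicate is true and returns an iterator over the rest;
groupby returns an iterator over key/data pairs. It takes a key function, and the input should already be sorted with that same key function; groupby then yields tuples whose first item is the key and whose second item is an iterator over the consecutive elements sharing that key;
ifilter/ifilterfalse are like filter, except that they return an iterator rather than a list;
islice returns an iterator that behaves like a slice operation;
imap returns an iterator that behaves like map;
starmap differs from imap in how the arguments are supplied: imap takes one iterable per argument, while starmap takes a single iterable in which each item is one group of arguments;
tee turns one iterator into n independent iterators, so that several consumers can each traverse the same data (the itertools documentation notes that tee iterators are not thread-safe); once an iterator has been passed to tee it should not be used anywhere else;
takewhile is the counterpart of dropwhile: it yields elements only as long as the predicate holds;
izip is the iterator version of zip; izip_longest accepts a fillvalue used for padding;
product produces the Cartesian product;
permutations produces permutations;
combinations produces combinations; combinations_with_replacement produces combinations with repetition;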
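A quick tour of several of the functions above, as a sketch. imap, ifilter and izip exist only in Python 2 (in Python 3 the built-in map, filter and zip already return iterators), so this example sticks to functions available in both; the sample data is made up.

```python
import itertools

chained = list(itertools.chain('AB', 'CD'))
picked = list(itertools.compress('ABCD', [1, 0, 1, 0]))
dropped = list(itertools.dropwhile(lambda x: x < 3, [1, 2, 3, 1]))
taken = list(itertools.takewhile(lambda x: x < 3, [1, 2, 3, 1]))
# islice makes it safe to take a few items from an infinite iterator
sliced = list(itertools.islice(itertools.count(10, 5), 3))
# starmap unpacks each tuple as the arguments of pow
powers = list(itertools.starmap(pow, [(2, 3), (3, 2)]))

# groupby only groups consecutive items, so sort by the same key first
first_letter = lambda w: w[0]
words = sorted(['banana', 'apple', 'cherry', 'avocado', 'blueberry'],
               key=first_letter)
groups = dict((k, sorted(g)) for k, g in itertools.groupby(words, key=first_letter))

pairs = list(itertools.product('ab', [1, 2]))
perms = list(itertools.permutations('abc', 2))
combs = list(itertools.combinations('abc', 2))
```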

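One thing worth noting is Python's somewhat unusual way of producing an iterator: a function that uses the yield keyword. Such a function/object is usually called a generator. With a generator it is easy to write an iterator without disturbing the traversal logic. For example, here is the equivalent of chain given in the documentation: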

def chain(*iterables): 
    # chain('ABC', 'DEF') --> A B C D E F
    for it in iterables:
        for element in it: 
            yield element

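Logically this is exactly how we walk through all the arguments of chain. Writing it as an iterator class would be more tedious: we would have to remember which argument we have reached and what its iterator is, and jump to the next argument when that iteration ends. The yield version is much more direct.

Also, an iterator has to raise the StopIteration exception when the iteration is over; with yield there is no need to raise it by hand.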
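To make the comparison concrete, here is a sketch of what a hand-written iterator class for chain might look like. ChainIterator is a hypothetical name and this is not how itertools implements chain; it only illustrates the bookkeeping that the generator version hides.

```python
class ChainIterator(object):  # hypothetical hand-written counterpart of chain
    def __init__(self, *iterables):
        self._outer = iter(iterables)   # remembers which argument we are on
        self._inner = iter(())          # iterator over the current argument

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            try:
                return next(self._inner)
            except StopIteration:
                # current argument exhausted: move on to the next one;
                # when no argument is left, next() raises StopIteration
                # and that ends the whole iteration
                self._inner = iter(next(self._outer))

    next = __next__   # Python 2 spelling

def chain(*iterables):
    # generator version: the state lives in the suspended loops
    for it in iterables:
        for element in it:
            yield element
```

Both produce the same sequence, but the class has to store and update its position explicitly.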

Tuesday, October 5, 2010

argparse

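Its predecessor is optparse. argparse is a new module that only appeared in Python 2.7, so take care when using it on older versions.

Using argparse mostly means using the argparse.ArgumentParser class and declaring the supported options with its add_argument method:

the constructor takes description (description of the command), epilog (text printed after the argument help), prog (name of the command), usage (the usage string, generated automatically by default), add_help (whether to add -h/--help, True by default), argument_default (default value for arguments), parents (additional parsers whose arguments are included), prefix_chars (the set of characters that can start an option, '-' by default), conflict_handler (how conflicts are resolved, rarely needed) and formatter_class;
print_help() prints the help message;
parse_args() runs the parser on a list and returns an argparse.Namespace object, whose attributes (named after the arguments) hold the corresponding values;
add_argument() adds two kinds of arguments: options (which may be omitted) and positional arguments (which must be present). The difference lies in whether the first parameter of add_argument (name or flags) is a string starting with '-': add_argument('-f', '--foo') makes -f optional, while add_argument('bar') makes bar mandatory. nargs decides whether the argument takes values and how many (> 0; if only presence or absence matters, change action instead); it can be an integer, or '?' for at most one value, in which case default is used when the argument is not given ('*' and '+' are also accepted). action is what happens when the argument is seen: the default is 'store', which stores the value (a single value when nargs is omitted, a list when nargs is an integer); store_const stores a constant; store_true and store_false store just a boolean and take no value; 'append' collects the values of repeated occurrences into a list, with a corresponding append_const; 'version' requires the version parameter of add_argument and is generally used to print the program's version information. const holds the value used by store_const or append_const and defaults to None; type performs a type conversion; choices restricts the allowed values; required marks the argument as mandatory and defaults to False; help is the text printed in the help message; metavar is the name shown for the option's value in the help message; dest is the attribute name under which the value is stored in the argparse.Namespace object;
add_subparsers allows adding several sub-parsers (as with the svn or hadoop commands);
add_argument_group lets the user organise arguments into groups;
set_defaults sets default values directly;

Here is a simple example of using argparse: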

#!/opt/local/bin/python2.7

import argparse
import sys

if __name__ == '__main__':
    parser = argparse.ArgumentParser( description='test program for argparse' )
    # store_true: res.negate is False unless -n is given
    parser.add_argument( '-n', '--negate', help='negate the result',
                         action='store_true' )
    parser.add_argument( 'integer', help='integers to operate with',
                         action='store', metavar='N', nargs=2, type=int)
    parser.add_argument( '-o', '--operation', help = 'operations',
                         choices='+-*/', dest='op', nargs='?', default='+' )
    parser.add_argument( '-v', '--version', help='version information',
                         action='version', version=sys.argv[0] + ' 0.1' )
    res = parser.parse_args( sys.argv[1:] )

    if res.op == '+':
        x = res.integer[0] + res.integer[1]
    elif res.op == '-':
        x = res.integer[0] - res.integer[1]
    elif res.op == '*':
        x = res.integer[0] * res.integer[1]
    elif res.op == '/':
        x = res.integer[0] / res.integer[1]
    else:
        print 'you shouldn\'t reach here!'
        sys.exit(1)
    # apply -n/--negate, which the help text promises
    if res.negate:
        x = -x
    print x

    sys.exit(0)

We can play with this simple four-function calculator on the command line:

$ ./test09.py
usage: test09.py [-h] [-n] [-o [{+,-,*,/}]] [-v] N N
test09.py: error: too few arguments

$ ./test09.py -h
usage: test09.py [-h] [-n] [-o [{+,-,*,/}]] [-v] N N

test program for argparse

positional arguments:
  N                     integers to operate with

optional arguments:
  -h, --help            show this help message and exit
  -n, --negate          negate the result
  -o [{+,-,*,/}], --operation [{+,-,*,/}]
                        operations
  -v, --version         version information

$ ./test09.py -v
./test09.py 0.1

$ ./test09.py 14 87
101

$ ./test09.py 14 87 -o -
-73
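The sub-parser facility mentioned in the list is not used by the script above. Here is a minimal sketch of it; the calc program with its add and scale sub-commands is hypothetical. Passing an explicit list to parse_args, instead of relying on sys.argv, makes the parser easy to try out interactively.

```python
import argparse

def build_parser():
    # 'calc' and its sub-commands are made up for this sketch
    parser = argparse.ArgumentParser(prog='calc', description='hypothetical calculator')
    parser.add_argument('-q', '--quiet', action='store_true', help='suppress output')
    sub = parser.add_subparsers(dest='command')

    add = sub.add_parser('add', help='sum the given integers')
    add.add_argument('numbers', metavar='N', type=int, nargs='+')

    scale = sub.add_parser('scale', help='multiply a value by a factor')
    scale.add_argument('value', type=float)
    # dest changes the attribute name on the resulting Namespace
    scale.add_argument('-f', '--factor', type=float, default=2.0, dest='k')
    return parser

parser = build_parser()
added = parser.parse_args(['add', '1', '2', '3'])
scaled = parser.parse_args(['-q', 'scale', '1.5', '--factor', '4'])
```

Options of the main parser, such as -q here, go before the sub-command name.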

gc

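This is the interface to Python's garbage collector. Java has one as well, in java.lang.Runtime.gc().

The gc module mainly provides the following functions:

enable() and disable() turn automatic garbage collection on and off;
collect() triggers a collection explicitly;
get_count and get_threshold return basic information about the collector;
get_referrers returns the objects that refer to a given object, and get_referents returns the objects that it refers to;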
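A small sketch of these calls, assuming CPython. The Node class is hypothetical; the two nodes point at each other, so reference counting alone can never free them and only the cycle collector can.

```python
import gc

class Node(object):  # hypothetical class for illustration
    def __init__(self):
        self.other = None

gc.disable()                  # switch off automatic collection for the demo
a, b = Node(), Node()
a.other, b.other = b, a       # reference cycle
del a, b                      # the cycle is now unreachable
unreachable = gc.collect()    # returns the number of unreachable objects found
gc.enable()

thresholds = gc.get_threshold()   # three integers, one per generation

target = []
holder = {'t': target}
referrers = gc.get_referrers(target)   # objects that refer to target
referents = gc.get_referents(holder)   # objects that holder refers to
```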

atexit

Usage is very simple:

import atexit
atexit.register( func_name, param )

@atexit.register
def func_name():
    pass

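There are two forms: the first supports arguments, the second does not. I can't see why Python copies Java's annotations here (Python calls them decorators); it seems pointless!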
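For what it is worth, the decorator form works simply because atexit.register returns the function it was given. A small sketch with hypothetical handlers that record their calls in a list instead of printing:

```python
import atexit

log = []  # hypothetical list used to observe the calls

def goodbye(name, punctuation='!'):
    log.append('bye ' + name + punctuation)

# function form: extra positional and keyword arguments are passed on at exit
atexit.register(goodbye, 'tom', punctuation='?')

# decorator form: register() returns its argument, so the name stays bound
@atexit.register
def cleanup():
    log.append('cleanup')
```

The handlers run in reverse order of registration when the interpreter exits normally.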

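Several numerical computing libraries for Python

See the original article here.

Later on we will mainly study:

numpy gives Python a MATLAB-like numerical environment, with basic matrix operations, common functions and so on. After from numpy import * we can generally use Python much like MATLAB, although the syntax of the two still differs considerably;
scipy complements numpy with higher-level operations such as optimization and integration;
sympy provides a symbolic computation system;
cvxmod is a convex optimization modeling system from students of Stephen Boyd (you only need to write the model);
pyimsl is an interface to IMSL, a large collection of statistical and numerical analysis algorithms;
pygsl is an interface to GSL;
pil is a Python library for image processing;
matplotlib is a MATLAB-like plotting library;
VTK is a 3D graphics library;
py-gnuplot is an interface to gnuplot.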

Monday, October 4, 2010

threading

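threading is the "high-level" library for multithreaded programming in Python, while thread is comparatively low-level; if the OS does not support threads, dummy_threading can be used instead.

threading has the following main classes:

threading.Thread is the main tool for creating threads. Usually we just implement its Thread.run method and then call Thread.start(), which executes Thread.run in a new thread; its join method waits for that thread to finish. We can set its name and daemon attributes;
threading.Lock is used for locking: Lock.acquire() and Lock.release() protect objects that must be accessed serially;
threading.RLock is a reentrant lock, which the same thread may acquire several times;
threading.Condition creates a condition variable. It is built on a Lock or RLock (an RLock is created if none is passed in); once the condition is met, notify() wakes one waiting thread and notifyAll() wakes all of them, so that they go from suspended back to running;
threading.Semaphore creates a semaphore object. Once its value reaches 0, any thread that tries to acquire it is suspended; this is often used to cap the number of connections before starting threads to carry out some tasks;
threading.Event creates an event: a thread blocked in wait is released when some other thread calls set;
threading.Timer is implemented with a thread rather than with a system timer; it can be used to run a function after a given delay;

threading also provides some functions, such as activeCount(), which returns the number of live threads, currentThread(), which returns the current thread object, local(), which creates thread-private variables, and so on.

Here is a simple multithreaded hello world.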

#!/usr/bin/env python

import threading
import sys

class HelloThread( threading.Thread ):
    def hello( self, t ):
        print 'hello from %(id)s for the %(t)dth time' \
            % { 'id': self.ident, 't': t }
    def run( self ):
        for i in range(10):
            self.hello( i )

if __name__ == '__main__':
    t = list()
    for i in range(5):
        t.append( HelloThread() )
        t[i].start()
    for i in range(5):
        t[i].join()
    sys.exit(0)

This starts 5 threads, each of which counts; the result looks roughly like this:

hello from 4302057472 for the 0th time
hello from 4302057472 for the 1th time
hello from 4302057472 for the 2th time
hello from 4302057472 for the 3th time
 hello from 4302594048 for the 0th time
hello from 4302594048 for the 1th time
hello from 4302594048 for the 2th time
hello from 4302594048 for the 3th time
hello from 4302594048 for the 4th time
hello from 4302594048 for the 5th time
hello from 4302594048 for the 6th time
hello from 4302594048 for the 7th time
hello from 4302594048 for the 8th time
hello from 4302594048 for the 9th time
hello from 4302594048 for the 0th time
hello from 4302594048 for the 1th time
hello from 4302594048 for the 2th time
hello from 4302594048 for the 3th time
hello from 4302594048 for the 4th time
hello from 4302594048 for the 5th time
hello from 4302594048 for the 6th time
hello from 4302594048 for the 7th time
hello from 4302594048 for the 8th time
hello from 4302594048 for the 9th time
hello from 4302057472 for the 4th time
 hello from 4302057472 for the 5th time
hello from 4302057472 for the 6th time
hello from 4302057472 for the 7th time
hello from 4302057472 for the 8th time
hello from 4302057472 for the 9th time
hello from 4302594048 for the 0th time
hello from 4302594048 for the 1th time
hello from 4302594048 for the 2th time
hello from 4302594048 for the 3th time
hello from 4302594048 for the 4th time
hello from 4302594048 for the 5th time
hello from 4302594048 for the 6th time
hello from 4302594048 for the 7th time
hello from 4302594048 for the 8th time
hello from 4302594048 for the 9th time
hello from 4303130624 for the 0th time
hello from 4303130624 for the 1th time
hello from 4303130624 for the 2th time
hello from 4303130624 for the 3th time
hello from 4303130624 for the 4th time
hello from 4303130624 for the 5th time
hello from 4303130624 for the 6th time
hello from 4303130624 for the 7th time
hello from 4303130624 for the 8th time
hello from 4303130624 for the 9th time

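The output of the different threads is interleaved. No line is actually lost: all 50 lines are there, but a thread identifier may be reused once its thread has exited, so several threads report the same ident.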
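The hello world above shares no data between threads. As a sketch of the Lock and Event classes described earlier, the following hypothetical workers all increment one shared counter: the Event holds them back until every thread has been started, and the Lock serialises the updates.

```python
import threading

counter = {'value': 0}        # shared state, hypothetical
lock = threading.Lock()
ready = threading.Event()

def worker(n):
    ready.wait()              # block until the main thread calls set()
    for _ in range(n):
        with lock:            # acquire() on entry, release() on exit
            counter['value'] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(5)]
for t in threads:
    t.start()
ready.set()                   # release all the workers at once
for t in threads:
    t.join()                  # wait for each worker to finish
```

Without the lock, the read-modify-write of the counter could be interrupted by another thread and some increments might be lost.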

Sunday, October 3, 2010

cmd

Let's first look at the following simple example:

#!/usr/bin/env python

import readline
import cmd
import sys

histfile='test06.hist'

class testcmd( cmd.Cmd ):
    def emptyline( self ):
        print 'empty line entered'
    def do_ls( self, line ):
        '''list the content of a directory'''
        print 'ls ' + line
    def preloop( self ):
        print 'welcome to our testing shell'

if __name__ == '__main__':
    try:
        readline.parse_and_bind( 'Tab: complete' )
        readline.read_history_file( histfile )
    except IOError:
        pass
    import atexit
    atexit.register( readline.write_history_file, histfile )

    mycmd = testcmd( 'Tab', sys.stdin, sys.stdout )
    mycmd.prompt = '> '
    try:
        mycmd.cmdloop()
    except KeyboardInterrupt:
        print 'exiting...'
        sys.exit(0)

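This is of course a very simple example that roughly shows how to build a basic command-line interface with cmd. It uses the readline module to record the command history, and atexit to make sure the history file is written on exit. As you can see, the basic way to use cmd is to subclass cmd.Cmd, override its methods and assign some of its attributes. Here we override Cmd.emptyline() so that a line of chatter is printed when the user enters nothing.

Next we take a closer look at other useful methods of cmd.Cmd.

The most important part of cmd.Cmd is Cmd.cmdloop(); the loop calls Cmd.preloop() when it starts and Cmd.postloop() when it ends;
Inside the loop, Cmd.prompt is displayed first and the shell waits for input. If the command entered is one we have defined (see below), the matching do_* method is executed; a line starting with ! is dispatched to do_shell(); a line starting with ? is dispatched to do_help(); if the command is not found, default() is executed, and if the line is empty, emptyline() is executed.
precmd() is called before a command is executed and postcmd() afterwards; unless there is a special need, these two are usually not called or overridden. Command lines are interpreted by onecmd(), which finds the matching do_* method and calls it.
A particular command is implemented by writing do_*; its help can be written as help_*, or the docstring of do_* can serve as the help directly;
To support completion, implement completedefault(), or create a complete_* method for each command that returns the list of candidates;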
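The dispatch rules above can be tried without an interactive session by feeding lines to onecmd() directly. The sketch below uses a hypothetical DemoShell that writes to an in-memory stream through self.stdout; the greet command and its list of names are made up.

```python
import cmd
try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3

class DemoShell(cmd.Cmd):  # hypothetical shell for illustration
    prompt = '> '

    def do_greet(self, line):
        '''greet NAME: say hello to NAME'''
        self.stdout.write('hello ' + (line or 'world') + '\n')

    def complete_greet(self, text, line, begidx, endidx):
        # return the candidates that start with the text typed so far
        names = ['tom', 'jerry']
        return [n for n in names if n.startswith(text)]

    def default(self, line):
        # called when no do_* method matches
        self.stdout.write('unknown command: ' + line + '\n')

    def do_quit(self, line):
        return True   # a true return value ends cmdloop()

out = StringIO()
shell = DemoShell(stdout=out)
shell.onecmd('greet tom')
shell.onecmd('frobnicate')
stop = shell.onecmd('quit')
candidates = shell.complete_greet('j', 'greet j', 6, 7)
```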

sys

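This is the most commonly used runtime module. Some frequently used features:

sys.argv is the list of command-line arguments;
sys.byteorder tells whether the machine is little or big endian;
sys.executable is the path of the Python interpreter binary;
sys.exit is the standard way to exit;
sys.float_info, sys.maxint and sys.long_info hold information about the numeric types;
sys.getrefcount returns the reference count of an object;
sys.getsizeof returns the size of an object by calling its __sizeof__ method;
sys.platform identifies the operating system platform;
sys.ps1 and sys.ps2 have the same meaning as the shell variables of the same names (the interactive prompts);
sys.stdin, sys.stdout and sys.stderr are the standard streams;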
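A few of these attributes in use, as a sketch. The variable names are arbitrary; sys.maxint and sys.long_info are left out because they exist only in Python 2.

```python
import sys

byte_order = sys.byteorder            # 'little' or 'big'
interpreter = sys.executable          # path of the running interpreter
platform_name = sys.platform          # platform identifier string
largest_float = sys.float_info.max    # largest representable float
empty_list_size = sys.getsizeof([])   # size in bytes

value = object()
# the count includes the temporary reference created by the call itself
refs = sys.getrefcount(value)
```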

Saturday, October 2, 2010

unittest

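This is a complete unit testing tool. The advantage of doctest is that unit test information can be added without changing the code of the original module, but for Python programs that already use __name__ == '__main__' for something else, using unittest directly is the better choice. unittest tests can live in a separate test file. We wrap test cases in classes (inheriting from unittest.TestCase) to split them into parts, write a test_* method for each piece of functionality, and use the assert* methods to check results. We can also override setUp to initialise the test case. Calling unittest.main() under __name__ == '__main__' runs all the test cases. The documentation describes more fine-grained usage as well. Here is a simple unittest example: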

#!/usr/bin/env python

import unittest

class ListMethodTest( unittest.TestCase ):
    def setUp( self ):
        self.a = range( 10 )
        self.b = [ 1, 1, 1 ]
        self.c = [ 0, 0, 0 ]
        self.d = [ 1, 0, 1 ]

    def test_All( self ):
        self.assertEqual( all(self.a), False )
        self.assertEqual( all(self.b), True )
        self.assertEqual( all(self.c), False )

    def test_Any( self ):
        self.assertTrue( any(self.a) )
        self.assertTrue( any(self.b) )
        self.assertTrue( any(self.d) )

if __name__ == '__main__':
    unittest.main()

The output is:

$ ./test05.py 
..
----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK
$ ./test05.py -v
test_All (__main__.ListMethodTest) ... ok
test_Any (__main__.ListMethodTest) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.000s

OK

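Let's see what else TestCase offers:

tearDown is the counterpart of setUp; it runs after each test method, whether the test passed or failed, and is used for cleaning up;
@unittest.skip marks test cases that should not be run; similar decorators are unittest.skipIf and unittest.skipUnless.
If a test is known to fail, use @unittest.expectedFailure;
Corresponding to setUp and tearDown there are also setUpClass and tearDownClass;
Besides assertEqual there are assertions for ordering (assertGreater, assertLess), membership (assertIn), regular expressions (assertRegexpMatches) and so on, and assertRaises states which exception should be raised.
The similar fail* family can be used to signal a test failure;
TestSuite groups a number of TestCases together;
The discover command of unittest (python -m unittest discover) automatically runs all the test cases under a directory;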
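Several of these features fit into one small sketch. The DivisionTest class and its data are hypothetical; the suite is run programmatically with a plain TestResult instead of unittest.main(), so that the outcome can be inspected from code.

```python
import unittest

class DivisionTest(unittest.TestCase):  # hypothetical test case
    def setUp(self):
        self.values = [3, 1, 2]

    def tearDown(self):
        # runs after every test method
        self.values = None

    def test_raises(self):
        with self.assertRaises(ZeroDivisionError):
            1 / 0

    def test_membership(self):
        self.assertIn(2, self.values)
        self.assertGreater(max(self.values), 2)

    @unittest.skip('demonstrates skipping')
    def test_skipped(self):
        self.fail('never runs')

    @unittest.expectedFailure
    def test_known_bug(self):
        # fails on purpose: the list is not sorted
        self.assertEqual(sorted(self.values), self.values)

suite = unittest.TestSuite()
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(DivisionTest))
result = unittest.TestResult()
suite.run(result)
```

Skipped tests and expected failures are recorded separately in the result and do not make the run unsuccessful.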

Friday, October 1, 2010

doctest

doctest automatically tests the examples in docstrings. This is the example from the reference:

#!/usr/bin/env python

'''
>>> factorial(5)
120
'''

def factorial( n ):
    '''return the factorial of n
    >>> factorial( 3 )
    6
    '''
    if n <= 1: return 1
    else:
        return n * factorial(n - 1)

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Then we can run:

$ ./test04.py -v
Trying:
    factorial(5)
Expecting:
    120
ok
Trying:
    factorial( 3 )
Expecting:
    6
ok
2 items passed all tests:
   1 tests in __main__
   1 tests in __main__.factorial
2 tests in 2 items.
2 passed and 0 failed.
Test passed.

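A very convenient unit testing tool. You can also write a separate text file, such as a README that explains how to use the module, and the examples in it can serve as tests too; just point doctest.testfile() at the file. Tests can also cover error cases: doctest checks whether the exception raised matches the expected one.

In essence, doctest extracts examples from files or docstrings and runs them; it can also be combined with unittest (for example through doctest.DocTestSuite). In a later post we will look in detail at what unittest has to offer.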
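Exception checking can be sketched as follows. The safe_div function is hypothetical; instead of doctest.testmod(), the docstring is parsed and run explicitly with DocTestParser and DocTestRunner so that the counts are available as a return value.

```python
import doctest

def safe_div(a, b):
    '''Divide a by b, refusing a zero divisor.

    >>> safe_div(6, 3)
    2.0
    >>> safe_div(1, 0)
    Traceback (most recent call last):
        ...
    ValueError: b must not be zero
    '''
    if b == 0:
        raise ValueError('b must not be zero')
    return a / float(b)

# extract the examples from the docstring and run them
parser = doctest.DocTestParser()
test = parser.get_doctest(safe_div.__doc__, {'safe_div': safe_div},
                          'safe_div', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)   # counts of failed and attempted examples
```

In the expected output only the Traceback header line and the final exception line are compared; the stack in between is replaced by an ellipsis.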

builtins

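These are Python's built-in functions. We roughly divide them into the following groups:

creating basic types or converting between types: int, float, complex, str, dict, list, tuple, frozenset, long, bool, basestring (the parent class of str and unicode), bin (converts an integer to a binary string), hex, oct, object (reportedly not much use on its own), set;
functions that make working with lists easier: all, any, enumerate, filter, map, max, min, range (xrange), reduce, slice, sorted, reversed, zip, apply, buffer;
module import: __import__, reload;
runtime and class-related functions: type, super, isinstance, issubclass, callable (checks whether there is a __call__() method), classmethod (the class is passed as the first argument instead of the instance), staticmethod (similar to static methods in other languages), cmp (calls the __cmp__() method), setattr, getattr, hasattr, delattr, dir, help, locals and globals, hash (calls __hash__(), which can be overridden to replace the default hash function), id, iter (calls the __iter__() method), memoryview, property;
strings and encoding: unicode, chr, unichr, format, print, intern, ord;
math functions: abs, divmod, pow, round, coerce;
dynamic evaluation features: eval, execfile, compile, repr (calls __repr__ and returns a representation that can be used with eval);
file operations: open, file;
reading user input: input, raw_input;

pydoc is a tool that shows the documentation of these functions, for example pydoc sys.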

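Some useful Python libraries

The Python reference describes quite a few useful libraries. Here are several of them:

string: string operations, including character classification, formatting and string templates.
re: regular expressions; used to create re objects and perform matching, substitution, splitting and so on.
struct converts between Python values and C structs, representing the binary data as a Python string; commonly used for serializing data.
difflib implements comparison of strings and files, including output with formatting information (such as HTML).
textwrap provides line wrapping for string output.
codecs is Python's encoding/decoding library; it supports common encodings (such as utf8).
fpformat provides fixed-point (as opposed to floating-point) formatting functions.
datetime provides basic date and time operations; calendar provides calendar-related operations.
collections provides high-performance container types beyond Python's default containers, such as namedtuple, deque, Counter and OrderedDict; heapq implements a priority queue on top of a heap;
bisect implements binary search on sorted lists;
array implements array-like functionality that can be used more efficiently for numerical work.
sched provides basic scheduling functions that run functions at specified times, but it is not thread-safe.
Queue implements some basic queues, such as last-in-first-out queues and priority queues.
weakref allows creating "weak references". Generally Python's GC does not reclaim an object that is still referenced, but an object that is only reachable through weak references may be reclaimed; after that the weak reference returns None, and the programmer can decide whether to reload the object.
UserDict, UserList and UserString provide basic interfaces for implementing your own dictionaries, lists and strings.
types provides runtime type checks.
copy provides deep and shallow copying of objects.
pprint provides a prettier print.
repr provides an alternative repr.
numbers provides classes for the common kinds of numbers, such as rationals and complex numbers.
math and cmath provide common mathematical functions, the latter for complex numbers. decimal provides fixed-point and floating-point decimal arithmetic.
fractions provides rational number arithmetic.
random provides random number generation.
itertools provides tools for building iterators.
functools provides tools for building functors.
operator provides the common operators as functions.
os.path provides common functions for accessing files and directories;
fileinput reads file contents line by line;
stat provides constants and helpers for interpreting the results of os.stat(), the counterpart of stat() in libc;
tempfile creates temporary files and directories;
glob expands filename patterns;
fnmatch matches filenames against patterns;
linecache gives random access to the lines of a file and uses a cache, so access is efficient;
shutil offers common shell-like utilities, including higher-level file operations such as copying and renaming;
pickle serializes Python objects; cPickle is a faster implementation; copy_reg registers pickle support functions;
shelve is a persistent, dictionary-like object (backed by pickle) that can be used to serialize Python objects and make them persistent. Similar interfaces exist in some of the database access modules.
marshal serializes Python objects directly; it does not use the pickle format but a binary format that depends on the current Python version;
sqlite3 accesses sqlite3 databases; well, this will do as the first SQL database for me to play with;
zlib, gzip and bz2 for compression;
several special file formats: csv, ConfigParser (similar to the ini file format) and plistlib (the plist files on the Mac);
several hashing and digital signature modules, such as hashlib, hmac, sha and md5;
os has the common OS-related commands, such as directory operations; input and output use the io module;
time is similar to the functions provided by time.h;
argparse parses command-line arguments;
logging produces logs;
getpass obtains login information;
curses handles character display in a terminal (similar to ncurses?)
platform obtains information about the host;
errno and ctypes correspond to their C counterparts;
select corresponds to the select() function in C;
threading provides fairly high-level multithreading tools, while thread is lower-level;
multiprocessing is for multi-process programming;
readline wraps the basic functionality of the readline library; rlcompleter provides completion for readline;
subprocess manages child processes, including pipes; socket covers socket communication; signal sets handlers for process signals;
email contains common functionality for handling mail; json lets us handle JSON data; HTMLParser lets us handle HTML files;
cgi lets us write CGI programs; httplib gives us the basic functionality of an HTTP client; ftp and other protocols are supported similarly;
SimpleHTTPServer is a simple HTTP server implementation;
audio and image processing: audioop and imageop;
color conversion: colorsys;
internationalization: gettext and locale, similar to C programs;
cmd supports building command-line interpreters;
pydoc is Python's documentation system, comparable to perldoc;
unittest is for unit testing; test is for regression tests;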
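Of the modules above, weakref is probably the least intuitive, so here is a minimal sketch of the behaviour described in its entry. The Resource class is hypothetical, and the immediate reclamation after del is a CPython detail; the explicit gc.collect() is there only for other implementations.

```python
import gc
import weakref

class Resource(object):  # hypothetical cached object
    def __init__(self, name):
        self.name = name

res = Resource('config')
ref = weakref.ref(res)      # does not keep res alive
alive_name = ref().name     # calling the reference yields the object
del res                     # drop the only strong reference
gc.collect()                # not needed on CPython, harmless elsewhere
gone = ref() is None        # the weak reference now returns None
```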

Well, next we will study some of the more interesting libraries in detail.

re
weakref
itertools
functools
operator
pickle
shelve
sqlite3
os
io
argparse
logging
threading
subprocess
json
pydoc
unittest
test