Declaring the character encoding of Python source code
Run the following Python print statement:
print u'I "said" do not touch “this.""'
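The string contains a Chinese (full-width) double quotation mark, and the Python interpreter reports an error. The error message is as follows: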
[wangy@bogon 文檔]$ python ex1.py
File "ex1.py", line 7
SyntaxError: Non-ASCII character '\xe2' in file ex1.py on line 7, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
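The '\xe2' in the message is most likely the first byte of the curly quote itself. Here is a small sketch, assuming ex1.py was saved as UTF-8 (the name left_quote is only for illustration):

```python
import binascii

# U+201C LEFT DOUBLE QUOTATION MARK, written as an escape so that this
# snippet itself stays pure ASCII.
left_quote = u'\u201c'

# Encode the character as UTF-8 and show the resulting bytes in hex.
utf8_bytes = left_quote.encode('utf-8')
hex_text = binascii.hexlify(utf8_bytes).decode('ascii')
print(hex_text)  # e2809c
```

The first of those three bytes is 0xe2, which is the non-ASCII byte the parser complains about.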
See the link http://www.python.org/peps/pep-0263.html
Its main points are as follows:
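In Python 2.1, Unicode literals could only be written using the Latin-1 based "unicode-escape" encoding. This caused a lot of trouble for programmers in non-Latin-1 locales, such as many Asian countries.
The solution is to declare the encoding of the source file, so that the interpreter knows how the source code is encoded.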
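For comparison, here is a sketch of the escape-based workaround, which keeps the source file pure ASCII. The variable names are only for illustration, and the closing curly quote is an assumption about what the string was meant to contain:

```python
# ASCII-only source: the curly quotes are spelled as \uXXXX escapes,
# so no encoding declaration is needed for this file.
text = u'I "said" do not touch \u201cthis.\u201d'

# The "unicode_escape" codec produces the same escaped spelling as bytes,
# and decoding those bytes gives the original text back.
escaped = text.encode('unicode_escape')
restored = escaped.decode('unicode_escape')
print(restored == text)  # True
```

This works, but the escapes are hard to read, which is why declaring the file encoding is the more comfortable option.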
How to define the encoding:
Python will default to ASCII as standard encoding if no other encoding hints are given.
To define a source code encoding, a magic comment must be placed into the source
files either as first or second line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors):
#!/usr/bin/python
# -*- coding: <encoding name> -*-
or:
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
It is best to use the first or the second form.
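All three forms are recognized by one regular expression, which PEP 263 gives explicitly. The sketch below applies it to the first two lines of a source text. The helper find_declared_encoding is a hypothetical name and not part of Python, and the real parser applies a few extra rules that are not modelled here:

```python
import re

# Regular expression given in PEP 263 for the magic comment.
CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')

def find_declared_encoding(source_text):
    """Return the encoding named on line 1 or 2, or None if there is none."""
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None

emacs_style = "#!/usr/bin/python\n# -*- coding: utf-8 -*-\nimport os\n"
vim_style = "# vim: set fileencoding=latin-1 :\nimport os\n"
too_late = "#!/usr/bin/python\n#\n# -*- coding: latin-1 -*-\nimport os\n"

print(find_declared_encoding(emacs_style))  # utf-8
print(find_declared_encoding(vim_style))    # latin-1
print(find_declared_encoding(too_late))     # None
```

The vim form matches because "fileencoding=" ends in "coding=". The last case returns None because the comment is on line 3.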
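The PEP specifically notes that platforms such as Windows add a Unicode BOM mark to the beginning of Unicode files. For this reason a file that starts with the UTF-8 signature '\xef\xbb\xbf' is interpreted as UTF-8 even if it has no magic encoding comment.
If a source file uses both the UTF-8 BOM mark signature and a magic encoding comment, the only allowed encoding for the comment is 'utf-8'. Any other encoding will cause an error.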
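The BOM rule can be sketched roughly as follows. The helper effective_encoding is a hypothetical name and this is not how the interpreter is implemented; the real parser reports the conflict as a syntax error, while the sketch raises ValueError for simplicity:

```python
import codecs

def effective_encoding(raw_bytes, declared=None):
    """Combine the UTF-8 BOM rule with an optional declared encoding."""
    if raw_bytes.startswith(codecs.BOM_UTF8):
        # With a UTF-8 BOM, the magic comment may only name utf-8.
        if declared is not None and codecs.lookup(declared).name != 'utf-8':
            raise ValueError("UTF-8 BOM found, but comment says %s" % declared)
        return 'utf-8'
    # No BOM: use the declaration, otherwise fall back to the ASCII default.
    return declared if declared is not None else 'ascii'

with_bom = codecs.BOM_UTF8 + b'x = 1\n'
without_bom = b'x = 1\n'

print(effective_encoding(with_bom))                # utf-8
print(effective_encoding(without_bom))             # ascii
print(effective_encoding(without_bom, 'latin-1'))  # latin-1
```

Calling effective_encoding(with_bom, 'latin-1') raises ValueError, mirroring the "any other encoding will cause an error" rule.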
Examples
These are some examples to clarify the different styles for defining the source code encoding at the top of a Python source file:
With interpreter binary and using Emacs style file encoding comment:
#!/usr/bin/python
# -*- coding: latin-1 -*-
import os, sys
...
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import os, sys
...
#!/usr/bin/python
# -*- coding: ascii -*-
import os, sys
...
Without interpreter line, using plain text:
# This Python file uses the following encoding: utf-8
import os, sys
...
Text editors might have different ways of defining the file's encoding, e.g.:
#!/usr/local/bin/python
# coding: latin-1
import os, sys
...
Without encoding comment, Python's parser will assume ASCII text:
#!/usr/local/bin/python
import os, sys
...
Encoding comments which don't work:
Missing "coding:" prefix:
#!/usr/local/bin/python
# latin-1
import os, sys
...
Encoding comment not on line 1 or 2:
#!/usr/local/bin/python
#
# -*- coding: latin-1 -*-
import os, sys
...
Unsupported encoding:
#!/usr/local/bin/python
# -*- coding: utf-42 -*-
import os, sys
...
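To check in advance whether an encoding name is one that Python knows, the standard codecs module can be asked directly. The helper is_known_encoding is a hypothetical name used only for this sketch:

```python
import codecs

def is_known_encoding(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

# The encodings used in the examples above, plus the unsupported one.
for name in ('latin-1', 'iso-8859-15', 'ascii', 'utf-8', 'utf-42'):
    print("%s: %s" % (name, is_known_encoding(name)))
```

Only 'utf-42' comes back as False, which matches the "Unsupported encoding" example.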
Modify the source code and save it as UTF-8. The editor used here was gedit on Linux.
# -*- coding: utf-8 -*-
print "hello world!"
print "hello Again"
print "I like trying this"
print "This is fun"
print 'Yay! Printing'
print "I'd much rather you 'not'."
print u'I "said" 這里有中文雙引號 “this.""'
Now everything prints correctly.