Writing clean, testable, high-quality code in Python
Noah Gift
Published December 20, 2010
Introduction
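Writing software is one of the most complex tasks people undertake. Brian Kernighan, co-author of the AWK programming language and of "K&R C", summed up the true nature of software development in the book Software Tools when he said, "Controlling complexity is the essence of computer programming." The harsh reality of real-world software development is that software is often complex, whether intentionally or not, and that developers often neglect maintainability, testability, and quality. The end result of this unfortunate situation is software that becomes ever harder and more expensive to maintain, and that occasionally fails, sometimes badly.
The first step toward writing high-quality code is to rethink the entire process by which an individual or a team develops software. In failed or troubled projects, software is often developed in an undisciplined way, with developers focused on solving the problem by whatever means necessary. In successful projects, developers think not only about how to solve the problem at hand but also about the process involved in solving it.
Successful developers run their tests in a way that is easy to automate, so that they can continuously prove the software works. They understand the dangers of needless complexity. They follow their methods rigorously, reviewing carefully at every stage and looking for opportunities to refactor. They constantly think about how to keep their software testable, readable, and maintainable. Although Python's designers and the Python community place great emphasis on clean, maintainable code, it is still easy to end up with the opposite. In this article, we explore that problem and discuss how to write clean, testable, high-quality code in Python.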
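Clean code: a hypothetical problem
The best way to demonstrate this style of development is to solve a hypothetical problem. Suppose you are a back-end web developer at a company that lets users post reviews, and you need a way to display short snippets of those reviews with the search terms highlighted. One way to solve this is to write one large function that takes a text snippet and a query parameter, and returns a snippet limited to a certain number of characters with the query terms highlighted. All of the logic lives in one giant function, and you simply run the script over and over until you get the result you want. The code will likely be structured like the following example, usually accompanied by print or logging statements and an interactive shell.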
Messy code
def my_mega_function(snippet, query):
    """This takes a snippet of text, and a query parameter and returns"""
    #Logic goes here, and often runs on for several hundred lines
    #There are often deeply nested conditional statements and loops
    #Function could reach several hundred, if not thousands of lines
    result = None  # placeholder: the real result is built by the logic above
    return result
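With dynamic languages such as Python, Perl, or Ruby, it is easy for developers to focus single-mindedly on the problem itself, exploring interactively until something that looks like the right result appears, and then declaring the task done. Unfortunately, convenient and appealing as this approach is, it often creates a dangerous illusion of completion. The danger lies mainly in two things: the solution was not designed to be testable, and the software's complexity was never properly controlled.
How do you know this function works? It worked the last time you ran it during development, so you trust that it works, but can you be sure there is no subtle error in its logic or syntax? What happens if the code needs to change? Will it still work? How would you confirm that it still works? What if another developer has to maintain and modify the code? How would they know their changes don't break anything? How hard would it be for them to understand what the code does?
Put simply, without tests you don't know whether your software works. If you always assume rather than prove correctness during development, you may end up with code that appears to work, but nobody can be sure it runs correctly. This is a bad place to be; I have written software like this, and I have also helped debug software written this way. Fortunately, it is easy to avoid. Write the tests first (as in test-driven development); otherwise, the code tends to drift away from its goal while you write the logic. Writing tests first produces modular, extensible code that is easy to test, understand, and maintain. Experienced developers can easily tell whether software was written with testing in mind from the start; to an expert's eye, the difference in the software itself is striking.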
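As a minimal illustration of test-first development (a sketch, not code from the highlight module), the tests below were written to pin down the behavior of a hypothetical helper, truncate_at_word, before its body existed; the function was then written to make them pass. The helper's name and its rules are assumptions chosen for the example.

```python
import unittest


def truncate_at_word(text, limit):
    """Return the longest run of leading words that fits in ``limit`` characters."""
    kept, length = [], 0
    for word in text.split():
        # A separating space is needed before every word except the first.
        extra = len(word) + (1 if kept else 0)
        if length + extra > limit:
            break
        kept.append(word)
        length += extra
    return " ".join(kept)


class TruncateAtWordTest(unittest.TestCase):
    """Written before truncate_at_word; each test pins down one behavior."""

    def test_short_text_is_unchanged(self):
        self.assertEqual(truncate_at_word("deep dish", 20), "deep dish")

    def test_never_splits_a_word(self):
        self.assertEqual(truncate_at_word("deep dish pizza", 12), "deep dish")

    def test_result_fits_limit(self):
        self.assertLessEqual(len(truncate_at_word("a b c d e f", 5)), 5)
```

Saved as a module, it can be run with python -m unittest followed by the module name, or picked up by nosetests.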
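You don't have to take my word for it, or even read the code: the difference between the two styles shows up clearly in other ways. The first is to actually measure how many lines of code are exercised by tests. Nose is a popular extension to Python's unit testing framework that makes it easy to run a batch of tests automatically, along with plug-ins such as code coverage measurement. If you measure code coverage during development, you will quickly see that for code built from large functions with deeply nested logic, written in a non-generalized way, 100% test coverage is nearly impossible to reach.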
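To make "lines of code exercised by tests" concrete, here is a toy line-coverage tracer built only on the standard library (sys.settrace and dis.findlinestarts). It is a sketch under simplifying assumptions, not how nose's coverage plug-in is implemented: it traces a single function and treats every line that carries bytecode, except the def line, as a statement. The function names are illustrative.

```python
import dis
import sys


def executable_lines(func):
    """Line numbers in ``func`` that carry bytecode, excluding the def line."""
    code = func.__code__
    lines = {line for _, line in dis.findlinestarts(code) if line is not None}
    lines.discard(code.co_firstlineno)
    return lines


def run_with_line_coverage(func, *args):
    """Call ``func`` and return (result, set of its lines that executed)."""
    code = func.__code__
    hit = set()

    def local_trace(frame, event, arg):
        if event == "line":
            hit.add(frame.f_lineno)
        return local_trace

    def global_trace(frame, event, arg):
        # Only trace frames that run func's own code object.
        return local_trace if frame.f_code is code else None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, hit & executable_lines(func)


def report(func, hit):
    """Return (percent covered, sorted list of missing line numbers)."""
    lines = executable_lines(func)
    return 100.0 * len(hit) / len(lines), sorted(lines - hit)


def classify(n):
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"
```

Calling classify only with a positive number leaves the two early-return lines uncovered (three of five lines, 60%); adding calls with -1 and 0 and merging the hit sets would bring it to 100%.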
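The second way to measure the difference is with static analysis tools. Several popular tools give Python developers a range of metrics, from general code-quality scores to specific measures such as duplicated code or complexity. You can measure the cyclomatic complexity of code with pygenie or pymetrics (see Resources).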
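For a rough idea of what such tools compute, the sketch below approximates McCabe complexity with Python's ast module: start at 1 and add one for each decision point. The set of nodes counted is an assumption and a simplification (for example, match statements are ignored and lambdas are skipped); it is not the algorithm pygenie or pymetrics actually uses, so its numbers may differ from theirs.

```python
import ast

# Node types treated as one decision point each (a simplified, assumed set).
_DECISIONS = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
              ast.ExceptHandler)
# Nested functions and classes are scored separately by the ast.walk loop
# below; lambdas are skipped entirely in this sketch.
_SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Lambda)


def _decision_points(node):
    count = 0
    for child in ast.iter_child_nodes(node):
        if isinstance(child, _SCOPES):
            continue
        if isinstance(child, _DECISIONS):
            count += 1
        elif isinstance(child, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            count += len(child.values) - 1
        elif isinstance(child, ast.comprehension):
            count += 1 + len(child.ifs)
        count += _decision_points(child)
    return count


def cyclomatic_complexity(source):
    """Map each function name in ``source`` to an approximate McCabe score."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            scores[node.name] = 1 + _decision_points(node)
    return scores
```

Passing the text of a module, for instance open("highlight.py").read(), returns a score per function name; methods with the same name in different classes would overwrite each other in this simplified version.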
Here is example output from running pygenie on fairly simple "clean" code:
Cyclomatic complexity output from pygenie
% python pygenie.py complexity --verbose highlight.py
File: /Users/ngift/Documents/src/highlight.py
Type Name Complexity
----------------------------------------------------------------------------------------
M HighlightDocumentOperations._create_snippit 3
M HighlightDocumentOperations._reconstruct_document_string 3
M HighlightDocumentOperations._doc_to_sentences 2
M HighlightDocumentOperations._querystring_to_dict 2
M HighlightDocumentOperations._word_frequency_sort 2
M HighlightDocumentOperations.highlight_doc 2
X /Users/ngift/Documents/src/highlight.py 1
C HighlightDocumentOperations 1
M HighlightDocumentOperations.__init__ 1
M HighlightDocumentOperations._custom_highlight_tag 1
M HighlightDocumentOperations._score_sentences 1
M HighlightDocumentOperations._multiple_string_replace 1
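What is cyclomatic complexity?
Cyclomatic complexity is a software metric, introduced by Thomas J. McCabe in 1976, for judging the complexity of a program. It measures the number of linearly independent paths, or branches, through the source code. According to McCabe, a method's complexity should ideally be kept below 10. The reason is that research on human memory shows that short-term memory can hold only about 7 items, plus or minus 2.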
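In McCabe's graph-theoretic definition, E is the number of edges in the control-flow graph, N the number of nodes, and P the number of connected components (1 for a single function). As a worked example, a lone if/else has a decision node, two branch nodes, and a join node (N = 4) connected by four edges (E = 4):

$$M = E - N + 2P$$
$$M_{\text{if/else}} = 4 - 4 + 2 \cdot 1 = 2$$

This matches the rule of thumb "number of decisions plus one".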
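If a developer writes code with 50 linearly independent paths, then picturing what happens in that method requires more than five times the capacity of short-term memory. Simple methods stay within the limits of human short-term memory, so they are easier to handle, and in practice they turn out to contain fewer errors. A 2008 study by Enerjy found a strong correlation between cyclomatic complexity and the number of errors: classes with a complexity of 11 had a 0.28 probability of being error-prone, rising to 0.98 for classes with a complexity of 74.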
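As you can see in this example, every method is extremely simple, with a complexity below 10, in line with McCabe's guideline. In my career I have seen huge functions written without tests that had a complexity above 140 and ran to more than 1,200 lines. Needless to say, such code is impossible to test. In fact, you can't even confirm that it works, let alone refactor it. Had the author written the same logic with testing in mind, keeping 100% test coverage throughout, such high complexity could never have arisen.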
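To enforce McCabe's limit automatically rather than by eye, a small filter over pygenie-style output is enough. The sketch below assumes the row layout shown in the output above (a one-letter type, a name, and an integer complexity separated by whitespace); the function name too_complex and the default threshold of 10 are illustrative choices, not part of pygenie.

```python
import re

# One report row: a one-letter type (X, C or M in the output above),
# a name, and an integer complexity, separated by whitespace.
_ROW = re.compile(r"^\s*([A-Z])\s+(\S+)\s+(\d+)\s*$")


def too_complex(report_text, threshold=10):
    """Return (name, complexity) pairs whose complexity is >= threshold."""
    flagged = []
    for line in report_text.splitlines():
        match = _ROW.match(line)
        if match:
            _kind, name, value = match.groups()
            if int(value) >= threshold:
                flagged.append((name, int(value)))
    return flagged
```

Header lines such as "File: ..." and "Type Name Complexity" do not match the row pattern, so the whole report can be passed in unchanged; an empty result means every row stays below the threshold.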
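Clean code: a hypothetical solution
Now let's look at a complete source code example, with matching unit and functional tests, to see what it actually does and why this code can be called clean. By strict measures, a reasonable definition of "clean" is code that meets these requirements: test coverage close to 100%, cyclomatic complexity below 10 for every class and method, and a pylint score close to 10.0. The following example uses nose to run the unit tests and doctests on the highlight module, with coverage checking: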
Running nosetests with a coverage report (100% coverage)
% nosetests -v --with-coverage --cover-package=highlight --with-doctest \
--cover-erase --exe
Doctest: highlight.HighlightDocumentOperations._custom_highlight_tag ... ok
test_functional.test_snippit_algorithm ... ok
test_custom_highlight_tag (test_highlight.TestHighlight) ... ok
Consumes the generator, and then verifies the result[0] ... ok
Verifies highlighted text is what we expect ... ok
test_multi_string_replace (test_highlight.TestHighlight) ... ok
Verifies the yielded results are what is expected ... ok
Name Stmts Exec Cover Missing
-----------------------------------------
highlight 71 71 100%
----------------------------------------------------------------------
Ran 7 tests in 4.223s
OK
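As shown above, running nosetests with a few options reports 100% test coverage for the highlight.py script. The only option worth noting is --cover-package=highlight, which tells nose to report coverage only for the named module. This is a very effective way to limit the coverage output to the modules or packages you want to look at. You can download the source code for this article and comment out some of the tests to watch the coverage reporting at work.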
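The --with-doctest option makes nose collect the examples embedded in docstrings, such as the one in _custom_highlight_tag. The standard library's doctest module can also run such examples on its own. The sketch below uses a stand-alone, hypothetical copy of that tagging function, and doctest.DocTestFinder and doctest.DocTestRunner to collect and run its examples; for a whole file, python -m doctest -v highlight.py does the same from the command line.

```python
import doctest


def custom_highlight_tag(phrase, start="<strong>", end="</strong>"):
    """Wrap ``phrase`` in highlight tags (hypothetical stand-in).

    >>> custom_highlight_tag("foo")
    '<strong>foo</strong>'
    >>> custom_highlight_tag("foo", start="[BAR]", end="[ENDBAR]")
    '[BAR]foo[ENDBAR]'
    """
    return "{0}{1}{2}".format(start, phrase, end)


def run_docstring_examples(func):
    """Run the examples in ``func``'s docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    # module=False skips module lookup; globs supplies the names the
    # examples refer to.
    tests = finder.find(func, func.__name__, module=False,
                        globs={func.__name__: func})
    runner = doctest.DocTestRunner(verbose=False)
    for test in tests:
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted
```

A result of (0, 2) means both examples ran and matched; any failing example is printed by the runner in the usual doctest format.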
highlight.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
:mod:`highlight` -- Highlight Methods

.. module:: highlight
   :platform: Unix, Windows
   :synopsis: highlight document snippets that match a query.
.. moduleauthor:: Noah Gift

Requirements::
    1. You will need to install the nltk library to run this code.
       http://www.nltk.org/download
    2. You will need to download the data for the nltk:
       See http://www.nltk.org/data::

           import nltk
           nltk.download()
"""
import re
import logging

import nltk

#Globals
logging.basicConfig()
LOG = logging.getLogger("highlight")
LOG.setLevel(logging.INFO)

class HighlightDocumentOperations(object):
    """Highlight Operations for a Document"""

    def __init__(self, document=None, query=None):
        """
        Kwargs:
            document (str):
            query (str):
        """
        self._document = document
        self._query = query

    @staticmethod
    def _custom_highlight_tag(phrase,
                              start="<strong>",
                              end="</strong>"):
        """Injects an open and close highlight tag around a word

        Args:
            phrase (str) - A word or phrase.
        Kwargs:
            start (str) - An opening tag. Defaults to <strong>
            end (str) - A closing tag. Defaults to </strong>
        Returns:
            (str) word or phrase with custom opening and closing tags

        >>> h = HighlightDocumentOperations()
        >>> h._custom_highlight_tag("foo")
        '<strong>foo</strong>'
        """
        tagged_phrase = "{0}{1}{2}".format(start, phrase, end)
        return tagged_phrase

    def _doc_to_sentences(self):
        """Takes a string document and converts it into a list of sentences

        Unfortunately, this approach might be a tad naive for production
        because some segments that are split on a period are really an
        abbreviation, and to make things even more complicated, an
        abbreviation can also be the end of a sentence::

            http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html

        Returns:
            (generator) A generator object of a tokenized sentence tuple,
            with the list position of sentence as the first portion of
            the tuple, such as: (0, "This was the first sentence")
        """
        tokenizer = nltk.data.load("tokenizers/punkt/english.pickle")
        sentences = tokenizer.tokenize(self._document)
        for sentence in enumerate(sentences):
            yield sentence

    @staticmethod
    def _score_sentences(sentence, querydict):
        """Creates a scoring system for each sentence by substitution analysis

        Tokenizes each sentence, counts characters
        in sentence, and pass it back as nested tuple

        Returns:
            (tuple) - (score (int), (count (int), position (int),
            raw sentence (str)))
        """
        position, sentence = sentence
        count = len(sentence)
        regex = re.compile("|".join(map(re.escape, querydict)))
        score = len(re.findall(regex, sentence))
        processed_score = (score, (count, position, sentence))
        return processed_score

    def _querystring_to_dict(self, split_token="+"):
        """Converts query parameters into a dictionary

        Returns:
            (dict)- dparams, a dictionary of query parameters
        """
        params = self._query.split(split_token)
        dparams = dict([(key, self._custom_highlight_tag(key)) for
                        key in params])
        return dparams

    @staticmethod
    def _word_frequency_sort(sentences):
        """Sorts sentences by score frequency, yields sorted result

        This will yield the highest score count items first.

        Args:
            sentences (list) - a nested tuple inside of list::

                [(0, (90, 3, "The crust/dough was just way too effin dry for me.
                Yes, I know what cornmeal is, thanks."))]
        """
        sentences.sort()
        while sentences:
            yield sentences.pop()

    def _create_snippit(self, sentences, max_characters=175):
        """Creates a snippet from a sentence while keeping it under max_chars

        Returns a sorted list with max characters. The sort is an attempt
        to rebuild the original document structure as close as possible,
        with the new sorting by scoring and the limitation of max_chars.

        Args:
            sentences (list) - scored sentences to turn into a snippit
            max_characters (int) - optional max characters of snippit
        Returns:
            snippit (list) - a list of (position, score, count, sentence)
            tuples, sorted by the sentence's original position
        """
        snippit = []
        total = 0
        for sentence in self._word_frequency_sort(sentences):
            LOG.debug("Creating snippit: %s", sentence)
            score, (count, position, raw_sentence) = sentence
            total += count
            if total < max_characters:
                #position now gets converted to index 0 for sorting later
                snippit.append((position, score, count, raw_sentence))
        #try to reassemble document by original order by doing a simple sort
        snippit.sort()
        return snippit

    @staticmethod
    def _multiple_string_replace(string_to_replace, dict_patterns):
        """Performs a multiple replace in a string with dict pattern.

        Borrowed from Python Cookbook.

        Args:
            string_to_replace (str) - String to be multi-replaced
            dict_patterns (dict) - A dict full of patterns
        Returns:
            (str) - Multiple replaced string.
        """
        regex = re.compile("|".join(map(re.escape, dict_patterns)))

        def one_xlat(match):
            """Closure that is called repeatedly during multi-substitution.

            Args:
                match (SRE_Match object)
            Returns:
                partial string substitution (str)
            """
            return dict_patterns[match.group(0)]

        return regex.sub(one_xlat, string_to_replace)

    def _reconstruct_document_string(self, snippit, querydict):
        """Reconstructs string snippit, build tags, and return string

        A helper function for highlight_doc.

        Args:
            snippit (list) - A list of (position, score, count, sentence)
            tuples, as returned by _create_snippit
            querydict (dict) - A dict full of patterns
        Returns:
            (str) The most relevant snippet with the query terms highlighted.
        """
        snip = []
        for entry in snippit:
            score = entry[1]
            sent = entry[3]
            #if we have matches, now do the multi-replace
            if score:
                sent = self._multiple_string_replace(sent,
                                                     querydict)
            snip.append(sent)
        highlighted_snip = " ".join(snip)
        return highlighted_snip

    def highlight_doc(self):
        """Finds the most relevant snippit with the query terms highlighted

        Returns:
            (str) The most relevant snippet with the query terms highlighted.
        """
        #tokenize to sentences, and convert query to a dict
        sentences = self._doc_to_sentences()
        querydict = self._querystring_to_dict()

        #process and score sentences
        scored_sentences = []
        for sentence in sentences:
            scored = self._score_sentences(sentence, querydict)
            scored_sentences.append(scored)

        #fit into max characters, and sort by original position
        snippit = self._create_snippit(scored_sentences)

        #assemble back into string
        highlighted_snip = self._reconstruct_document_string(snippit,
                                                             querydict)
        return highlighted_snip
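To show the shape of the algorithm without installing NLTK, here is a condensed, self-contained sketch of the pipeline highlight_doc follows: split the document into sentences, score each one by query-term matches, keep the best-scoring sentences within a character budget, restore document order, and wrap the matches in tags. It swaps in a naive regular-expression sentence splitter (an assumption that mishandles abbreviations, the very problem the _doc_to_sentences docstring warns about), and the function names are hypothetical.

```python
import re


def split_sentences(document):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]


def highlight_snippet(document, query, max_characters=175,
                      start="<strong>", end="</strong>"):
    """Score sentences by query-term matches and return a highlighted snippet."""
    terms = [term for term in query.split("+") if term]
    if not terms:
        raise ValueError("query contains no terms")
    pattern = re.compile("|".join(re.escape(term) for term in terms))

    # (score, length, position, sentence): a descending sort puts the best
    # matches first, with ties going to the longer sentence.
    scored = [(len(pattern.findall(sentence)), len(sentence), position, sentence)
              for position, sentence in enumerate(split_sentences(document))]
    scored.sort(reverse=True)

    chosen, total = [], 0
    for score, length, position, sentence in scored:
        total += length  # as in _create_snippit, skipped sentences still count
        if total < max_characters:
            chosen.append((position, score, sentence))
    chosen.sort()  # back into original document order

    parts = []
    for _position, score, sentence in chosen:
        if score:
            sentence = pattern.sub(lambda m: start + m.group(0) + end, sentence)
        parts.append(sentence)
    return " ".join(parts)
```

With a small budget, only the best-scoring sentence survives; with a larger one, lower-scoring sentences are kept too and appear in their original order.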
test_highlight.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Tests this query searches a document, highlights a snippit and returns it
http://www.example.com/search?find_desc=deep+dish+pizza&ns=1&rpp=10&find_loc=\
San+Francisco%2C+CA

Contains both unit and functional tests.
"""
import unittest

from highlight import HighlightDocumentOperations


class TestHighlight(unittest.TestCase):

    def setUp(self):
        self.document = """
Review for their take-out only.
Tried their large Classic (sausage, mushroom, peppers and onions) deep dish;\
and their large Pesto Chicken thin crust pizzas.
Pizza I've had better. The crust/dough was just way too effin dry for me.\
Yes, I know what cornmeal is, thanks. But it's way too dry.\
I'm not talking about the bottom of the pizza...I'm talking about the dough \
that's in between the sauce and bottom of the pie...it was like cardboard, sorry!
Wings spicy and good. Bleu cheese dressing only...hmmm, but no alternative\
of ranch dressing, at all. Service friendly enough at the counters.
Decor freakin dark. I'm not sure how people can see their food.
Parking a real pain. Good luck."""
        self.query = "deep+dish+pizza"
        self.hdo = HighlightDocumentOperations(self.document, self.query)

    def test_custom_highlight_tag(self):
        actual = self.hdo._custom_highlight_tag("foo",
                                                start="[BAR]",
                                                end="[ENDBAR]")
        expected = "[BAR]foo[ENDBAR]"
        self.assertEqual(actual, expected)

    def test_query_string_to_dict(self):
        """Verifies the yielded results are what is expected"""
        result = self.hdo._querystring_to_dict()
        expected = {"deep": "<strong>deep</strong>",
                    "dish": "<strong>dish</strong>",
                    "pizza": "<strong>pizza</strong>"}
        self.assertEqual(result, expected)

    def test_multi_string_replace(self):
        query = "pizza I've had better"
        expected = "<strong>pizza</strong> I've had better"
        query_dict = self.hdo._querystring_to_dict()
        result = self.hdo._multiple_string_replace(query, query_dict)
        self.assertEqual(expected, result)

    def test_doc_to_sentences(self):
        """Consumes the generator, and then verifies the result[0]"""
        results = []
        expected = (0, "\nReview for their take-out only.")
        for sentence in self.hdo._doc_to_sentences():
            results.append(sentence)
        self.assertEqual(results[0], expected)

    def test_highlight(self):
        """Verifies highlighted text is what we expect"""
        expected = ("Tried their large Classic (sausage, mushroom, peppers and "
                    "onions) <strong>deep</strong> <strong>dish</strong>;and "
                    "their large Pesto Chicken thin crust "
                    "<strong>pizza</strong>s.")
        actual = self.hdo.highlight_doc()
        self.assertEqual(expected, actual)

    def tearDown(self):
        del self.query
        del self.hdo
        del self.document


if __name__ == "__main__":
    unittest.main()
test_functional_highlight.py
"""Functional Test That Performs Some Basic Sanity Checks"""
from highlight import HighlightDocumentOperations


def test_snippit_algorithm():
    document1 = """
This place has awesome deep dish pizza.
I have been getting delivery through Waiters on wheels for years.
It is classic, deep dish Chicago style pizza.
Now I found out they also have half-baked to pick-up and cook at home.
This is a great benefit. I am having it tonight. Yum."""
    document2 = """Review for their take-out only.
Tried their large Classic (sausage, mushroom, peppers and onions) deep dish;\
and their large Pesto Chicken thin crust pizzas.
Pizza I've had better. The crust/dough was just way too effin dry for me.\
Yes, I know what cornmeal is, thanks. But it's way too dry.\
I'm not talking about the bottom of the pizza...I'm talking about the dough \
that's in between the sauce and bottom of the pie...it was like cardboard, sorry!
Wings spicy and good. Bleu cheese dressing only...hmmm, but no alternative\
of ranch dressing, at all. Service friendly enough at the counters.
Decor freakin dark. I'm not sure how people can see their food.
Parking a real pain. Good luck."""

    h1 = HighlightDocumentOperations(document1, "deep+dish+pizza")
    actual = h1.highlight_doc()
    print("Raw Document1: %s" % document1)
    print("Formatted Document1: %s" % actual)
    assert len(actual) < 500
    assert "<strong>" in actual

    h2 = HighlightDocumentOperations(document2, "deep+dish+pizza")
    actual = h2.highlight_doc()
    print("Raw Document2: %s" % document2)
    print("Formatted Document2: %s" % actual)
    assert len(actual) < 500
    assert "<strong>" in actual


if __name__ == "__main__":
    test_snippit_algorithm()
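To run the code examples above, you need to download the Natural Language Toolkit source code and follow its instructions for downloading the nltk data. Because this article is about how the example code was created and tested rather than about the code itself, I won't explain in detail what the code does. Finally, let's run the static code analysis tool pylint against the source: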
Pylint
% pylint highlight.py
No config file found, using default configuration
************* Module highlight
E: 89:HighlightDocumentOperations._doc_to_sentences: Instance of unicode has no
tokenize member (but some types could not be inferred)
E: 89:HighlightDocumentOperations._doc_to_sentences: Instance of ContextFreeGrammar
has no tokenize member (but some types could not be inferred)
W:108:HighlightDocumentOperations._score_sentences: Used builtin function map
W:192:HighlightDocumentOperations._multiple_string_replace: Used builtin function map
R: 34:HighlightDocumentOperations: Too few public methods (1/2)
Report
69 statements analysed.
Global evaluation
-----------------
Your code has been rated at 8.12/10 (previous run: 8.12/10)
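The code scores 8.12 out of 10, and the tool points out several defects. pylint is configurable, and you will very likely need to configure it to suit your project; see the official pylint documentation (see Resources). In this example, the two errors on line 89 originate in the external nltk library, and the two warnings can be eliminated by changing pylint's configuration. In general you don't want to leave errors reported by pylint in your source code, but sometimes, as in this example, you may need to make a pragmatic decision. It isn't a perfect tool, but I have found it very useful in practice.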
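Instead of silencing the two "Used builtin function map" warnings in the configuration, you could also remove the map() calls they point at. The sketch below is a hypothetical helper, not part of highlight.py: it builds the same kind of alternation regex that _score_sentences and _multiple_string_replace compile, using a generator expression instead of map(). It also sorts the terms longest first, an extra assumption here, so that a longer term such as "pizzas" is not cut short by a shorter one such as "pizza".

```python
import re


def alternation_pattern(terms):
    """Compile a regex that matches any of ``terms`` literally.

    A generator expression takes the place of map(re.escape, ...), and
    longer terms come first so that a short term cannot shadow a longer one.
    """
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile("|".join(re.escape(term) for term in ordered))
```

Because iterating over a dict yields its keys, the helper accepts a query dictionary such as the one _querystring_to_dict returns, as well as a plain list of terms.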
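Conclusion
In this article, we explored how the way you approach testing shapes the structure of your software, and why a lack of test-oriented thinking can be fatal to a project. We presented a complete code example, including functional and unit tests, ran code coverage analysis on it with nose, and ran two static analysis tools, pylint and pygenie. One topic we didn't have time to cover is how to automate this process with some form of continuous integration testing. Fortunately, that is easy to do with Hudson, an open source Java™ continuous integration system. I encourage you to consult the Hudson documentation (see Resources) and try setting up automated testing for your project; it should run all of your tests, including static code analysis.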
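A CI server such as Hudson typically marks a build as failed when a build step exits with a non-zero status. Below is a minimal sketch of such a step, assuming it runs as a Python script: it runs each check in turn and stops at the first failure, and it parses pylint's "rated at X/10" line so that a score threshold, rather than pylint's exit status, decides the static-analysis check. The command list, the file name highlight.py, and the 8.0 minimum are assumptions to adapt to your project.

```python
import re
import subprocess
import sys

# Assumed check list, adapted from the nosetests command shown earlier.
CHECKS = [
    ["nosetests", "-v", "--with-coverage", "--cover-package=highlight",
     "--with-doctest", "--cover-erase", "--exe"],
]
# Matches pylint's "Your code has been rated at 8.12/10" summary line.
_RATING = re.compile(r"rated at (-?\d+(?:\.\d+)?)/10")


def run_checks(commands):
    """Run commands in order; return (status, command) for the first failure."""
    for command in commands:
        status = subprocess.call(command)
        if status != 0:
            return status, command
    return 0, None


def pylint_score(report_text):
    """Return the score from pylint's summary line, or None if it is absent."""
    match = _RATING.search(report_text)
    return float(match.group(1)) if match else None


def pylint_gate(path, minimum=8.0):
    """Pass only if pylint's score for ``path`` reaches ``minimum``."""
    # pylint exits non-zero whenever it reports any message, so the score,
    # not the exit status, decides this check.
    proc = subprocess.Popen(["pylint", path], stdout=subprocess.PIPE,
                            universal_newlines=True)
    output, _ = proc.communicate()
    score = pylint_score(output)
    return score is not None and score >= minimum


def main():
    status, failed = run_checks(CHECKS)
    if status != 0:
        print("check failed: " + " ".join(failed), file=sys.stderr)
        return status
    if not pylint_gate("highlight.py"):
        print("pylint score below threshold", file=sys.stderr)
        return 1
    return 0
```

The build step would then run a script that ends with sys.exit(main()). With the 8.0 minimum assumed here, the 8.12 score shown above would pass.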
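Finally, testing is not a silver bullet, and neither are static analysis tools. Software development is hard work. To succeed, we must keep the real goal in mind at all times: not just to solve the problem, but to create something that can be proven to work. If you agree, then you will see that overly complex code, an arrogant attitude toward design, and a lack of respect for the power of Python all stand directly in the way of that goal.
Thanks to Kennedy Behrman of Imagemovers Digital for reviewing this article.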
Downloadable resources
Zip file (clean_code_sample.zip | 5.4KB)
Related topics