我眼中的DevOps

开发 项目管理
过去一年以来,一批来自欧美的、不墨守陈规的系统管理员和开发人员一直在谈论一个新概念:DevOps。DevOps 就是开发(Development) 和运维(Operations)这两个领域的合并。
Over the last year or so a bunch of presumptuous European sysadmins and developers, joined by some of their American brethren and even a couple of us antipodeans (there are others too!) have been talking about a concept called DevOps.  DevOps is the merger of the realms of development and operations (and if truth be told elements of product management,QA, and *winces* even sales should be thrown into the mix too).

过去一年以来,一批来自欧美的、不墨守陈规的系统管理员和开发人员一直在谈论一个新概念:DevOps。DevOps 就是开发(Development) 和运维(Operations)这两个领域的合并。(如果没错的话,DevOps还包括产品管理、QA、*winces* 甚至销售等领域)

The Broken

脱节(The Broken)

So … why should we merge or bring together the two realms?  Well there are lots of reasons but first and foremost because what we’re doing now is broken.  Really, really broken.  In many shops the relationship between development (or engineering) and operations is dysfunctional to the point of occasional toxicity.

那么……为什么要合并这两个领域?原因很多,但首要原因是:我们目前的工作流程是脱节的。绝对的脱节。很多公司的开发部门和运维部门之间存在的深刻矛盾,其实就是这个“脱节”造成的。(意译,求斧正)

Here’s an example I think everyone will be at least partially familiar with: the minefield that is project to production software deployment.  Curse along as I explain.

下面是一个大家都基本熟悉的例子:部署软件产品。

Development builds an application, the new hotness which promises customers all the whizz-bang features and will make the company millions.  It is built using cutting edge technology and a brand new platform and it has got to be delivered right now.  Development cuts code like crazy and gets the product ready for market ahead of schedule.  They throw their masterpiece over the fence to Operations to implement and dash off to the pub for the wrap party.

开发部门要开发一款新产品。这款产品要使用最新最炫的技术,来保证客户的所有花俏的需求,从而给公司带来百万美元的利润。这款产品被要求使用最新的技术和运行平台,还得马上交付。于是开发部门没日没夜的加班、赶代码(cuts code like crazy),终于如期完成了任务。然后他们把自己的“杰作”一股脑的甩给了运维部门,后者还没能完全接手,前者已经迫不及待的开始了庆功会。

Operations catches the deployment and is filled with horror.

接到产品后,运维部门每个人的心中都充满了恐惧。

The Operations team summarises their horror and says one or more of:

下面就是运维部门的恐惧之源:( {A.B.C} 表示 A 或 B 或 C 之一 )

    * The wonder application won’t run on our infrastructure because {it’s too old, it doesn’t have capacity, we don’t support that version}

* 这款优秀的产品在目前的底层平台上无法运行,因为这个平台{太古老了,空间不足,不支持某某版本}

    * The architecture of the application doesn’t match our { storage, network, deployment, security } model

* 这款产品的体系结构跟我们的{存储,网络,部署,安全}模型不匹配。

    * We weren’t consulted about the { reporting, security, monitoring, backup, provisioning } and it can’t be “productionised”.

* 这款产品的{ 报告,安全,监视,备份,服务提供} 我们搞不懂 ,所以没法把它做成实际可用的产品。
 

But Operations persevere and install the new hotness – cursing and bitching throughout.  Sadly, after forcing the application onto infrastructure and bending and twisting the architecture to get it running, the performance of the new application can be summed up as “epic fail”.

尽管伴随着不绝于耳的抱怨和咒骂,运维部门最终还是把这款产品安装好了。不幸的是,由于做了很多蹩脚的修改和不合理的强迫式运行,这款产品的性能最后被归结为:终极失败(Epic Fail)。

Operations sighs and starts logging problems and passing issues back to the Development team.  Their responses generally come from the following pool:

于是非常沮丧的运维部门开始记录各种问题,源源不断的给开发部门提Issue。而开发部门的回应基本上都是:

    * It’s not our fault – our code is perfect – it’s just been poorly implemented

* 这不是我们的错 —— 我们的代码非常完美——而是(运维部门的)部署做的太差劲了。

    * Operations are stupid and don’t understand the new hotness – why can’t they implement the cutting edge technology? Why are they so backward?

* 运维部门比较笨,他们不懂新技术—— 为什么他们没法实现最新的技术呢?为什么他们这么落伍呢?

    * It runs fine on my machine…

* 在我的机器上运行的没问题啊……

The interactions between teams quickly becomes a toxic blame storm. The customers (and by extension the shareholders, investors and management) then become the losers.  The loop gets closed with the company losing bucket loads of money and everyone losing their jobs.  EPIC and FAIL.

两个部门之间的交流很快变成了一场暴风骤雨。客户(以及股东、投资方和管理层)则成了蒙受损失的失败方。最终公司损失了无数的金钱,大家也都失业了。终极的失败。

What’s different about DevOps?

DevOps 又有啥不同?它有什么好处?

DevOps is all about trying to avoid that epic failure and working smarter and more efficiently at the same time. It is a framework of ideas and principles designed to foster cooperation, learning and coordination between development and operational groups. In a DevOps environment, developers and sysadmins build relationships, processes, and tools that allow them to better interact and ultimately better service the customer.

DevOps 就是想方设法的避免这种“终极失败”,同时让大家用更聪明更有效的方式去工作。它是一种框架,包含了很多优秀想法和原则,它鼓励开发部门和运维部门通力合作。在DevOps环境中,开发人员和系统管理员会构建一些关系、流程和工具,从而更好的与客户互动,最终提供更好的服务。

DevOps is also more than just software deployment – it’s a whole new way of thinking about cooperation and coordination between the people who make the software and the people who run it.  Areas like automation, monitoring, capacity planning & performance, backup & recovery, security, networking and provisioning can all benefit from using a DevOps model to enhance the nature and quality of interactions between development and operations teams.

DevOps 也不仅仅是一种软件的部署方法。它通过一种全新的方式,来思考如何让软件的作者(开发部门)和运营者(运营部门)进行合作与协同。使用了DevOps模型之后,会使两个部门更好的交互,使两者的关系得到改善,从而让很多领域从中受益,例如:自动化、监视、能力规划和性能、备份与恢复、安全、网络以及服务提供(provisioning)等等。

Everyone in the DevOps community has a slightly different take on “What is DevOps?”  We all bring different experiences and focuses to the problem space.  I personally see DevOps as having four quadrants:

“对于DevOps是什么?” 这个问题,DevOps社区中的每个人的回答都不尽相同。因为我们的工作经验不同,关注的问题也不同。就我个人而言,DevOps分成四大部分:

Simplicity

简单

KISS is King and in that vein this section is simple too. Design simple, repeatable, and reusable solutions. Simplicity saves documentation, training, and support time.  Simplicity increases the speed of communication, avoids confusion, and helps reduces the risk of development and operational errors.  Simplicity gets you to the pub faster.

KISS(Keep it Simple and Stupid,简单就是美)原则是最重要的。所以本段文字也很简单。我们要尽量提供简单、可重用的解决方案。“简单”节约了书写文档、培训和提供支持的时间。“简单”增加了沟通的速度、避免混淆、减少了开发和运维出错时的风险。“简单”让人更快的发布产品。

Relationships

部门之间关系

Engage early, engage often. Development teams need to embed operations people into their project and development life cycles.  Invite operational people to your scrum or development meetings.  Share ideas and information about product plans and new technologies. Gather operational requirements when gathering functional ones. As a project progresses test deployment, backup, monitoring, security and configuration management as well as application functionality.  The more issues you fix during the project the less issues you expose your customers to when the application is live.  Educate operations people about the applications architecture and the code base. The more information operations people can feed you about a problem with the code the less trouble-shooting you need to perform and the faster the problem can be fixed.

早参与,多参与。对于开发人员,要让运维人员常驻到开发部门,全程参与开发流程。邀请运维人员参与你的Scrum或者开发会议,与他们分享项目计划、分享新技术的点子和心得。搜集功能性需求(指开发人员用到的需求)的同时也要搜集运维方面的需求。把对于“发布、备份、监控、安全、配置管理和系统功能”的测试作为一项独立的项目流程。软件产品在开发时解决的问题越多,那么在使用时暴露给用户的问题就越少。给运维人员做培训,让他们弄清楚项目的体系结构和核心代码。如果运维人员在反馈bug时提供的信息越多,那么你花在排查问题(trouble-shooting) 的时间就越少,这个bug也就会更快的被解决掉。

Operations people need to bring development people into the problem and change management space. Invite developers into your team meetings. Share your roadmaps and upgrade plans.  Understand where future development is heading to better ensure infrastructure deployments match product requirements.  Developers also bring skills, knowledge and tools that can help make your environment easier to manage, more efficient and cleaner. Learn to code or if you’re a hack-n-slash systems programmer like me then learn to code better. 

Concepts like building tools with APIs rather than closed interfaces, distributed version control, test driven development, and methodologies like Agile Development, Kanban and Scrum can revolutionise operational practises in the same way they’ve changed the way code is cut.

对于运维人员,在遇到问题时需要把开发人员加进来,大家一起解决问题。邀请开发人员参与你们的会议,分享项目进度(roadmaps),并且共同修订工作计划。运维人员一定要了解开发部门下一步的工作方向,从而确保产品运行的底层平台能够良好的支持最新技术。开发人员也会带来相关的技术、知识和工作,帮助你们改善产品的运行环境,使其更加易于维护、简洁有效。

有一些开发领域的概念,例如:“要根据API而非封闭的interface来构建工具”,分布式版本控制,驱动测试开发,以及诸如敏捷开发、看板管理(Kanban) 和Scrum等方法论。如果把这些概念应用在运维领域,同样会产生革命性的变革。

Don’t be afraid of ideas and approaches from outside your domain – we can all learn things, even if it’s “let’s never do it that way again…!”, from how others do things and ultimately? Guess what? Yep, we’re all on the SAME team.

不要惧怕新点子和新技术。我们可以随时随地的向他人学习,哪怕是一句“我们再也不要那样做了!” 也会让我们从中获益。尽管处于不同的部门,但是我们要共同学习、共同成长,这样才能协同工作的更好!

Remember that interactions between people rank, in decreasing order of effectiveness (in IMHO but backed by some research):

按照从高到低的顺序,有效的沟通方式应该是:

   1. Face to face  

1. 面对面交流 
 

   2. Video conference

2. 视频会议 
 

   3. Phone

3. 电话 
 

   4. IM & IRC

4. 即时通讯软件 
 

   5. Email

5. Email.
  

Process

工作中的流程

Don’t underestimate the power of process and automation.  Many shops do process engineering – ranging from hand-written lists to ISO9001. Those processes generally have one key flaw: they focus on the outcome and its inevitability.  A simple process might provision a host – Step 1 install machine, Step 2 cable machine, Step 3 install OS, etc, etc. Assuming all goes to process then at the end of Step x you will have a fully provisioned host. But what happens if it doesn’t go right?  If your process breaks or you receive some anomalous output how does your process deal with it?  

Instead think about process as a journey and map out the potential pitfalls and obstacles.  Treat your processes like applications and build error handling into them.  You can’t predict every application or operational pitfall or issue but you can ensure that if you hit one your process isn’t derailed.

不要低估流程和自动化的作用。很多公司都有自己的流程管理(process engineering)—— 从原始的笔录到 ISO9001。但它们都存在一个关键的缺陷:过于理想化,它要求每个步骤都必须成功执行。例如:为了搭建一台新主机,会有下列一套简单的流程:步骤一:装机(把各个硬件组装到一起)。步骤二:接线、通电。步骤三:安装操作系统。接下来还有步骤四、五、六。如果一切顺利的话,第N步结束之后就会有一个功能完整、运行正常的新主机。但万一有个流程没跑通怎么办?比如说在某个步骤断了,走不下去了,或者在这一步得到了异常的输出,有没有另外的步骤来处理这个异常?

所以,流程绝对不会从头到尾一帆风顺,所以我们要把每一步流程都认真对待,找出所有潜在的问题和障碍。跟软件产品一样,在流程的管理中也要有异常处理。我们不必做到精确预见每一个问题,但一定要保证:即使流程出错,它还能往下走。

Link process together across domains – software deployment, monitoring, capacity planning and other “operational” processes have their start in the development world.  Software deployment is the logical conclusion of the software development life cycle and should be viewed as such rather than a separate operational process. Another example is metrics and monitoring, it is hard to measure anything  without understanding the baselines and assumptions made in the development domain.  Joint processes also mean more opportunity for development and operations interaction, understanding and joint accountability. Finally, joint process development means single repositories for documentation and other opportunities for economies of scale.

把不同领域的所有流程串到一起。这些领域包括:部署、监控、能力计划(capacity planning) 等等。从逻辑上讲,“部署”是软件开发周期的最后一环,所以它应该属于“开发流程”,而非“运维流程”。另一个例子是度量和监控。在开发领域,如果不理解底线标准和估算,就什么评估都做不了。把开发部门和运维部门的流程衔接在一起,也会让两个部门更好的配合、相互理解、承担共同的责任。最后还有个优点:文档只需要一份而不是两份(开发一份、运维一份),从而节省了资金。  

Automate, automate, automate. Build or make use of simple and extensible tools (make sure they have APIs and machine readable input and output – see James White’s Infrastructure Manifesto).  Use tools like Puppet (or others) to manage your configuration.  Remember to extend your automation umbrella cross-domain and end-to-end in your environment – manage development, testing, staging and production environments with the same tools and processes.  Not only does this have economies of scale benefits in support and management but it means you can test deployment and management alongside functionality as your application and new codes rolls toward production.

自动化,自动化,还是自动化。构建或使用简单、可扩展的工具(确保提供API, 机器可读的输入、输出 -- 参考 James White的文章:Infrastructure Manifesto)。使用Puppet一类的工具做配置管理。要扩展这些自动化工具,使其能够支持多个领域(开发领域和运维领域),并且在产品的不同环境(开发环境、测试环境、发布环境和生产环境)中使用相同的工具(也叫end-to-end)。这样不但会在产品支持和管理方面带来经济效益,而且也可以在编写新代码的同时,进行产品的发布和管理。

Finally, when building process and automation always keep the KISS principle in mind. Complexity breeds opportunities for error. Build simple processes and tools that are easy to implement, manage and maintain.

最后,在构建流程和自动化时,要把KISS原则牢记于心。越复杂就越易错。只有简单的流程和工具才易于实现、易于管理和易于维护。

Continuous Improvement

持续改进

Don’t stop innovating and learning.  Technology moves fast.  So do customer requirements. Build continuous improvement and integration into your tools and processes.  Here is a good place operations people can learn from (good) developers about practises like test-driven development.  A good example here is to build tests for your software deployment process and infrastructure.  They are often an application in their own right and should be developed and maintained correctly. Your monitoring could also be extended with behavioural testing to deliver better business value.  Look at using development domain tools, like Hudson for example, to explore and measure the operational domain.

不要停止创新和学习。当今技术发展的很快,客户的需求也往往如此。把“持续改进和持续集成” 加入到你的工具和流程中去,这也是运维人员向(优秀的)开发人员学习的好途径,可以学到诸如测试驱动开发等最佳实践。例如:可以向你的部署流程中加入单元测试。做监控时也应该增加些行为测试,提高交付质量。尝试用开发领域中的工具(例如Hudson)在运维领域中做些工作(例如浏览数据(explore)、测量性能(measure)等等)。

Learn from mistakes and from outages.  Seek root cause aggressively AND cross-domain.  If you have an outage and a post-incident review then bring development and operational teams together to review the incident.  Sometimes some simple code refactoring can save making infrastructure changes.  Work together to fix root cause, treat it with the same process you develop to conduct project to production software deployment, rather than relegating them to incident review reports or batting issues between teams.

要不断的总结教训。要积极主动的、在不同领域寻找错误的根源。 一旦收到错误报告,就果断把开发小组和运维小组找来,一起解决这个问题。有时候开发人员很简单的几次代码重构,就可以很好的避免底层运行环境的改变,减少运维人员的负担。总之,遇到问题时,开发部门和运维部门要密切配合、共同解决,而不是互相推诿、踢皮球。

Me

对我来说...

Finally, for me DevOps is about people and nature of the environment you want to work in.  The best thing about the movement for me is that it is trying to foster behaviours and environments where people work together towards joint goals rather than at cross-purposes or at odds.  That’s a world I’d much rather use my skills in.

最后,对我来说,DevOps 的主要内容是:跟谁共同工作、如何共同工作。它最吸引我的地方就是致力于把不同部门不同分工的人召集到一起,共同努力解决问题。这样的工作环境,是我所憧憬的乐园。

责任编辑:林师授 来源: 译言网
相关推荐

2015-08-27 14:52:19

DevOps职责

2013-03-21 13:42:55

JSjQYUI

2016-12-19 14:35:50

软件系统

2015-07-20 11:32:07

编程语言

2012-12-25 09:43:08

2013-01-17 14:38:37

Fedora 18

2017-03-22 11:22:04

JavaScript函数式编程

2012-12-26 09:20:30

2009-02-25 19:52:37

IT认证华为认证IT产业

2022-01-24 07:20:05

DevOps软件开发

2012-03-09 09:45:50

2013-04-27 12:01:09

大数据全球技术峰会大数据

2009-04-20 09:01:32

2016-12-20 14:46:34

Android开源Weex

2012-03-21 21:04:50

乔布斯

2019-03-05 12:12:39

数据库HTAPACID

2012-01-16 09:10:44

微软诺基亚Lumia

2012-05-27 20:12:30

Windows Pho

2013-02-27 10:08:40

2016-12-01 14:16:18

GitSCM配置
点赞
收藏

51CTO技术栈公众号