bpo + automation   22

On system rollback and totalised fields: An algebraic approach to system change
"In system operations the term rollback is often used to imply that arbitrary changes can be reversed i.e. ‘rolled back’ from an erroneous state to a previously known acceptable state. We show that this assumption is flawed and discuss error-correction schemes based on absolute rather than relative change."

"By formulating this problem algebraically, the discussion is distanced from the sometimes emotional standpoints that bind system administrators to the notion of rollback: desperately wanting does not make it possible. The discussion about totalisation of fields is particularly useful, as it maps nicely to the flaws in this thinking. To deal with the inverse of a many-to-one map, one must invoke a policy or arbitrary selection."
paper  deployment  cfengine  automation 
february 2012 by bpo
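A minimal sketch of the many-to-one point made in the rollback paper above. All names and values are hypothetical and do not come from the paper. An absolute change maps every prior state to the same result, so the operation alone cannot say which state to return to. A relative undo is only right if nothing else touched the state in between.

```python
# Illustrative only: names and values are assumptions, not from the paper.

def set_absolute(_state: int, target: int = 0) -> int:
    """Absolute change: the result ignores the previous state (many-to-one)."""
    return target

def add_relative(state: int, delta: int) -> int:
    """Relative change: the result depends on the previous state."""
    return state + delta

# Three different starting states collapse to one result, so the
# inverse is a set of candidates rather than a single value.
starts = [3, 7, 42]
results = {set_absolute(s) for s in starts}
preimage = [s for s in starts if set_absolute(s) == 0]

# A relative undo is only correct if no other change intervened.
state = add_relative(10, 5)       # intended change: 10 -> 15
state = set_absolute(state, 100)  # unrelated change: 15 -> 100
undone = add_relative(state, -5)  # relative "rollback" gives 95, not 10

# Correcting with an absolute change to a known acceptable value works
# regardless of what happened in between.
corrected = set_absolute(state, 10)
```

Choosing which element of `preimage` to restore is the policy decision the abstract refers to; here the known acceptable value `10` stands in for that policy.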
David Fortunato's answer to What are the technical practices of a Lean Startup? - Quora
tl;dr Start from a foundation of good engineering practices and in the
words of Bobby Shaftoe from Cryptonomicon, "Show some f***ing
adaptability."

At Wealthfront, the engineering team is built around the concentric
circles of our immune system:
1) great automated testing,
2) thorough release automation,
3) comprehensive monitoring, and
4) a culture of continuous improvement in our process.

Our release cycle of about 10 minutes from commit to production (unlike the standard one week) lets us "learn more, quickly" while maintaining safety and code quality.

On a day-to-day basis, we're trunk stable, push small, frequent commits and maintain forwards and backwards data compatibility between our services. These practices allow our engineering team to respond to the needs of the business in a timely way.

We pair program on difficult problems, do periodic code reviews, and our design reviews before a big project don't aim to predict exactly where we'll end up (that's impossible). Instead we try to agree on our design concerns, methods of addressing them, and the key metrics to objectively measure the project's success.

We practice test driven development. We split testing into two buckets: (a) checking that an implementation meets the spec and (b) checking that the code works as expected. We rely on the requester of a feature to have the context and the motivation to see it perfect. To make sure that the code works as expected we have significant automated testing. We start with unit tests before we write the code, running the standard TDD loop: (1) write the test, (2) verify the test fails, (3) write the code to fix the test, (4) verify the test passes. We add integration testing to verify the interaction between systems and services.
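The four-step loop above can be sketched with Python's standard `unittest` module. This is an illustration only: `parse_version` is a hypothetical function, not something from the answer.

```python
import unittest

# Step 1: write the test first. `parse_version` is a hypothetical example.
class ParseVersionTest(unittest.TestCase):
    def test_parses_dotted_version(self):
        self.assertEqual(parse_version("1.2.3"), (1, 2, 3))

# Step 2: with no implementation yet, running the test fails with a NameError.

# Step 3: write just enough code to make the test pass.
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))

# Step 4: run the suite and confirm it passes.
def run_suite() -> bool:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseVersionTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

Calling `run_suite()` before step 3 exists reports an error for the test, which is the "verify the test fails" step.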

To ensure our frontend works as promised, we run automated tests before every release using Selenium. We test defensively, preventing unwelcome situations by checking for them in tests. We have meta-tests that prevent past problems with APIs and help maintain code quality. Delivering a feature means delivering a working feature, so we strongly believe that testing is everyone's responsibility: all of engineering owns the quality of the software we produce.

We automate our releases to prevent problems. When you release code 50 times a day, low probability problems with your deployment infrastructure can cause significant breaks in productivity and availability. Our deployment software monitors for errors and, when it detects them, is able to roll back code without human intervention.
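A rough sketch of what monitor-and-roll-back could look like. The function, its parameters, and the error threshold are all assumptions for illustration; the answer does not describe the actual deployment software.

```python
from typing import Callable

def deploy_with_rollback(
    new_version: str,
    current_version: str,
    activate: Callable[[str], None],
    error_rate: Callable[[], float],
    max_error_rate: float = 0.01,  # assumed threshold
    checks: int = 3,               # assumed number of health checks
) -> str:
    """Activate new_version, then revert if the error rate exceeds the threshold.

    Returns the version left running.
    """
    activate(new_version)
    for _ in range(checks):
        if error_rate() > max_error_rate:
            # Automatic rollback: reactivate the previous known-good version.
            activate(current_version)
            return current_version
    return new_version
```

Here `activate` and `error_rate` are injected callables, so the same logic can be exercised with fakes in a test before it is wired to real infrastructure.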

We monitor business metrics constantly with automation for a few reasons. First, business metrics are great at detecting a broad class of infrastructure problems. Second, we learn about how our code and features are performing from a business perspective. Third, we get constant feedback about the success of our business.

Lean Startups emphasize learning quickly by iterating: designing and building the product is a core feedback loop to make the startup successful. We consider our engineering practices to be an important feedback loop too. As problems arise we stop development, get to the root of the issue and decide on short term and long term solutions. We invest proportionally in implementing the solutions (things that have significant business risk take priority, things that don't are addressed in proportion to the amount of our time that they consume).

Almost all of the practices I describe above have evolved as we've learned from problems with our previous processes. As we continue to learn, we'll implement further changes.

Ultimately, the practice of an engineering team operating in a lean startup is flexibility on demand: the things that slow you down and the things that get in your way need to be fixed, and those fixes become your engineering practices.
startup  business  automation 
february 2012 by bpo
Bug Prediction at Google | Google Engineering Tools
"We implemented the Rahman algorithm by creating a program that hooked into our source control system, and pulls out all the changes which had a bug attached to them. It looks at each bug number, and verifies with the bug-tracking database that it was really a bug, and filters out everything else, such as feature requests. It then looks at all the files that appeared in these changes, and filters out those that have been deleted and are no longer at HEAD. For each file, the number of bug-fixing changes it's been in is calculated, and we output the files which were ranked in the top 10%."

This is coupled with recency weighting to find bugs. Not a new technique but nifty to see a large-scale implementation that works.
google  qa  testing  automation  bugs 
december 2011 by bpo
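A small sketch of the ranking described in the Bug Prediction at Google entry above: count the bug-fixing changes each file appears in, weight recent fixes more heavily, and keep the top 10%. The input shape and the logistic weighting function are assumptions for illustration, not a confirmed copy of Google's implementation.

```python
import math
from collections import defaultdict

def hot_files(fixes, now, start, top_fraction=0.10):
    """Rank files by a recency-weighted count of bug-fixing changes.

    fixes: iterable of (timestamp, [paths]) for changes confirmed as bug fixes.
    now, start: bounds used to normalise each timestamp to t in [0, 1].
    """
    scores = defaultdict(float)
    span = max(now - start, 1)
    for timestamp, paths in fixes:
        t = (timestamp - start) / span
        # Assumed logistic weight: 0.5 for a fix made at `now`,
        # close to 0 for a fix made at `start`.
        weight = 1.0 / (1.0 + math.exp(-12.0 * t + 12.0))
        for path in paths:
            scores[path] += weight
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[:keep]
```

Filtering out non-bugs and files no longer at HEAD, as the quoted passage describes, would happen before `fixes` is built.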
lib at master from igrigorik/bugspots - GitHub
Ilya Grigorik's implementation of the Google bug prediction tool
tools  git  qa  testing  automation  bugs 
december 2011 by bpo
Research Systems Unix Group: radmind
At its core, radmind operates as a tripwire. It is able to detect changes to any managed filesystem object, e.g. files, directories, links, etc. However, radmind goes further than just integrity checking: once a change is detected, radmind can optionally reverse the change.

Each managed machine may have its own loadset composed of multiple, layered overloads. This allows, for example, the operating system to be described separately from applications.
tools  sysadmin  deployment  unix  automation  cfengine 
november 2009 by bpo
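A toy illustration of the detect-then-reverse idea in the radmind entry above, using only the Python standard library. It is not radmind's implementation or file format: the in-memory `baseline` dict is an assumption standing in for a managed loadset.

```python
import hashlib
from pathlib import Path

def snapshot(root: Path) -> dict[str, bytes]:
    """Record the known-good content of every file under root."""
    return {
        str(p.relative_to(root)): p.read_bytes()
        for p in sorted(root.rglob("*")) if p.is_file()
    }

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def detect_changes(root: Path, baseline: dict[str, bytes]) -> list[str]:
    """Return relative paths modified, removed, or added since the baseline."""
    current = snapshot(root)
    changed = [
        name for name in baseline
        if name not in current or digest(current[name]) != digest(baseline[name])
    ]
    added = [name for name in current if name not in baseline]
    return sorted(changed + added)

def reverse_changes(root: Path, baseline: dict[str, bytes]) -> None:
    """Restore the baseline: rewrite changed files, delete unmanaged ones."""
    for name in detect_changes(root, baseline):
        path = root / name
        if name in baseline:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(baseline[name])
        else:
            path.unlink()
```

Restoring always writes the recorded content rather than undoing an edit, so the result does not depend on how the file was changed.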
Augeas - a configuration API
Augeas is a configuration editing tool. It parses configuration files in their native formats and transforms them into a tree. Configuration changes are made by manipulating this tree and saving it back into native config files.
programming  tools  automation  sysadmin  tool  linux  unix 
july 2009 by bpo
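The parse, edit the tree, write back cycle from the Augeas entry above, sketched for a simple INI-like format. This is a stand-in written for illustration and does not use the Augeas API; `parse` and `render` are hypothetical helpers.

```python
def parse(text: str) -> dict[str, dict[str, str]]:
    """Parse INI-like text into a two-level tree: section -> key -> value."""
    tree: dict[str, dict[str, str]] = {}
    section = ""  # keys before any [section] header land under ""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # this toy parser drops blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            tree.setdefault(section, {})
        else:
            key, _, value = line.partition("=")
            tree.setdefault(section, {})[key.strip()] = value.strip()
    return tree

def render(tree: dict[str, dict[str, str]]) -> str:
    """Serialise the tree back into the same INI-like format."""
    lines = []
    for section, entries in tree.items():
        if section:
            lines.append(f"[{section}]")
        lines.extend(f"{key} = {value}" for key, value in entries.items())
    return "\n".join(lines) + "\n"
```

A change is then a tree edit followed by a save, for example `tree["main"]["port"] = "8080"` and `render(tree)`. Unlike this sketch, a real tool of this kind has to preserve comments and formatting when writing the file back.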
Chef - Opscode Open Source Wiki
Chef is a systems integration framework, built to bring the benefits of configuration management to your entire infrastructure
ruby  tools  deployment  sysadmin  tool  automation  infrastructure  framework 
april 2009 by bpo
Vlad the Deployer
Ruby Hit Squad's replacement for Capistrano. Also possibly the best logo ever.
automation  deployment  capistrano  rails  tools 
december 2007 by bpo
Cfengine - an adaptive system configuration management engine
"Cfengine's principal promise is to be based on the very best and latest research. It does not aim to be user-friendly, but user-invisible."
automation  deployment  distributed  documentation  reference  sysadmin  systems  tools  unix  infrastructure 
october 2007 by bpo
Sake Bomb!
Sake - system-wide Rake. Install and share common rake tasks across your system.
automation  rake  ruby  tools 
july 2007 by bpo
ControlTier
"ControlTier provides a framework and toolset that allows development, QA and operations staff to collaboratively build and execute portable, model-driven automation that can be used to release, deploy and configure complex integrated applications in mult
automation  deployment  opensource  infrastructure  sysadmin 
may 2007 by bpo
Amazon Mechanical Turk
A webservice for simple large-scale human labor
analysis  outsourcing  automation  business  webservices  amazon 
march 2007 by bpo
S3 + Rake = Easy Backups for SVN Repositories, Databases, and Code
Rake script with several tasks for backing up data to S3.
s3  backup  amazon  automation  ruby  rake  code 
january 2007 by bpo
