Instead of writing a separate post per video, here are some rough notes from several more of the dotScale 2013 videos. This is not a complete list. Any mistakes are my own. Watch the videos if you want a direct source.
dotScale 2013 – Jonathan Weiss – DevOps at scale
Works for OpsWorks
First Rule: Things will break, plan for it.
Divide and Conquer, decouple
“Limit the blast radius”: if a component goes down or becomes slow, other components should continue working.
Deploy the new version to 1 of many, many machines and measure (latency, CPU, memory, key performance, errors, etc.). Then, if it’s good, roll out to more. Requires supporting multiple versions concurrently. Staged rollout. Automate this.
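A minimal sketch of what an automated staged rollout could look like. This is my own illustration, not code from the talk: `staged_rollout`, `deploy`, `is_healthy`, and the stage fractions are all hypothetical names and values.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stages: Sequence[float] = (0.01, 0.1, 0.5, 1.0),  # assumed fractions
) -> List[str]:
    """Deploy to growing fractions of hosts; stop at the first unhealthy stage.

    Returns the hosts that received the new version.
    """
    deployed: List[str] = []
    for fraction in stages:
        # Always target at least one host, even for tiny fractions.
        target = max(1, int(len(hosts) * fraction))
        # Only deploy to hosts that were not covered by an earlier stage.
        for host in hosts[len(deployed):target]:
            deploy(host)
            deployed.append(host)
        # Measure before widening the rollout.
        if not all(is_healthy(h) for h in deployed):
            break
    return deployed
```

In practice `is_healthy` would compare the measured latency, CPU, memory, and error rates against a baseline, and a failed stage would also trigger a rollback. Both are left out here to keep the sketch short.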
Have good backup/restore and disaster recovery strategy. Practice it frequently.
Chaos Monkey – Introduce failure daily so that you can be sure to handle it automatically.
Measure everything that you can.
Lots of testing and auto-reconfigure based on goals.
dotScale 2013 – Nicolas Fonrose – Welcome to your new job
Architects know to worry about: reliability, latency, speed
Now we need to worry about cost.
Everything we do in the cloud has a cost.
Cost Driven Design
Lots of opportunity to save money by using cloud computing/storage
- Script everything, never work manually, don’t use graphical interfaces, use the API
- Measure everything: not just performance, but costs
- Continuous management of cost: like build automation, but for cost improvement
- Correlate between all actions and cost to get a good big picture view
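To make the "correlate actions with cost" point concrete, here is a small sketch that rolls usage up per action. The `UsageRecord` fields and the `cost_by_action` helper are assumptions of mine for illustration. A real setup would pull these records from the provider's billing or usage API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class UsageRecord:
    action: str         # what triggered the usage, e.g. a deploy or a batch job
    resource: str       # which resource was used
    hours: float        # how long it ran
    hourly_rate: float  # price per hour, in whatever currency you are billed in


def cost_by_action(records: Iterable[UsageRecord]) -> Dict[str, float]:
    """Sum the cost of every usage record, grouped by the action behind it."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.action] += record.hours * record.hourly_rate
    return dict(totals)
```

Running something like this on every build, next to the test suite, is one way to read the "like build automation" comparison.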
dotScale 2013 – Thomas Stocking – Virtual Network over TRILL
gandi.net public cloud provider
Need layer-2 network isolation for large scale multi-tenancy
VNT = TRILL + VNI (TRILL = Transparent Interconnection of Lots of Links; VNI = Virtual Network Identifier)
Planned to open source it in the next year
dotScale 2013 – Stanislas Polu – Uses of tmux explained
Terminal multiplexer. Lets you switch between multiple applications in one terminal.
dotScale 2013 – Quentin Adam – Scaling, what you need to know
Founder of Clever Cloud
Scale up or Scale out (hint: out is better)
Scale out = many workers doing the same thing, avoid single point of failure, easier to grow
Differentiate between process and storage
Storage: Database, files, Sessions, *Events*, user accounts, user data
Process: Can be replicated, stateless, process (takes data, transforms data, produces data)
Statelessness is key
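A rough sketch of the difference, with hypothetical names of my own (`handle_visit_stateful`, `handle_visit`, `store`). The plain dict passed in as `store` is only a stand-in for an external data store that every worker can reach.

```python
from typing import Dict, MutableMapping

# Anti-pattern: the count lives in this one process's RAM, so a second
# worker has its own copy and a restart loses everything.
_visits: Dict[str, int] = {}


def handle_visit_stateful(user_id: str) -> int:
    _visits[user_id] = _visits.get(user_id, 0) + 1
    return _visits[user_id]


# Stateless: the process only takes data, transforms it, and produces data.
# All state is kept in the store that is handed in.
def handle_visit(user_id: str, store: MutableMapping[str, int]) -> int:
    # Note: this read-modify-write is not atomic. A real shared store
    # would need an atomic increment to be safe under concurrent writes.
    count = store.get(user_id, 0) + 1
    store[user_id] = count
    return count
```

With the stateless version, any number of identical workers can serve the same user as long as they share the store, which is what makes scaling out possible.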
Choose data store wisely (probably choose multiple data stores for different parts of the system)
Example questions when choosing a data store:
- Do I need atomicity of requests?
- Do I need concurrent access (read? write?)?
- Do I mostly read or write?
- Do I need relational?
- Do I need big storage capacity?
- Do I need high availability?
- How long do I need the data?
Use an online (hosted) database to test ideas before cluttering your computer with installed software.
Don’t start with technologies (Node.js + Mongo) and then ask yourself what problem/project you’ll solve/build. Start with a problem, then find the right technologies.
Balance learning curve with time saved.
Don’t make monsters (technology twisted to do something it wasn’t designed for). For example, a job queue built on MySQL and cron.
Common mistakes to avoid:
- Don’t use RAM as a data store (avoid shared/global variables; they can’t scale and will cause errors).
- RAM should be used to take a bit of data, process it, then dump it.
- Processes should return the same output for a given input.
- If you store data in memory, it will be lost when the code fails (and the code will fail).
- Don’t use the file system.
- Be very careful with dark magic
- Split your code into modules
- Keep the code per module small (makes it easier to find the bugs)
- Modules should act as services to each other
- Choose the right technology per module
- Use event broker to modularize the app
- Make hard computation async
- Always use a reverse proxy
- Use process deployment
- When things fail (and they will): keep calm, get metrics, find the bug, fix the bug
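For the "event broker" and "make hard computation async" points in the list above, here is a minimal sketch. It uses Python's standard `queue.Queue` and a thread as an in-process stand-in for a real broker and a separate worker process. `start_worker`, `submit`, and the `results` dict are hypothetical names, and the sum of squares is only a placeholder for the hard computation.

```python
import queue
import threading
from typing import Dict


def start_worker(jobs: queue.Queue, results: Dict[str, int]) -> threading.Thread:
    """Start a background worker that consumes jobs until it sees None."""

    def run() -> None:
        while True:
            job = jobs.get()
            if job is None:  # sentinel: shut the worker down
                break
            job_id, n = job
            # Placeholder for the expensive work.
            results[job_id] = sum(i * i for i in range(n))

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def submit(jobs: queue.Queue, job_id: str, n: int) -> None:
    """Enqueue a job and return immediately; the caller never blocks on the work."""
    jobs.put((job_id, n))
```

The module that submits work only knows about the queue, not about the worker, so each side can be replaced or scaled on its own. In a real system the results would go to a data store rather than an in-memory dict, for the reasons given in the list above.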