Finch: devops

Channel: devops


Troubleshooting Docker and Kubernetes with sysdig

devops sysadmin

anders @ 2016-12-22 22:44:40 +0000 UTC

https://www.nivenly.com/k8s-aws-private-networking/

devops sysadmin

anders @ 2016-12-08 21:13:45 +0000 UTC

Go + Docker

anders @ 2016-11-20 12:03:30 +0000 UTC

thockin/go-build-template: A Makefile/Dockerfile example for Go projects.

anders @ 2016-11-20 11:59:06 +0000 UTC

Infrastructure discovery with etcd

anders @ 2016-11-13 22:03:25 +0000 UTC

Terraform, Salt, Consul and DigitalOcean

devops sysadmin

anders @ 2016-09-18 21:11:52 +0000 UTC

Monitoring and Tuning the Linux Networking Stack: Receiving Data

devops sysadmin

anders @ 2016-09-14 11:50:07 +0000 UTC

Monitoring Performance in Microservice Architectures - Container Solutions

devops distributed systems

anders @ 2016-09-14 07:00:14 +0000 UTC

Managing Apt Repos in S3 Using Lambda

anders @ 2016-08-08 09:32:57 +0000 UTC

Billions of Messages a Day - Yelp’s Real-time Data Pipeline

devops distributed systems

anders @ 2016-07-16 22:14:29 +0000 UTC

Dockerizing Elixir and Phoenix Applications - Semaphore

anders @ 2016-07-14 13:57:09 +0000 UTC

API versioning methods, a brief reference

anders @ 2016-07-01 09:59:21 +0000 UTC

Service autodiscovery in Go with sleuth - darian.af

devops distributed systems golang

anders @ 2016-06-17 11:13:58 +0000 UTC

ahmetalpbalkan/go-dexec: It’s like Go os/exec package but for Docker

anders @ 2016-06-16 14:55:11 +0000 UTC

Automated certificate provisioning in Kubernetes using kube-lego // Jetstack Blog

devops golang sysadmin

anders @ 2016-06-15 21:12:46 +0000 UTC

tevino/tcp-shaker: Performing TCP handshake without ACK in golang, useful for health checking, that is SYN, SYN-ACK, RST.

anders @ 2016-06-01 12:02:52 +0000 UTC

Lessons from Building a Node App in Docker

anders @ 2016-06-01 06:39:39 +0000 UTC

SystemdForUpstartUsers - Ubuntu Wiki

devops linux sysadmin

anders @ 2016-06-01 06:38:33 +0000 UTC

Measuring Events with Google Analytics

anders @ 2016-06-01 06:36:34 +0000 UTC

Effective Health Checks in Go

anders @ 2016-06-01 06:36:09 +0000 UTC

Resource management in Docker

anders @ 2016-06-01 06:35:49 +0000 UTC

Real World Elixir Deployment

devops elixir erlang

anders @ 2016-06-01 06:33:24 +0000 UTC

WTF is serverless

devops sysadmin

anders @ 2016-06-01 06:32:30 +0000 UTC

rcrowley/go-metrics: Go port of Coda Hale’s Metrics library

anders @ 2016-05-25 15:00:25 +0000 UTC

Testing web services with traffic control on Kubernetes

devops distributed systems linux sysadmin

anders @ 2016-05-25 14:07:15 +0000 UTC

Nelson rules - Wikipedia, the free encyclopedia

anders @ 2015-07-31 14:14:49 +0000 UTC

Building Docker Images for Static Go Binaries

anders @ 2015-07-20 20:19:46 +0000 UTC

Small Docker Images For Go Apps

anders @ 2015-07-20 20:18:29 +0000 UTC

A Go, Docker workflow

anders @ 2015-07-02 09:58:43 +0000 UTC

Time-Series Database Requirements

devops distributed systems

anders @ 2015-07-01 09:02:06 +0000 UTC

The mathematics of RAID-6

devops distributed systems papers

anders @ 2015-06-27 20:53:39 +0000 UTC

A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems

devops distributed systems papers

anders @ 2015-06-27 20:53:08 +0000 UTC

Orchestrate Containers for Dev with Docker Compose

anders @ 2015-06-05 11:54:21 +0000 UTC

Using Consul and Cloudslang to automatically clean up old Docker containers

devops sysadmin

anders @ 2015-06-04 12:40:54 +0000 UTC

Continuous Integration and Delivery with Docker | Codeship | via @codeship

anders @ 2015-06-01 08:57:56 +0000 UTC

Consul for Cluster Health Monitoring - VividCortex

devops sysadmin

anders @ 2015-06-01 08:50:59 +0000 UTC

Virtualization and Protection Rings (Welcome to Ring -1) Part I | Zeros & Ones

devops linux sysadmin

anders @ 2015-05-29 15:02:39 +0000 UTC

ops school curriculum

devops sysadmin

anders @ 2015-05-10 10:30:43 +0000 UTC

rcrowley/go-metrics

anders @ 2015-05-10 10:29:26 +0000 UTC

bazel: build tool from google

anders @ 2015-04-07 11:57:43 +0000 UTC

Backblaze finds only a few particular SMART metrics useful for predicting and detecting hard drive failure

Backblaze uses SMART 5, 187, 188, 197 and 198 for determining the failure or potential failure of a hard drive.
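
As a rough sketch of how those five attributes could be checked, the snippet below flags a drive when any of them has a raw value above zero. The "greater than zero" threshold, the function name `flag_drive`, and the input shape (a dict of attribute ID to raw value) are assumptions for illustration, not something stated in the note above.

```python
# SMART attribute IDs named in the note above, with their commonly used labels.
WATCHED_ATTRIBUTES = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    188: "Command Timeout",
    197: "Current Pending Sector Count",
    198: "Offline Uncorrectable Sector Count",
}


def flag_drive(raw_values):
    """Return the watched attributes whose raw value is above zero.

    raw_values is a hypothetical input: a dict mapping SMART attribute ID
    to its raw value. Attributes missing from the dict count as zero.
    The "> 0" threshold is an assumption for this sketch.
    """
    flagged = {}
    for attr_id, name in WATCHED_ATTRIBUTES.items():
        value = raw_values.get(attr_id, 0)
        if value > 0:
            flagged[attr_id] = (name, value)
    return flagged
```

An empty result means none of the five watched attributes is raising a flag; attributes outside the watched set (power-on hours, temperature, and so on) are ignored.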

devops linux sysadmin

anders @ 2014-12-13 18:32:26 +0000 UTC

Guillaume’s Thoughts: Release Go code (and others) via Docker using Makefile

anders @ 2014-11-11 12:41:21 +0000 UTC

Eight Docker Development Patterns

anders @ 2014-10-27 20:22:24 +0000 UTC

Introducing Consul Template - HashiCorp

anders @ 2014-10-22 13:36:15 +0000 UTC

lokalebasen/go-env

pull down environment variables from etcd and run a process with them
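
The general idea (fetch key/value pairs, then start a process with them in its environment) can be sketched as below. This is not go-env's code: the etcd lookup is left out entirely, and `run_with_env` and the `extra_env` dict are hypothetical stand-ins for whatever values would be pulled from etcd.

```python
import os
import subprocess
import sys


def run_with_env(extra_env, command):
    """Run command with extra_env layered on top of the current environment.

    extra_env stands in for values fetched from etcd (the fetch itself is
    not shown here). Raises CalledProcessError if the command fails.
    """
    env = dict(os.environ)
    env.update(extra_env)
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # The child process sees GREETING even though the parent never exported it.
    child = [sys.executable, "-c", "import os; print(os.environ['GREETING'])"]
    print(run_with_env({"GREETING": "hello"}, child).stdout, end="")
```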

anders @ 2014-10-17 19:55:41 +0000 UTC

Crash-only software: More than meets the eye [LWN.net]

“only way to stop is to crash. only way to start is to recover”

devops distributed systems

anders @ 2014-10-16 20:47:25 +0000 UTC

How Google’s Build System Works

anders @ 2014-10-14 14:04:58 +0000 UTC

Keys to SRE

Talk by Google’s SRE Czar.

50% dev/maintenance ratio
at least 5% of support tickets need to go directly to developers
SREs are free to leave any project at any time
in an outage: minimize impact + prevent recurrence

anders @ 2014-10-14 12:57:10 +0000 UTC

My Philosophy on Alerting - Google Docs

Writeup from a Google SRE on alerting/monitoring. Very well thought out.

Pages should be urgent, important, actionable, and real.

emphasis on reducing noise levels
emphasis on end-to-end, black box, symptom-based alerting rather than the cause (I assume there is still enough monitoring/metrics in place to quickly diagnose the cause from the symptom)
a daily report can be a good channel for non-critical but time-sensitive alerts, particularly on causes (disk getting relatively full, unusually large numbers of slow queries, etc.)
“Every alert should be tracked through a workflow system.” not just dumped into an IRC channel or email list.

This is good for thinking about Hound. Overall, a lot of effort has gone into making all of Hound’s alerts “urgent, important, actionable, and real”, but some fall short. E.g., quite a few current alerts aren’t really actionable (monitoring of various LITO services, Wardenclyffe -> PCP failures); we keep them because we’d rather know when something we depend on fails before our users do.

Things to consider adding to Hound based on this:

dependency chain: link symptoms to causes so we can silence the symptom alerts when we already know the cause
different alert targets: set up alerts that go only to the people who can actually act on them, rather than dumping everything to ccnmtl-sysadmin and training people to ignore a lot of them (“somebody else’s problem”).
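
A minimal sketch of how those two ideas might fit together, assuming nothing about how Hound is actually built: each alert names its own target and the alerts known to cause it, and a symptom is dropped whenever one of its causes is also firing. The `Alert` class, the `route` function, and the alert and team names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    name: str
    target: str  # who can act on this alert (hypothetical team name)
    caused_by: list = field(default_factory=list)  # names of known cause alerts


def route(alerts, firing):
    """Group firing alerts by target, silencing symptoms whose cause is firing."""
    firing = set(firing)
    by_target = {}
    for alert in alerts:
        if alert.name not in firing:
            continue
        if any(cause in firing for cause in alert.caused_by):
            continue  # the cause is already being reported; skip the symptom
        by_target.setdefault(alert.target, []).append(alert.name)
    return by_target


# Hypothetical configuration: a site error alert that is a symptom of a
# database outage, plus an unrelated disk alert for a different team.
EXAMPLE_ALERTS = [
    Alert("db-down", "dba-team"),
    Alert("site-500s", "web-team", caused_by=["db-down"]),
    Alert("disk-full", "ops-team"),
]
```

With this configuration, `route(EXAMPLE_ALERTS, ["db-down", "site-500s"])` notifies only `dba-team`, while `site-500s` firing on its own still goes to `web-team`.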

anders @ 2014-10-14 09:00:56 +0000 UTC
