Screaming in the Cloud

By Corey Quinn



Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.

Episode Date
Episode 15: Nagios was the Original Call of Duty
Let’s chat about the Cloud and everything in between. The people in this world are pretty comfortable with not running physical servers on their own, but trusting someone else to run them. Yet, people suffer from the psychological barrier of thinking they need to build, design, and run their own monitoring system. Fortunately, more companies are turning to Datadog. Today, we’re talking to Ilan Rabinovitch, Datadog’s vice president of product and community. He spends his days diving into container monitoring metrics, collaborating with Datadog’s open source community, and evangelizing observability best practices. Previously, Ilan led infrastructure and reliability engineering teams at various organizations, including Ooyala and He’s active in the open source and DevOps communities, where he is a co-organizer of events, such as SCALE and Texas Linux Fest. Some of the highlights of the show include: Datadog is well-known, especially because it is a frequent sponsor More organizations know their core competency is not monitoring or managing servers Monitoring/metrics is a big data problem; Datadog takes monitoring off your plate Alternate ways, other than using Nagios, to monitor instances and regenerate configurations Datadog is first to identify patterns when there is a widespread underlying infrastructure issue Trends of moving from on-premise to Cloud; serverless is on the horizon How trends affect evolution of Datadog; adjusting tools to monitor customers’ environments Datadog’s scope is enormous; the company tries to present relevant information as the scale of what it’s watching continues to grow Datadog’s pricing is straightforward and simple to understand; how much Cloud providers charge to use Datadog is less clear Single Pane of Glass: Too much data to gather in small areas (dashboards)   Why didn’t monitoring catch this? 
Alerts need to be actionable and relevant How to use Datadog’s workflow for setting alerts and work metrics Datadog’s first Dash user conference will be held in July in New York; addresses how to solve real business problems, how to scale/speed up your organization Links: Ilan Rabinovitch on Twitter Datadog Docker Adoption Survey Results   Rubric for Setting Alerts/Work Metrics Dash Conference re:Invent Nagios
Jun 20, 2018
Episode 14: Cheslocked and loaded
Do you need data captured that lets you know when things don’t look quite right? Need to identify issues before they become major problems for your organization? Turn to Threat Stack, which has Cloud issues of its own, and helps its customers with their Cloud issues. Today, I’m talking to Pete Cheslock, who runs technical operations at Threat Stack, which handles security monitoring, alerting, and remediation. The company uses Amazon Web Services (AWS), but its customer base can run anywhere. Some of the highlights of the show include: Challenges Threat Stack experienced with AWS and how it dealt with them Threat Stack helps companies improve their security posture in AWS Security shouldn’t be an issue, if providers do their job; shared responsibility Education is needed about what matters regarding security, avoiding mistakes Cloud is still so new; not many people have broad experience managing it Scanning customer accounts against best practices to identify risks Threat Stack’s scanning tool is worthwhile, but most tools lack judgement and perspective Threat Stack offers context between host- and Cloud-based events; tying data together is the secret sauce You shouldn’t have to pay a bunch of money to have a robust security system Good operations is good security; update, patch, track, and perform other tasks Lack of validation about what services are going to be successful or not Vendor Lock-in: Understand your choices when building your system Pervasiveness and challenge of containerization and Kubernetes Cloud reduces cycle time and effort to bring a product to market Amazon is a game changer with what it allows you to do and solve problems Links: Pete Cheslock Digital Ocean Threat Stack AWS re:Invent Kubernetes
Jun 13, 2018
Episode 13: Serverlessly Storing my Dad Jokes in a Dadabase
Aurora, from Amazon Web Services (AWS), is a MySQL-compatible service for complex database structures. It offers capabilities and opportunities. But with Aurora, you’re putting a lot of trust in AWS to “just work” in ways not traditional to relational database services (RDS). David Torgerson, Principal DevOps Engineer at Lucidchart, is a mystery wrapped in an enigma and virtually impossible to Google. He shares Lucidchart’s experience with migrating away from a traditional RDS to Aurora to free up developer time. Some of the highlights of the show include: Trade off of making someone else partially responsible for keeping your site up Lucidchart’s overall database costs decreased 25% after switching to Aurora Aurora unknowns: What is an I/Op in Aurora? When you write one piece of data, does it count as six I/Ops? Multi-master Aurora is coming for failover time and disaster recovery purposes Aurora drawbacks: No dedicated DevOps, increased failover time, and misleading performance speed Providers offer ways to simplify your business processes, but not ways to get out of using their products due to vendor and platform lock-in Lucidchart is skeptical about Aurora Serverless; will use or not depending on performance Links: Corey's architecture diagram on AWS Lucidchart Lucidchart’s Data Migration to Amazon Aurora Preview of Amazon Aurora Multi-master Sign Up This is My Architecture re:Invent Digital Ocean
Jun 06, 2018
Episode 12: Like Normal Cloud Services, but More Depressing
Does your job challenge and motivate you? Does it utilize your skills? Or, are you ready to go job hunting? Do you want an awesome job that is a resume booster? Companies should be supportive of their employees finding a job that matches their skills and interests. Also, when hiring, companies should offer thoughtful processes for interviews.   Today, I’m talking to Sarah Withee, a polyglot software engineer, mentor, teacher, and robot tinkerer. Sarah went job hunting, and after several job interviews, she finally found a job that made her super happy at Arcadia Healthcare Solutions. Sarah compares the interview processes she experienced at big name tech companies that offer Cloud services. Some of the highlights of the show include: Companies sometimes lose sight that even interview interactions need to be a two-way sale Interviews often involve talking to many people; and if several are bad, that forms a negative impression of the company Companies need to provide interview training and follow the same standards Don’t farm out challenging or unfamiliar issues when interviewing candidates Sarah is very competent, but she is new to Cloud platforms; she is like a sponge, who enjoys learning and having a bare knowledge of new technology How HIPAA regulations impact Sarah’s learning and software engineering work; she has to be more aware of security and safety of healthcare data Being a teacher and mentor affects how Sarah learns new things; everybody learns slightly differently In the Cloud space, know which direction you want to go and start with simpler things to learn the basics; focus on what is relevant to what you are working on Links: Sarah Withee on Twitter #speakerconfessions Sarah Withee on Twitter Sarah Withee Blog Sarah Withee Resume Digital Ocean AWS Azure
May 30, 2018
Episode 11: Hickory Dickory Docker
Docker went from being a small startup to an enterprise company that changed the way people think about their infrastructure to now, where its relevance is somewhat minimal. The conversation is no longer around the container level. Docker has become commonplace. Today, we’re talking to Jérôme Petazzoni, formerly of Docker. While he was with the company for about 8 years, Docker definitely experienced a roller coaster ride.   Some of the highlights of the show include: Amount of work conducted on the enterprise vs. community editions Docker was so widely adopted because its core technology was open source Challenge is to build a viable business and revenue model for the long run Similarities between Docker and Red Hat open source platforms Docker went from six people working in a garage to having a few hundred employees and $1.3 billion valuation Changes happened, but they were gradual; the changes were necessary to be a profitable and sustainable company Contingent of internal and external people believed that Docker was the answer for whatever problem surfaced; Docker would save you, but not always Balancing Act: Pushing forward with a correct message and regulating enthusiasm Networking and Docker for dummies; confusion and problems of things not working as expected have been resolved Things will continue to shift; Kubernetes and the orchestration battle What was unthinkable, could happen by companies pushing the envelope and making progress Will who you have as your Cloud provider stop mattering? It depends. All major Cloud providers plan to offer managed Kubernetes services and what Jérôme thinks of them Jérôme’s opinion on whether Kubernetes will follow this same path as Docker What does the road ahead look like for infrastructure automation? There is potential and lots of best practices in Cloud environments. Links: Jérôme Petazzoni on Twitter Docker Crunch Base Digital Ocean Red Hat Corey's Heresy in the church of docker talk Kubernetes ZooKeeper Azure
May 23, 2018
Episode 10: Education is Not Ready for Teacherless
Like migrating caribou, you tend to follow the trends of what clients are doing, which dictates what you work on as a consultant. Today, we’re talking to Lynn Langit, an independent Cloud architect. She is an AWS Community Hero, Google Cloud developer expert, and former Microsoft MVP. Lynn is a lifelong learner, and she has worked broad and deep across all three large providers. These days, she works mostly with Google Cloud and AWS, rather than Azure, because that’s what her clients are using. Some of the highlights of the show include: Differences between the West Coast and global use of Cloud Education is key; Lynn is the co-founder of Lynn helped create curriculum and resources for school-age children; even her young daughter taught classes on how to code Training for teachers was also needed, so TKP Labs was formed to offer fee-based teacher and developer training Lynn started with classroom training, but has transitioned to online learning Lynn is focusing on Big Data projects and using tools to solve real-world problems Pre-processing and batching data, but not streaming it AWS, Azure, and Google Cloud are all coming out with Big Data-oriented tools Companies need to understand when the market is ready to accept a new paradigm; in the data world, change is slower than in the programming world If you touch a database and get burned, you are not willing to use it again; or you may have never tried to archive your data; hire a consultant to help you Machine learning APIs give customers value quickly; review them before building custom models Migrating data can be a costly project and restricts where the data lives As Cloud proliferates, how will that impact technical education? Lynn’s Cloud for College Students to the rescue! 
Shift from interactive to unidirectional, one-to-many learning styles; the Cloud is ready for serverless, but education is not ready for teacherless Road that many of us walked to get to technical skills no longer exists; how to become a modern technologist Ageism: By age 40, you are considered a manager or useless; don’t be afraid to learn something new Links: Digital Ocean AWS Community Hero Microsoft Azure Digigirlz TKP Labs Lynn Langit on Commonwealth Scientific and Industrial Research Organisation Google BigQuery Amazon Athena AWS Glue Cloud Dataflow Cloud Dataprep Lambda Amazon EC2 Learn Python the Hard Way
May 16, 2018
Episode 9: Cloud Coreyography
Microsoft has experienced a renaissance. By everything that we've seen coming out of Microsoft over the past few years, it feels like the company is really walking the walk. Instead of just talking about how it’s innovative, it’s demonstrating that. Microsoft has been on an amazing journey, making the progression from telling customers what they need to listening to them and responding by building what they ask for. Today, we’re talking to Corey Sanders, Corporate Vice President of Azure Compute at Microsoft. Some of the highlights of the show include: Customers are asking for Microsoft to help them through support and enabling platforms Storytelling efforts through advocates, who play a double role – engaging and defending Microsoft Customers moving to the Cloud are focused on a continuum and progression; they have stuff to move from one location to another and want all the benefits–better agility, faster startup time, etc. Virtual serial console into existing VMs; this is how people are using this and Microsoft is going to, if not encourage this behavior, at least support it Microsoft is the only Cloud with a single-instance SLA Serial consoles: Windows has seen less usage, partly due to operational aspects of Windows vs. Linux. It's not a GUI; it's scripting. Does the operating system matter? 
From a Cloud perspective, it shouldn't have to matter; you should be able to deploy it the way you want Edge enables much more complex and segregated scenarios; that combination with cognitive searches running locally will make it accessible anywhere Branding challenge as customers start to notice that devices are smarter and more complex; will they lose awareness that Microsoft Azure is powering most of these things - they shouldn’t care An awareness of not just what's possible, but what's coming; the democratization of AI Education and fear gap of trying something new and taking that first step; make products and services stupid and simple to use Customers return to add cognitive services and AI capabilities to existing, running deployments, environments, and applications Multi-Cloud solutions can be successful, but there's a caveat; they’re actually built on a service-by-service perspective Azure Stack offers consistency, but some people may place blame on it for poor data center management practices; some expectations and regulations may be frustrating to some customers, but they let Microsoft offer a consistent experience Freedom and flexibility have been challenges for Microsoft and other products for private Clouds What people need to understand about Azure, including from a durability and reliability experience To some extent, scale becomes a necessary prerequisite for some applications Microsoft has taken many steps and is the leader in various areas Links: ReactiveOps Microsoft Azure Corey Sanders on Twitter The Robot Uprising Will Have Very Clean Floors Kubernetes Cassandra Azure Stack
May 09, 2018
Episode 8: A Corporate Prisoner's Dilemma
Have you dabbled with IT infrastructure in AWS? Have you been through the process of AWS partnership? Does being an AWS partner add value? Amazon seeks partners that help drive its business, goals, and value. Today, we’re talking to Justin Brodley, the vice president of Cloud engineering at Ellie Mae. He has been through the AWS partnership process and shares his thoughts about it. He encourages you to find the right partner for your business! Some of the highlights of the show include: Different levels and types of AWS partnerships Shakedown vs. opportunity method for new leads; lead generation expectations Amazon’s improvements eroding business models Partners trying to pivot, but not exclusive to AWS Whether to invest in multi-Cloud Amazon can’t scale its sales team to handle everybody; views partner program as an extension of its salesforce Your company is important and you’re spending a lot of money, but Amazon may not care about you; partner market fills that gap and makes you feel important Corporate prisoner’s dilemma: Your tech company offers something that Amazon doesn’t; but what about when Amazon does offer it? Competitors’ horizontal move to become more diversified Amazon expects partners to offer products and services that it cannot offer yet If partners fail, Amazon decides to do it and do it better Is Amazon’s best interest geared toward its partners or you and your customers? Amazon needs to give incentives and support partners Links: Justin Brodley on Twitter Brodley Group Ellie Mae Digital Ocean AWS Partner Network Lambda API Gateway AWS re:Invent Salesforce Azure Rackspace
May 02, 2018
Episode 7: The Exact Opposite of a Job Creator
Monitoring in the entire technical world is terrible and continues to be a giant, confusing mess. How do you monitor? Are you monitoring things the wrong way? Why not hire a monitoring consultant! Today, we’re talking to monitoring consultant Mike Julian, who is the editor of the Monitoring Weekly newsletter and author of O’Reilly’s Practical Monitoring. He is the voice of monitoring. Some of the highlights of the show include: Observability comes from control theory and monitoring is for what we can anticipate Industry’s lack of interest and focus on monitoring When there’s an outage, why doesn’t monitoring catch it? Unforeseen things. Cost and failure of running tools and systems that are obtuse to monitor Outsource monitoring instead of devoting time, energy, and personnel to it Outsourcing infrastructure means you give up some control; how you monitor and manage systems changes when on the Cloud CloudWatch: Where metrics go to die Distributed and Implemented Tracing: Tracing calls as they move through a system Serverless Functions: Difficulties experienced and techniques to use Warm vs. Cold Start: If a container isn't up and running, it has to set up database connections Monitoring can't fix a bad architecture; it can't fix anything; improve the application architecture Visibility of outages and pain perceived; different services have different availability levels Links: Mike Julian Monitoring Weekly Copy Construct on Twitter Baron Schwartz on Twitter Charity Majors on Twitter Redis Kubernetes Nagios Datadog New Relic Sumo Logic Prometheus Honeycomb Honeycomb Blog CloudWatch Zipkin X-Ray Lambda DynamoDB Pinboard Slack Digital Ocean
Apr 25, 2018
Episode 6: The Robot Uprising Will Have Very Clean Floors
How many of you are considered heroes? Specifically, in the serverless Cloud, Twitter, and Amazon Web Services (AWS) communities? Well, Ben Kehoe is a hero. Ben is a Cloud robotics research scientist who makes serverless Roombas at iRobot. He was named an AWS Community Hero for his contributions that help expand the understanding, expertise, and engagement of people using AWS. Some of the highlights of the show include: Ben’s path to becoming a vacuum salesman History of Roomba and how AWS helps deliver current features Roombas use AWS Internet of Things (IoT) for communication between the Cloud and robot Boston is shaping up to be the birthplace of the robot overlords of the future AWS IoT is serverless and features a number of pieces in one service Robot rising of clean floors AWS Greengrass, which deploys runtimes and manages connections for communication, should not be ignored Creating robots that will make money and work well Roomba’s autonomy to serve the customer and meet expectations Robots with Cloud and network connections Competitive Cloud providers were available, but AWS was the clear winner Serverless approach and advantages for the intelligent vacuum cleaner Future use of higher-level machine learning tools Common concern of lock-in with AWS Changing landscape of data governance and multi-Cloud Preparing for migrations that don’t happen or change the world Data gravity and saving vs. spending money Links: Ben Kehoe on YouTube AWS AWS Community Hero AWS IoT Ben Kehoe on Twitter iRobot AWS Greengrass Shark Cat Medium Boston Dynamics AWS Lambda AWS SageMaker AWS Kinesis Google Cloud Platform Spanner Kubernetes Digital Ocean
Apr 18, 2018
Episode 5: The Last Mainframe with a Kickstart and a Double Clutch
How are companies evolving in a world where Cloud is on the rise? Where Cloud providers are bought out and absorbed into other companies? Today, we’re talking to Nell Shamrell-Harrington about Cloud infrastructure. She is a senior software engineer at Chef, CTO at Operation Code, and core maintainer of the Habitat open source product. Nell has traveled the world to talk about Chef, Ruby, Rails, Rust, DevOps, and Regular Expressions. Some of the highlights of the show include: Chef is a configuration management tool that handles instance, files, virtual machine container, and other items. Immutable infrastructure has emerged as the best-practice approach. Chef is moving into next gen through various projects, including one called Compliance, a scanning tool. Some people don’t trust virtualization. Habitat is an open source project featuring software that allows you to use a universal packaging format. Habitat is a run-time, so when you run a package on multiple virtual machines, they form a supervisor ring to communicate via leader/follower roles. Deploying an application depends on several factors, including application and infrastructure needs. It is possible to convert old systems with old deployment models to Habitat. Habitat allows you to lift a legacy application and put it into that modern infrastructure without needing to rewrite the application. You can ease in packages to Habitat, and then have Habitat manage pieces of the application. Habitat is Cloud-agnostic and integrates with public and private Cloud providers by exporting an application as a container. Chef is one of just a few third-party offerings marketed directly by AWS. From inception to deployment, there is a place for large Cloud providers to parlay into language they already speak. Operation Code is a non-profit that teaches software engineer skills to veterans. It helps veterans transition into high-paying engineering jobs. The technology landscape is ever changing. 
What skills are most marketable?   Operation Code is a learning by experience type of organization and usually starts people on the front-end to immediately see results. Links: Nell Shamrell-Harrington Nell Shamrell-Harrington on Twitter Nell Shamrell-Harrington on GitHub Operation Code Chef Ruby on Rails Rust Regular Expressions Habitat AWS Kubernetes Docker LinkedIn Learning GorillaStack (use discount code: screaming)
Apr 11, 2018
Episode 4: It's a Data Lake, not a Data Public Swimming Pool
Open source activism tends to focus on running on hardware you can trust and avoiding Cloud computing. The problem with some Cloud providers has to do with a conflict of interest between serving customers and how they generate revenue. It’s important for the customer to have control of their computer and their data in the Cloud. But what about their security and privacy? Today, we’re talking to Kyle Rankin, chief security officer at Purism and writer for Linux Journal. He is a Linux expert who decided to work at Purism because of the company’s belief in free software and the Linux community. Some of the highlights of the show include: Cloud providers have faced challenges when it comes to data privacy and who owns what. The word “Cloud” is overloaded, and it is unclear who is in control. Cloud providers can sabotage efforts to make programs work together. Cloud providers may not troll through data and exploit it. Yet, they develop tools for customers to be able to do that. Even though Linux Journal stopped being printed and went digital, and was going under, it’s now back and taking a new approach. What matters to new readers and Linux users is now different than what was important to original readers. The more time you can spend to understand what’s happening behind the scenes will make you much more marketable and adaptable. Kyle explains whether Amazon Linux is becoming a viable concern and if distribution matters anymore. Now, it’s about running an application, not thinking about what it’s running on. Are there gangs of Cloud users? Do people look down on Azure users? The target is always moving and changing. Check out Kyle’s book, Linux Hardening in Hostile Networks: Server Security from TLS to Tor. Links: Kyle Rankin on Twitter Purism Kyle Rankin’s book - Linux Hardening in Hostile Networks: Server Security from TLS to Tor Linux Journal 2.0 FAQ GorillaStack (use “screaming” for discount)
Apr 04, 2018
Episode 3: Turning Off Someone Else's Site as a Service
How do you encourage businesses to pick Google Cloud over Amazon and other providers? How do you advocate for selecting Google Cloud to be successful on that platform? Google Cloud is not just a toy with fun features, but is a capable Cloud service. Today, we’re talking to Seth Vargo, a Senior Staff Developer Advocate at Google. Previously, he worked at HashiCorp in a similar advocacy role and worked very closely with Terraform, Vault, Consul, Nomad, and other tools. He left HashiCorp to join Google Cloud and talk about those tools and his experiences with Chef and Puppet, as well as communities surrounding them. He wants to share with you how to use these tools to integrate with Google Cloud and help drive product direction. Some of the highlights of the show include: Strengths related to Google Cloud include its billing aspect. You can work on Cloud bills and terminate all billable resources. The button you click in the user interface to disable billing across an entire project and delete all billable resources has an API. You can build a chat bot or script, too. It presents anything you’ve done in the console by clicking and pointing, as well as gives you what that looks like in code form. You can expose that from other people’s accounts because turning off someone else’s Website as a service can be beneficial. You can invite anyone with a Google account, not just ‘’ but ‘@’ any domain and give them admin or editor permissions across a project. They’re effectively part of your organization within the scope of that project. For example, this feature is useful for training or if a consultant needs to see all of your different clients in one dashboard, but your clients can’t see each other. Google is a household name. However, it’s important to recognize that advocacy is not just external advocacy, there’s an internal component to it. There are many parts of Google and many features of Google Cloud that people aren’t aware of. 
As an advocate, Seth’s job is to help people win. Besides showing people how they can be successful on Google Cloud, Seth focuses on strategic complaining. He is deeply ingrained in several DevOps and configuration management communities, which provide him with positive and negative feedback. It’s his job to take that feedback and convert it into meaningful action items for product teams to prioritize and put on roadmaps. Then, the voice of the communities is echoed in the features and products being internally developed. Amazon has been in the Cloud business for a long time. What took Google so long? For a long time, Google was perceived as being late to the party and not able to offer as comprehensive and experienced services as Amazon. Now, people view Google Cloud as not being substandard, but not where serious business happens. It’s a fully featured platform and it comes down to preferences and pre-existing features, not capability. Small and mid-size companies typically pick a Cloud provider and stick with their choice. Larger companies and enterprises, such as Fortune 50 and Fortune 500 companies, pick multiple Clouds. This is usually due to some type of legal compliance issues, or there are Cloud providers that have specific features. Externally at Google, there is the Deployment Manager tool at It’s the equivalent of CloudFormation, and teams at Google are staffed full time to perform engineering work on it. Every API that you get by clicking a button on are viewing the API Docs accessible via the Deployment Manager. Google Cloud also partners with open source tools and corresponding companies. There are people at Google who are paid by Google who work full time on open source tools, like Terraform, Chef, and Puppet. This allows you to provision Google Cloud resources using the tools that you prefer. 
According to Seth, there are five key pillars of DevOps: 1) Reduce organizational silos and break down barriers between teams; 2) Accept failures; 3) Implement gradual change; 4) Tooling and automation; and 5) Measure everything. Think of DevOps as an interface in a programming language, like Java, or a type in a language, where it doesn’t actually define what you do, but gives you a high-level view of what the function is supposed to implement. With the SRE discipline, there’s a prescribed way for performing those five pillars of DevOps. Specific tools and technologies used within Google, some of which are exposed publicly as part of Google Cloud, enable the kind of DevOps culture and DevOps mindset that occur. A reason why Google offers abstract classes in programming is that there’s more than one way to solve a problem, and SRE is just one of those ways. It’s the way that has worked best for Google, and it has worked best for a number of customers that Google is working with. But there are some other ways, too. Google supports those ways and recognizes that there isn’t just one path to operational success, but many ways to reach that prosperity. The book, Site Reliability Engineering, describes how Google does SRE, an approach Google has tried to evangelize to the world because it can help people improve operations. The flip side of that is that organizations need to be cognizant of their own requirements. Google has always been held up, along with several other companies, as a shining beacon of how infrastructure management could be. But some say there are still problems with its infrastructure, even after 20-some years and billions invested. Every company has problems, some of them technical, some cultural. Google is no exception. The one key difference is the way Google handles issues from a cultural perspective. It focuses on fixing the problem and making sure it doesn’t happen again. There’s a very blameless culture. Conferences tend to include a lot of hand waving and storytelling. 
But as an industry, more war stories need to be told instead of pleasure stories. Conference organizers want to see sunshine and rainbows because that sells tickets and makes people happy. The systemic problem is how to talk about problems out in the open. Becoming frustrated and trying to figure out why computers do certain things is a key component of the SRE discipline referred to as Toil - work tied to systems that either we don’t understand or don’t make sense to automate. Those going to Google Cloud to ‘move and improve’ tend to be a mix of those from other Cloud providers and those from on-premise data center deployments. Move and improve is where there are VMs in a data center, and they need to be moved to the Cloud. There are tiny differences around the Cloud-native paradigm and providers. There are some key pillars: Does it handle restarts well? Is it highly available? Can it be containerized, even though containers aren’t necessarily required for Cloud native? Does it package all of its dependencies with it? Can it run on different operating systems? All of these things are generic, they’re not specific to a Cloud provider. Links: Google Cloud and blog Amazon Web Services HashiCorp Terraform Vault Consul Nomad Chef Puppet Kubernetes AutoML Monitorama Azure CloudFormation Ansible Elk Stack Site Reliability Engineering book for O’Reilly Fastly Hacker News Cloud Foundry Microsoft Cloud Alibaba Cloud Lambda Quotes by Seth: “Everything we do on Google Cloud is API First. Anytime you click a button in that Web UI, there is a corresponding API call, which means you can build automation, compliance, and testing around these various aspects.” “The IAM and permission management in Google Cloud is incredibly powerful. It leverages the same IAM permissions that G Suite has which is hosted Gmail, Calendar, and all of those other things.” “How do I get people who want to use Google Cloud or don’t know about Google Cloud? 
The ability to be successful on the platform.” “I would definitely say that any company you work at, whether the recruiter tells you that it’s all sunshine and rainbows and there’s nothing ever wrong is a lie.”
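The interface analogy from this episode (DevOps says what should be done, SRE is one concrete way of doing it) can be written out literally. The following is only a sketch: the class and method names are hypothetical, and the strings returned by the `SRE` class are illustrative assumptions rather than statements from the episode, apart from the blameless culture and toil it mentions.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """The five pillars as an interface: what to do, not how to do it."""

    @abstractmethod
    def reduce_silos(self) -> str: ...

    @abstractmethod
    def accept_failure(self) -> str: ...

    @abstractmethod
    def implement_gradual_change(self) -> str: ...

    @abstractmethod
    def use_tooling_and_automation(self) -> str: ...

    @abstractmethod
    def measure_everything(self) -> str: ...

    def pillars(self) -> list[str]:
        # Collect one concrete practice per pillar, in the order listed above.
        return [
            self.reduce_silos(),
            self.accept_failure(),
            self.implement_gradual_change(),
            self.use_tooling_and_automation(),
            self.measure_everything(),
        ]


class SRE(DevOps):
    """One possible implementation; other implementations are equally valid."""

    def reduce_silos(self) -> str:
        return "share ownership of production across teams"  # illustrative

    def accept_failure(self) -> str:
        return "fix the problem blamelessly and prevent a repeat"

    def implement_gradual_change(self) -> str:
        return "ship small changes that are cheap to roll back"  # illustrative

    def use_tooling_and_automation(self) -> str:
        return "automate toil where it makes sense"

    def measure_everything(self) -> str:
        return "track reliability and toil with metrics"  # illustrative


if __name__ == "__main__":
    for practice in SRE().pillars():
        print(practice)
```

A different organization could write its own subclass of `DevOps` with different practices, which is the point of the analogy: the interface is shared, the implementation is a choice.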
Mar 28, 2018
Episode 2: Shoving a SAN into us-east-1
When companies migrate to the Cloud, they are literally changing how they do everything in their IT department. If lots of customers rely exclusively on a single region, like us-east-1, then they are directly impacted by its outages. There is safety in a herd and in numbers because everybody is down and out together. But that is no excuse for leaving a single point of failure in your application. It’s a bad idea to use a sole backing service for something, and it’s unacceptable from a business perspective. Today, we’re talking to Chris Short from the Cloud and DevOps space. Recently, he was recognized for his DevOps’ish newsletter and won the People’s Choice Award for his DevOps writing. He’s been blogging for years and writing about things that he does every day, such as tutorials, code, and methods. Now, Chris, along with Jason Hibbets, runs the DevOps team for Some of the highlights of the show include: Chris’ writing makes difficult topics understandable. He is frank and provides broad information. However, he admits when he is not sure about something. SJ Technologies aims to help companies embrace a DevOps philosophy, while adapting their operations to a Cloud-native world. Companies want to take advantage of philosophies and tooling around being Cloud native. Many companies consider a Cloud migration because they’ve got data centers across the globe. Often it’s an active-passive backup with two data centers that are treated differently and cannot easily be switched between. Some companies do a Cloud migration to refactor and save money. A Cloud migration can result in you having to shove your SAN into us-east-1. It can become a hybrid workflow. Lift and shift is often considered the first legitimate step toward moving to the Cloud. However, know as much as you can about your applications and their RAM and CPU allowances. Look at density when you’re lifting and shifting. Know how your applications work and work together. 
Simplify a migration by knowing what instance sizes to use and what monitoring to have in place. Some do not support being on the Cloud due to a lack of understanding of business practices and how they are applied. But most are no longer skeptical about moving to the Cloud. Now, instead of ‘why cloud,’ it becomes ‘why not.’ Don’t jump without looking. Planning phases are important, but there will be unknowns that you will have to face. Downtime does cost money. Customers will go to other sites. They can find what they want and need somewhere else. There’s no longer a sole source of anything. The DevOps journey is never finished, and you’re never done migrating. Embrace changes yourself to help organizations change. Links: Chris Short on Twitter DevOps'ish SJ Technologies Amazon Web Services Cloud Native Infrastructure Oracle OpenShift Puppet Kubernetes Simon Wardley Rackspace The Mythical Man-Month Atlassian BuzzFeed Quotes by Chris: “Let’s not say that they’re going whole hog Cloud Native or whole hog cloud for that matter but they wanna utilize some things.” “They can never switch from one to the other very easily, but they want to be able to do that in the Cloud and you end up biting off a lot more than you can chew…” “Create them in AWS. Go. They gladly slurp in all your VMware instances you can create a mapping of this sized thing to that sized thing and off you go. But it’s a good strategy to just get there.” “We have to get better as technologists in making changes and helping people embrace change.”
Mar 21, 2018
Episode 1: Feature Flags with Heidi Waterhouse of LaunchDarkly
This podcast features people doing interesting work in the world of Cloud. What is the state of the technical world? Let’s first focus on the up or down, on or off function of feature flags. Today, we’re talking to Heidi Waterhouse, a technical writer turned Developer Advocate at LaunchDarkly, which is a feature flag service: a way to wrap a snippet of code around your feature and make it into something you can turn on or off. It lets you turn things on and off in your codebase quickly without having to do several commits. However, flags are difficult to track when there are more than about a dozen of them. So, LaunchDarkly provides a way to manage your features at scale with a usable interface and API. Some of the highlights of the show include: A feature flag allows you to hide items before you want them to go live on your Website. You hide it behind a feature flag, doing all the work ahead of time. Then, at some point, you turn it all on instantly without the risk of pushing untested code into your production. You can test at scale to gain authentic data. Test something with your team, your company’s employees, your customers, etc. However, no matter how good your integration tests are, there are always wobbles to watch for in the system. With implementation, there are a few paths that can work, such as the massive reorganization path. Or, you can just start incrementally with feature flags for new features. LaunchDarkly thinks of the Cloud as the surface because it mostly works with people who are doing Web-based delivery of features. Major companies, like Google and Facebook, have services similar to feature flags for their own development. They’re operating on such a giant scale that they have internal teams doing it. Companies use feature flags on the front end and for other purposes. They work through the whole stack, from front-end page delivery, pricing tiers, white labeling, and style sheets to safer deployments. Do not focus on documentation. 
You should not have to read documentation for anything that you don’t own. Every feature should have documentation tied to its code. Create a customized experience. Feature flags effectively manage and minimize risk. There is always risk in the world, but what causes disaster is not just one failure. It is a multiplication of failures. This goes wrong and that goes wrong. Feature flagging breaks monolithic releases into tiny chunks that can go forward or backward. LaunchDarkly holds monthly meet-ups called Test in Production. People share their use cases regarding continuous integration, continuous deployment, DevOps, etc. Links: LaunchDarkly iPad Autodesk Slack IBM Quotes by Heidi: “What feature flags do is make it possible for you to push out a deployment with things hidden, we call it launching darkly.” “We’re all about avoiding risk, I think this is our motto this year, eliminate risk…you can’t eliminate risk, but you can make it much less risky.” “Go ahead and write your feature. You know that it’s hidden behind the magical feature flying curtain until you’re ready to turn it on.” “If 20 years of technical writing taught me anything, it’s that nobody wants to be reading documentation.”
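The on/off wrapping described in this episode can be sketched in a few lines. This is a minimal illustration under stated assumptions, not LaunchDarkly's API: `FlagStore`, `render_checkout`, the `new-checkout` flag name, and the audience labels are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Flag:
    enabled: bool = False
    # Empty set means "everyone"; otherwise only the listed audiences see it.
    audiences: set[str] = field(default_factory=set)


class FlagStore:
    """Hypothetical in-memory flag store, standing in for a real flag service."""

    def __init__(self) -> None:
        self._flags: dict[str, Flag] = {}

    def set(self, name: str, enabled: bool, audiences=()) -> None:
        self._flags[name] = Flag(enabled, set(audiences))

    def is_enabled(self, name: str, audience: str = "customer") -> bool:
        flag = self._flags.get(name)
        # Unknown or disabled flags keep the feature hidden.
        if flag is None or not flag.enabled:
            return False
        return not flag.audiences or audience in flag.audiences


def render_checkout(store: FlagStore, audience: str) -> str:
    # The new code is deployed but stays dark until the flag is turned on.
    if store.is_enabled("new-checkout", audience):
        return "new checkout"
    return "old checkout"


if __name__ == "__main__":
    store = FlagStore()
    store.set("new-checkout", True, audiences={"team"})
    print(render_checkout(store, "team"))
    print(render_checkout(store, "customer"))
```

Widening the audience from the team to employees to customers, or switching the flag off again, is a change to the stored flag rather than a new commit or deployment, which is what lets a release move forward or backward in small steps.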
Mar 19, 2018