Microservices at scale: A complexity management issue

by Joseph K. Clark

The benefits of microservices have been touted for years, and their popularity is evident when you consider the explosion in the use of technologies such as Kubernetes over the last few years. Based on the number of successful implementations, that popularity seems deserved. For example, according to a 2020 survey by O’Reilly, 92% of respondents reported at least some success with microservices, with 54% describing their experience as “mostly successful” and under 10% describing a “complete success.”

But building and managing all of these smaller units of code adds a lot of complexity to the equation, and it’s essential to get that management right to achieve those successes. Developers can create as many microservices as they need, but it’s vital to manage them well, especially as their number grows. According to Mike Tria, head of platform at Atlassian, there are two schools of thought on managing a proliferation of microservices. One is to keep the number of microservices to a minimum so developers don’t have to consider scale and security with every new service.

“Every time they’re spinning up a new microservice, they try to keep them small,” Tria said. “That works fine for a limited number of use cases and in certain specific domains, but what will happen is those microservices will become large. As they say, you’ll end up with a distributed monolith.” According to Tria, the other option is to let developers spin up microservices whenever they want, which requires some additional considerations. Incorporating automation into the process is the key to ensuring this can be done successfully.


“If every time you’re building some new microservice, you have to think about all of those concerns about security, where you’re going to host it, what’s the IAM user and role that you need access to, what other services it can talk to—if developers need to figure all that stuff out every time, then you’re going to have a real scaling challenge. So the key is automating those capabilities away, making it such that you could spin up microservices without having to do all those things,” said Tria.
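Tria doesn’t spell out how Atlassian’s tooling does this, but a minimal sketch of the idea, with entirely hypothetical field names and defaults, might look like the following: a developer states only what the service is and who owns it, and the platform fills in the security and hosting decisions.

```go
package main

import "fmt"

// ServiceSpec is a hypothetical descriptor a developer fills in when spinning
// up a new microservice. Anything left blank falls back to platform-wide
// defaults, so the developer doesn't make security or hosting decisions by hand.
type ServiceSpec struct {
	Name         string
	Owner        string   // owning team, not an individual
	Tier         int      // reliability tier (1 = highest uptime target)
	IAMRole      string   // role the service runs under
	Region       string   // where it is hosted
	AllowedPeers []string // services it is permitted to call
}

// withDefaults applies opinionated platform defaults. The specific values
// here are illustrative only, not anyone's real configuration.
func withDefaults(s ServiceSpec) ServiceSpec {
	if s.Tier == 0 {
		s.Tier = 2 // assume standard availability unless stated otherwise
	}
	if s.IAMRole == "" {
		s.IAMRole = "svc-" + s.Name + "-minimal" // least-privilege role per service
	}
	if s.Region == "" {
		s.Region = "us-east-1"
	}
	return s
}

func main() {
	// A developer only has to say what the service is and who owns it.
	spec := withDefaults(ServiceSpec{Name: "invoice-renderer", Owner: "billing-team"})
	fmt.Printf("provisioning %s for %s: tier %d, role %s, region %s\n",
		spec.Name, spec.Owner, spec.Tier, spec.IAMRole, spec.Region)
}
```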

According to Tria, the main benefits of automation are scalability, reliability, and speed. First, automation allows for scale because new microservices can be created without burdening developers. Second, because reliability is encapsulated in each microservice, the whole system becomes more reliable. Finally, speed is gained because each team can build microservices at its own pace.

Atlassian built its own tool for managing its microservices, but Tria recommends starting small with some off-the-shelf tools. This will enable you to get to know your microservices and figure out your needs, rather than predicting your needs and buying some expensive solution that might have features you don’t need or be missing parts you do.

In addition to incorporating automation, there are several other ways to ensure success with scaling microservices. “It’s way too easy with microservices to overdo it right at the start,” Tria said. “Honestly, I think that’s the mistake more companies make getting started. They go too heavy on microservices, and right at the start, they throw too much on the compute layer, too much service mesh, Kubernetes, proxy, etc. People go too, too far. And so what happens is they get bogged down in the process, in bureaucracy, in too much configuration when people want to build features fast.”

Incorporate security 

Because of the nature of microservices, they tend to raise additional security concerns, according to Tzury Bar Yochay, CTO and co-founder of the application security company Reblaze. Traditional software architectures use a castle-and-moat approach with limited ingress points, which makes it possible to secure the perimeter with a security solution.

“Microservices, however, are each independent entities that are Internet-facing. Every microservice that can accept incoming connections from the outside world is potentially exposed to threats within the incoming traffic stream, and it also has other security requirements (such as integrating with authentication and authorization services). These requirements are much more challenging than the ones typically faced by traditional applications,” said Bar Yochay.

According to Bar Yochay, new and better approaches are constantly being invented to secure cloud-native architectures. Service meshes are an addition to microservices architectures that enables services to communicate with each other. Beyond security, they offer benefits like load balancing, service discovery, failure recovery, metrics, and more. For example, service meshes can build traffic filtering into the mesh itself and block hostile requests before the microservice receives them.
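In a real deployment that filtering lives in the mesh’s sidecar proxies, outside the application process. As a rough, in-process illustration only (with a toy rule rather than a real threat signature set), the same idea looks like this in Go:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"strings"
)

// filter rejects obviously hostile requests before they reach the service,
// roughly what a mesh sidecar or ingress filter does outside the process.
// The single rule below is a placeholder, not a real WAF signature.
func filter(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.URL.RawQuery, "<script>") {
			http.Error(w, "blocked by traffic filter", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r) // request looks clean; hand it to the microservice
	})
}

func main() {
	svc := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from the microservice")
	})
	log.Fatal(http.ListenAndServe(":8080", filter(svc)))
}
```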

The advantages of service meshes become more apparent when they are deployed across a larger number of microservices, but smaller architectures can also benefit from them, according to Bar Yochay.

Of course, the developers in charge of these microservices are also responsible for security, but there are a lot of challenges in their way. For example, developers and security teams often experience friction because developers want to add new features, while security wants to slow things down and be more cautious. “As more apps and services are being maintained, there are more opportunities for these cultural issues to arise,” Bar Yochay said.

Bar Yochay recommends investing in developer-friendly security tools for microservices to alleviate the friction between developers and security. According to him, many solutions today allow security to be built directly into containers or into service meshes. In addition, security vendors are also advancing their use of technology, such as by applying machine learning to behavioral analysis and threat detection.

Make sure your microservices don’t get too big

“We’ve seen microservices turn into monolithic microservices, and you get kind of a monolith pretty quickly if you don’t keep and maintain it and keep on top of those things,” said Bob Quillin, chief ecosystem officer at vFunction, a company that helps migrate applications to microservices architectures.

“There is a lot of software where you’re not quite sure what it does,” said Quillin. “You and your team are maintaining it because it’s safer to keep it than to get rid of it. Dead code is one thing that can quickly make microservices bigger than they need to be. And that’s what I think eventually creates these larger and larger microservices that become almost like monoliths themselves.”

Be clear about ownership

Tria recommends that a team, rather than an individual, own each microservice. “Like in the equivalent of ‘it takes a village,’ it takes a team to keep a microservice healthy, to upgrade it, to make sure it’s checking in on its dependencies and on its rituals around things like reliability and SLOs. So I think the good practice is to have a team own it,” said Tria.

For example, Atlassian has about 3,000 developers and roughly 1,400 microservices. Tria explained that, assuming teams of five to 10 developers, that works out to every team owning two or three microservices on average.

Don’t get too excited about the multilingual nature of microservices

One of the benefits of microservices—being polyglot—is also one of the downsides. According to Tria, one of Atlassian’s initial attractions to microservices was that they could be written using any language. 

“We had services written in Go, Kotlin, Java, Python, Scala; you name it. Languages I’d never even heard of, we had microservices written in them, which, from an autonomy perspective and letting those teams run, was great. Individual teams could all run off on their own and go and build their services,” said Tria.

However, this flexibility made it hard to transfer languages and services across teams. In addition, microservices written in a particular language need developers familiar with that language to maintain them. Eventually, Tria’s team realized it needed to standardize on two or three languages.

Another recommendation from Tria, based on his team’s experience, is to understand how much you are asking the network to do for you, and to invest in things like service discovery early on. “[At the start] all our services found each other through DNS. You would reach another service through a domain name. What that did is it put a lot of pressure on our internal networking systems, specifically DNS,” said Tria.
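As a rough illustration of the pattern Tria describes (the host name below is a made-up internal domain, not an Atlassian one), reaching a peer service through plain DNS means every caller leans on the resolver, whereas a service-discovery layer keeps a local view of healthy instances:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// DNS-based discovery: each service reaches its peers through a domain
	// name, so every fresh lookup is load on the internal DNS infrastructure.
	addrs, err := net.LookupHost("billing.internal.example.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("resolved instances:", addrs)

	// A dedicated service-discovery system would instead push updates to a
	// locally cached list of healthy instances, taking this pressure off DNS.
}
```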

Figuring out a plan for microservices automation at Atlassian 

Mike Tria, head of platform at Atlassian, is a proponent of incorporating automation into microservices management, but his team had to learn that the hard way. According to Tria, when Atlassian first started using microservices in early 2016, it had about 50 to 60 microservices total, and all of them were written down on a Confluence page. The team listed every microservice, who owned it, whether it had passed SOC2 compliance yet, and the on-call contact for that microservice.

“I remember at that time we had this long table, and we kept adding columns to the table, and the columns were things like: when was the last time a performance test was run against it? What are all the services that depend on it? What are all the services it depends on? What reliability tier is it for uptime? Is it tier one, where it needs very high uptime, or tier two, where it needs less? And we just kept expanding those columns.”

Once the table hit 100 columns, the team realized it wouldn’t be maintainable for long. Instead, they created a new project to take their capabilities in Confluence and turn them into a tool.

“The idea was we would have a system where when you build a microservice, the system essentially registers it into a central repository that we have,” said Tria. “That repository has a list of all of our services. It has the owners, the reliability tiers, and anyone within the company can search and look up a service. We made the tool pretty pluggable so that when we have new capabilities, we’re adding them to our service.”
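The article doesn’t describe the tool’s internals, so the following is only a minimal sketch of the kind of central registry Tria describes, with invented field names: services register themselves with an owner, reliability tier, and compliance status, and anyone can look them up.

```go
package main

import "fmt"

// ServiceRecord holds the metadata the article says Atlassian tracked per
// microservice; the field names here are illustrative, not the real schema.
type ServiceRecord struct {
	Name          string
	OwnerTeam     string
	Tier          int // 1 = highest uptime requirements
	SOC2Compliant bool
	OnCallContact string
}

// Registry is a toy in-memory stand-in for the central repository.
type Registry struct {
	services map[string]ServiceRecord
}

func NewRegistry() *Registry {
	return &Registry{services: make(map[string]ServiceRecord)}
}

// Register is called when a new microservice is built, replacing the
// hand-maintained Confluence table the article mentions.
func (r *Registry) Register(rec ServiceRecord) {
	r.services[rec.Name] = rec
}

// Lookup lets anyone in the company find a service and its metadata.
func (r *Registry) Lookup(name string) (ServiceRecord, bool) {
	rec, ok := r.services[name]
	return rec, ok
}

func main() {
	reg := NewRegistry()
	reg.Register(ServiceRecord{
		Name: "search-indexer", OwnerTeam: "search", Tier: 1,
		SOC2Compliant: true, OnCallContact: "search-oncall",
	})
	if rec, ok := reg.Lookup("search-indexer"); ok {
		fmt.Printf("%s is owned by %s (tier %d)\n", rec.Name, rec.OwnerTeam, rec.Tier)
	}
}
```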
