Scalable microservice cloud auditing system

Background

A large networking company in Silicon Valley set out to build a comprehensive, large-scale cloud security auditing solution. Its operations include scanning, analyzing and reporting security vulnerabilities and threats across every aspect of a cloud-based deployment. By its nature, the solution inherits the complexity of scalability and distributed computing.

On top of the feature objectives, the system was required to be tightly secure in both development and deployment.

This case study documents the issues the customer faced and the role Omar played in making the project a success.

The problems left by the previous team and issues discovered

The initial team that set out to develop the solution for the client did not meet any of the objectives.

The client was left with a system that failed to execute as expected most of the time, took longer than the accepted threshold to complete its tasks, and was riddled with runtime bugs.

Omar was called in to turn the project around. His initial assessment found:

  • Poor coding style. The code was written in Python, but the initial lead developers most likely came from a different background; all signs pointed to Java. This is a common pitfall when a developer does not embrace the philosophy and principles of a programming language and instead codes from the mindset of another.
  • Security loopholes and vulnerabilities in the implementation due to improper coding practices and improper secret management
  • Poor strategy and lack of coherence in the distribution pipeline
  • Lack of modularity and separation of concerns, resulting in duplicated and bug-prone code
  • Because coherent architectural design was almost entirely absent, the implementation was very difficult to modify, refactor, extend and enhance
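The original code is not reproduced in this case study, so the following is only a hypothetical illustration of the Java-influenced Python described above, placed next to an idiomatic equivalent. The `Finding` type and its `severity` and `resource` fields are assumptions made for the example.

```python
from dataclasses import dataclass


# Java-influenced style: explicit getters, camelCase names, manual index loops.
class FindingJavaStyle:
    def __init__(self, severity, resource):
        self._severity = severity
        self._resource = resource

    def getSeverity(self):
        return self._severity

    def getResource(self):
        return self._resource


def get_critical_resources_java_style(findings):
    result = []
    for i in range(0, len(findings)):
        if findings[i].getSeverity() == "critical":
            result.append(findings[i].getResource())
    return result


# Idiomatic Python: a frozen dataclass exposes plain attributes,
# and a list comprehension states the filter directly.
@dataclass(frozen=True)
class Finding:
    severity: str
    resource: str


def critical_resources(findings):
    return [f.resource for f in findings if f.severity == "critical"]
```

Both versions return the same result; the idiomatic one is shorter, harder to get wrong, and reads the way other Python developers expect.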

The recommendation 

Given how severely flawed the existing system's construction was, it was clear that it had no path forward. Omar had no choice but to propose that the client build a new system from the ground up.

Turning around the project 

A lengthy meeting was held with the client to capture not just the technical specifications of the functionality, but also the fundamental objective and vision of the system.

Omar started modelling the solution with the fundamental objective in mind, building on an extensible framework designed to accommodate continuous growth.

  • New architecture: modularity and microservices

The system was sizable, so clear domain boundaries were identified and each domain was broken out into its own microservice. Each service was developed individually and treated as an independent artifact that could be deployed, executed and tested on its own.

  • Horizontal scalability of service and worker nodes 

The microservice deployments were designed to be stateless and horizontally scalable. Worker nodes were designed to scale dynamically, taking asynchronous jobs from a message queue. System state produced by service API calls and worker node activity is persisted and tracked in a scalable, high-performance database.
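A minimal sketch of the stateless worker pattern described above, assuming Python's standard-library `queue.Queue` as a stand-in for the message queue, threads as stand-ins for worker nodes, and an in-memory dictionary as a stand-in for the high-performance database. The names `StateStore`, `scan_resource`, `worker` and `run` are hypothetical; the project's actual queue, database and job format are not described in this case study.

```python
import queue
import threading


class StateStore:
    """In-memory stand-in for the scalable database that tracks job state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = {}

    def set(self, job_id, status, result=None):
        with self._lock:
            self._state[job_id] = {"status": status, "result": result}

    def get(self, job_id):
        with self._lock:
            return dict(self._state[job_id])


def scan_resource(resource):
    """Hypothetical audit task; a real worker would call cloud APIs here."""
    return {"resource": resource, "findings": 0}


def worker(jobs, store):
    """Stateless worker: everything it needs arrives in the job message,
    and everything it produces is written to the shared store."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut this worker down
            jobs.task_done()
            break
        job_id, resource = job
        store.set(job_id, "running")
        try:
            store.set(job_id, "done", scan_resource(resource))
        except Exception as exc:
            store.set(job_id, "failed", str(exc))
        finally:
            jobs.task_done()


def run(resources, num_workers):
    jobs = queue.Queue()
    store = StateStore()
    threads = [threading.Thread(target=worker, args=(jobs, store))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job_id, resource in enumerate(resources):
        store.set(job_id, "queued")
        jobs.put((job_id, resource))
    for _ in threads:
        jobs.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return store
```

Because the workers keep no state of their own, any worker can take any job, and scaling out amounts to starting more of them (here, raising `num_workers`). In a Kubernetes deployment the analogous step would be adding worker pod replicas.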

  • Modularization, code style and standards

Omar laid out the breakdown of packages and modules, the activity flow of a service operation's execution, and service-to-service communication. The coding style and conventions were communicated to the rest of the team and enforced through strict review standards and automated checks. Staff were given instructions on proper error handling, exception raising and handling, and logging conventions.
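One common way to express error handling and logging conventions like these in Python is a domain exception hierarchy: low-level errors are translated into domain errors where they occur (keeping the original cause), and they are logged once, at the service boundary. This is a sketch under that assumption; `AuditError`, `ResourceNotFoundError`, the `inventory` mapping and the response shape are hypothetical, not the project's actual conventions.

```python
import logging

logger = logging.getLogger("audit.scanner")


class AuditError(Exception):
    """Base class for all errors raised by the (hypothetical) audit services."""


class ResourceNotFoundError(AuditError):
    """Raised when a requested resource is not in the inventory."""


def fetch_config(resource_id, inventory):
    try:
        return inventory[resource_id]
    except KeyError as exc:
        # Translate a low-level error into a domain error, chaining the cause.
        raise ResourceNotFoundError(f"unknown resource {resource_id!r}") from exc


def handle_scan_request(resource_id, inventory):
    """Service boundary: domain errors are logged once and turned into a response."""
    try:
        config = fetch_config(resource_id, inventory)
    except AuditError as exc:
        logger.warning("scan rejected: %s", exc)
        return {"ok": False, "error": str(exc)}
    logger.info("scan accepted for %s", resource_id)
    return {"ok": True, "config": config}
```

Catching the base `AuditError` at the boundary lets callers handle every domain failure uniformly, while unexpected exceptions (bugs) still propagate instead of being silently swallowed.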

  • Functional Tests 

The revamped version had to be completed within 6 months. Omar decided to prioritize functional tests: they were developed early, in parallel with module development, and integrated into the CICD pipeline from the start.

While unit tests are an important part of code integrity and quality, the strictly limited delivery time meant that functional test automation was prioritized for the initial launch. The project's unit tests were quickly caught up after the initial rollout.

  • CICD 

Omar ensured that CICD considerations were in place from the outset, and deployment design concerns were addressed early on. Infrastructure components (message queue, databases, memory caching) were separated from the service pods. Pipelines were put in place for building software package bundles, building and deploying container images, and deploying pods to the Kubernetes cluster.

  • System profiling and performance tuning 

Distributed systems inherently raise concerns about latency and bottlenecks. Profiling was performed on the end-to-end deployment of the completed services, and Omar spent some cycles identifying latency in the critical and most heavily stressed segments of the system. Fault tolerance was incorporated and tested, and bottlenecks under high workloads were identified and resolved.
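The case study does not say which fault-tolerance techniques were used. As one common example, transient failures between distributed components (timeouts, throttled cloud API calls) are often handled by retrying with capped exponential backoff and jitter, sketched below. `TransientError` and `retry_with_backoff` are hypothetical names chosen for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, throttling)."""


def retry_with_backoff(func, max_attempts=5, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Call func(), retrying on TransientError with capped exponential
    backoff and full jitter. Other exceptions propagate immediately."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff cap doubles each attempt: 0.1, 0.2, 0.4, ... up to max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))  # full jitter
```

The `sleep` parameter is injectable so tests can skip real delays. Full jitter (a random delay between zero and the backoff cap) spreads out retries from many workers so they do not all hit a recovering service at the same moment.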

The end results 


  • 300% improvement in performance under the same cloud resource budget 
  • Dynamic scalability of the system allows the client to adjust the deployment according to budget and business needs.
  • Extensible framework that allows new use cases to be supported with little development time and complexity 
  • CICD process with integrated functional tests that provides confidence in every code deployment.
  • Secure code and deployment artifacts. 
  • A lasting culture and methodology among the team members, enabling the project to evolve into a center-stage solution within the organization.
