A large networking company in Silicon Valley set out to build a comprehensive, large-scale cloud security auditing solution. Its operations include scanning for, analyzing, and reporting security vulnerabilities and threats across every aspect of a cloud-based deployment. By its nature, the solution inherits the complexity, scalability, and distributed-computing challenges of a system of this size.
On top of the feature objectives, the system itself had to be tightly secure, in both development and deployment.
This case study documents the issues the customer faced and the work and role Omar took on to make the project a success.
The problems left by the previous team and the issues discovered
The initial team that set out to develop the solution for the client did not meet any of its objectives.
The client was left with a system that failed to execute as expected most of the time, took longer than the accepted threshold to complete the desired tasks, and was riddled with runtime bugs.
Omar was called in to turn the project around. The initial assessment of the system's condition included:
Given how severely flawed the existing system's construction was, it was clear that it had no path forward. Omar had no choice but to propose to the client that a new system be built from the ground up.
Turning around the project
A lengthy meeting was held with the client to capture not only the technical specifications of the required functionality but also the fundamental objective and vision of the system.
Omar began modelling the solution with that fundamental objective in mind, on an extensible framework designed to accommodate continuous extension.
The system was sizable, so clear domain boundaries were identified and each domain was broken down into its own microservice. Each service was developed individually and treated as an independent artifact that could be deployed, executed, and tested on its own.
The microservices were designed to be stateless and horizontally scalable. Worker nodes were designed to scale dynamically, picking up asynchronous jobs from a message queue. System state produced by service API calls and worker-node activity was persisted and tracked in a scalable, high-performance database.
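To make the stateless-worker pattern more concrete, here is a minimal Python sketch. It is not the project's code and rests on stated assumptions: an in-process `queue.Queue` stands in for the message queue, the hypothetical `JobStore` class stands in for the high-performance database, and `audit()` is a hypothetical placeholder for the real scanning work. Because each worker keeps no state of its own, capacity can be added simply by starting more identical workers.

```python
from __future__ import annotations

import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    target: str  # hypothetical: identifier of the cloud resource to audit


class JobStore:
    """In-memory stand-in (assumption) for the database that tracks job state."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._status: dict[str, str] = {}

    def set_status(self, job_id: str, status: str) -> None:
        with self._lock:
            self._status[job_id] = status

    def get_status(self, job_id: str) -> Optional[str]:
        with self._lock:
            return self._status.get(job_id)


def audit(target: str) -> str:
    """Hypothetical placeholder for the real scanning and analysis work."""
    if not target:
        raise ValueError("empty target")
    return f"audited:{target}"


def worker(jobs: queue.Queue, store: JobStore) -> None:
    # The worker holds no state between jobs: its input arrives with each job
    # and its progress is recorded in the shared store, so identical workers
    # can be added or removed freely.
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: shut this worker down
            break
        store.set_status(job.job_id, "running")
        try:
            audit(job.target)
            store.set_status(job.job_id, "done")
        except Exception:
            store.set_status(job.job_id, "failed")


def run(targets: list[str], num_workers: int = 4) -> JobStore:
    jobs: queue.Queue = queue.Queue()  # stand-in for the message queue
    store = JobStore()
    workers = [
        threading.Thread(target=worker, args=(jobs, store))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for i, target in enumerate(targets):
        job = Job(job_id=f"job-{i}", target=target)
        store.set_status(job.job_id, "queued")
        jobs.put(job)
    for _ in workers:
        jobs.put(None)  # one sentinel per worker, queued after all real jobs
    for w in workers:
        w.join()
    return store
```

For example, `run(["vm-1", "bucket-2", ""], num_workers=2)` leaves `job-0` and `job-1` marked `done` and `job-2` marked `failed`. In a Kubernetes deployment, each worker could instead run as its own pod consuming from a real broker, with scaling handled by the orchestrator rather than a `num_workers` argument.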
Omar laid out the breakdown of packages and modules, the activity flow within a service operation, and the flow of service-to-service communication. Coding style and conventions were communicated to the rest of the team and enforced through strict review standards and automated checks. Staff were given instructions on proper error handling, on raising and handling exceptions, and on logging conventions.
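The case study does not spell out the team's actual conventions, so the following is only a hypothetical Python sketch of what such rules commonly look like: a domain-specific exception hierarchy (`AuditError`, `ScanTargetError`), validation that chains the original exception with `raise ... from`, and a service boundary that logs expected failures at WARNING and unexpected ones with a full traceback. All names, including the port-validation example and the `audit.scanner` logger name, are illustrative assumptions.

```python
import logging

logger = logging.getLogger("audit.scanner")  # hypothetical logger name


class AuditError(Exception):
    """Base class for errors raised by the (hypothetical) audit services."""


class ScanTargetError(AuditError):
    """Raised when a scan target is invalid."""


def parse_port(raw: str) -> int:
    """Validate input at the boundary and raise a domain-specific error."""
    try:
        port = int(raw)
    except ValueError as exc:
        # Chain the original exception so the root cause stays in the traceback.
        raise ScanTargetError(f"invalid port {raw!r}") from exc
    if not 0 < port < 65536:
        raise ScanTargetError(f"port out of range: {port}")
    return port


def handle_request(raw_port: str) -> dict:
    """Service entry point: turn errors into a structured response."""
    try:
        port = parse_port(raw_port)
    except AuditError as exc:
        # Expected, domain-level failure: log at WARNING without a stack trace.
        logger.warning("rejected scan request: %s", exc)
        return {"ok": False, "error": str(exc)}
    except Exception:
        # Unexpected failure: log the full traceback, return a generic error.
        logger.exception("unexpected error while handling scan request")
        return {"ok": False, "error": "internal error"}
    logger.info("accepted scan request for port %d", port)
    return {"ok": True, "port": port}
```

With this split, `handle_request("443")` returns `{"ok": True, "port": 443}`, a malformed value such as `"abc"` yields a clear domain error, and an unanticipated bug (for instance passing `None`, which makes `int()` raise `TypeError`) is logged with its traceback but never leaks internals to the caller.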
The revamped version had to be completed within six months. Omar decided to prioritize functional tests: they were developed early, in parallel with each module, and integrated into the CI/CD pipeline from the start.
While unit tests are an important part of code integrity and quality, the strict delivery timeline meant that functional test automation was prioritized for the initial launch. The project's unit tests quickly caught up after the initial rollout.
Omar ensured that CI/CD considerations were in place from the outset, and deployment design and concerns were addressed early on. Infrastructure components (message queue, databases, memory caching) were separated from the service pods, and pipelines were built for packaging software bundles, building container images, and deploying pods to the Kubernetes cluster.
Distributed systems inherently raise concerns about latency and bottlenecks. The completed services were profiled end to end, and Omar spent some cycles identifying latency in the critical and most heavily stressed segments of the system. Fault tolerance was incorporated and tested, and bottlenecks under high workloads were identified and resolved.
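The case study does not say which fault-tolerance mechanisms were used. One common building block for calls between services and infrastructure components is retrying transient failures with capped exponential backoff and jitter; the Python sketch below illustrates that technique only, and its function name, default delays, and choice of retryable exceptions are assumptions. The `sleep` parameter is injectable so the behavior can be tested without real waiting.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Double the delay cap on each attempt (0.1s, 0.2s, 0.4s, ...) up to
            # max_delay, then sleep a random fraction of it ("full jitter") so
            # that many clients retrying at once do not hit a service in sync.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Errors outside `retry_on`, such as a `ValueError` from bad input, propagate immediately, since retrying them would only add load without any chance of success.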
The end results
Copyright © 2022 Vortex Innovation Labs - All Rights Reserved.