Large Data Processing

DAQ, ETL, Pipelines, Storage, Search, Correlation, Reporting, Maintenance

Data Acquisition

We provide integration and development of data acquisition modules for the following sources:

  • Devices
  • Cloud Storage
  • Databases – SQL, NoSQL
  • APIs

We support formats that include:

  • Structured
  • Unstructured
  • Text
  • Binary

ETL

ETL follows the data acquisition stage. However, ETL can itself be regarded as a data acquisition procedure, albeit one that is internal and domain specific to your service and solution.

The “transform” stage requires proper modularization and abstraction to keep transformation modules easy to update, apply, and swap. We deliver a design and implementation that gives you a clean, pluggable transform stage.
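
As an illustration of what a pluggable transform stage can look like, here is a minimal Python sketch; the Transformer protocol, the NormalizeTimestamps module, and the record fields are hypothetical names for this example rather than a specific framework:

    from datetime import datetime, timezone
    from typing import Iterable, Protocol


    class Transformer(Protocol):
        """A swappable transformation module: records in, records out."""
        def transform(self, records: Iterable[dict]) -> Iterable[dict]: ...


    class NormalizeTimestamps:
        """One concrete transformer; it can be replaced without touching the rest of the ETL."""
        def transform(self, records: Iterable[dict]) -> Iterable[dict]:
            for record in records:
                record["ts"] = datetime.fromtimestamp(record["ts"], tz=timezone.utc).isoformat()
                yield record


    def run_transforms(records: Iterable[dict], transformers: list[Transformer]) -> Iterable[dict]:
        """Chain whichever transformers the deployment is configured with."""
        for transformer in transformers:
            records = transformer.transform(records)
        return records

Swapping or reordering a transformation then becomes a configuration change rather than a rewrite of the ETL.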

The “load” stage has two design and implementation focus points:

  1. Transformed data integrity: northbound data quality and correctness depend on the integrity at this layer
  2. The choice of storage: driven by factors such as cost, data structure, and especially the consuming client

An ETL over a base dataset may not be a single interconnected module of extractor, transformer, and loader; there may be multiple transformers working on a combination of base data and feeding multiple storage targets. Efficiency in resource usage and performance has to be factored in to keep the operation cost-effective and timely.
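
As a rough sketch of that fan-out shape, assuming batches small enough to hold in memory and placeholder callables for the extractor, transforms, and loaders:

    from typing import Callable, Iterable

    Extractor = Callable[[], Iterable[dict]]
    Transform = Callable[[Iterable[dict]], Iterable[dict]]
    Loader = Callable[[Iterable[dict]], None]


    def run_etl(extract: Extractor, branches: list[tuple[Transform, Loader]]) -> None:
        """Extract the base data once; each branch transforms it and loads its own storage target."""
        batch = list(extract())
        for transform, load in branches:
            load(transform(batch))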
 
Our service starts with an analysis of your operations, and we walk you through our design and the reasoning behind it. We then implement an ETL for your operation that you can rest assured will run optimally.

 

Data Pipelines

The ETL (or several ETLs) may form part of a larger data pipeline.

A pipeline may be created to push notifications for monitoring and alerting, to run as part of a batch job workflow, or to be triggered by a client request. Pipelines can also serve as the mechanism for sorting and aggregating massive amounts of data.

The specific purpose of a pipeline drives the strategy for its implementation, each with its own areas of focus in the design.

We will analyze your operations, present our recommendation for an optimal pipeline implementation, and deliver a functional, highly efficient data pipeline.
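
For instance, a monitoring-and-alerting pipeline stage might boil down to something like the following sketch, where the event fields, the threshold, and the notify callback are illustrative assumptions:

    from statistics import mean
    from typing import Callable, Iterable


    def alert_on_latency(events: Iterable[dict],
                         notify: Callable[[str], None],
                         threshold_ms: float = 500.0) -> None:
        """Aggregate a window of events and push a notification when mean latency is too high."""
        latencies = [event["latency_ms"] for event in events]
        if latencies and mean(latencies) > threshold_ms:
            notify(f"mean latency {mean(latencies):.0f} ms exceeds {threshold_ms:.0f} ms")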

Data Security

There is nothing worse than dropping the ball on sensitive data while it is being worked on. Drawing on our experience from past cybersecurity projects, we understand and apply security measures that safeguard the confidentiality and privacy of your sensitive data. These measures include:

  • Secure coding practices (e.g. validating requests, guarding against SQL injection, and well-designed modules that keep data usage within its permissible scope; see the sketch after this list)
  • Ensuring data does not leak through overlooked sources (logs, cruft dump files, test samples, …)
  • Utilization of RBAC and secrets management, and ensuring that access goes through the necessary security stack.
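
To make the SQL injection point concrete, here is a minimal sketch using Python's standard sqlite3 module; the users table and the username rules are assumptions for the example:

    import re
    import sqlite3


    def find_user(conn: sqlite3.Connection, username: str) -> list[tuple]:
        """Validate the input, then bind it as a parameter instead of concatenating it into SQL."""
        if not re.fullmatch(r"[A-Za-z0-9_.-]{1,64}", username):
            raise ValueError("invalid username")  # reject unexpected input early
        cur = conn.execute("SELECT id, email FROM users WHERE username = ?", (username,))
        return cur.fetchall()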

 

Other services