Today, datasets grow so fast and become so large that classic data management tools can no longer cope. The difficulties include capture, storage, search, sharing, analytics, and visualization. A few years ago, a typical data management solution dealt with a few dozen terabytes; now it is petabytes and more. Another challenge is so-called unbounded, or streaming, data, which requires a completely different technology to process efficiently within SLAs defined upfront.
Altogether, this requires a change of mindset in both the underlying technology and the solution architecture. New-generation applications use tools such as Apache Hadoop, Spark, and Kafka. Such applications are built in a completely different manner: they are decentralized, designed for elastic scale, use a mix of storage technologies, embrace eventual consistency, rely on parallel and asynchronous processing, are designed for failure, automate their own management, and are independent of any particular IaaS provider. In other words, they are true cloud applications.
Our team focuses on identifying, designing, and implementing solutions with measurable business value. As a Cloudera partner, we have implemented a number of projects on top of CDH.
We have built an effective network metric processing and monitoring solution for one of our customers, a large telecom company. The solution used HDFS, HBase, and Hive to handle billions of records on a daily basis.