What is Hadoop?
Hadoop is a free, open-source, Java-based programming framework that supports the storage and processing of large data sets in a distributed computing environment. It is part of the Apache project and is sponsored by the Apache Software Foundation.
How does it give value to your data?
With the help of Hadoop, you can extract value from your data through big data technology. Big data refers to huge collections of data sets that typically arise in sectors such as banking, railways, e-commerce, social media, the stock market, and manufacturing.
Big data technologies are important in providing more accurate analysis, which may lead to more concrete decision-making, resulting in greater operational efficiency, cost reductions, and reduced risks for the business.
To tap the power of big data, you need an infrastructure that can manage and process huge volumes of structured and unstructured data in real time while protecting data privacy and security.
There are various technologies in the market from different vendors, including Amazon, IBM, and Microsoft, for handling big data. These technologies fall into two broad classes: analytical and operational.
On the operational side, NoSQL big data systems are designed to take advantage of the cloud computing architectures that have emerged over the past decade, allowing massive workloads to run efficiently and inexpensively.
On the analytical side, Massively Parallel Processing (MPP) database systems and MapReduce provide capabilities for retrospective and complex analysis that may touch most or all of the data. MapReduce provides a method of analysing data that is complementary to the capabilities provided by SQL, and a system based on MapReduce can be scaled up from a single server to thousands of high- and low-end machines.
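As a rough illustration of the MapReduce model described above, the following self-contained Java sketch simulates the map phase (emitting (word, 1) pairs) and the reduce phase (summing counts per word) of a word count in a single process. No Hadoop cluster is involved, and the class and method names are illustrative, not part of the Hadoop API:

```java
import java.util.*;
import java.util.stream.*;

public class WordCount {
    // "Map" phase: split a line into words and emit a (word, 1) pair for each
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                     .filter(w -> !w.isEmpty())
                     .map(w -> Map.entry(w, 1));
    }

    // "Shuffle" + "reduce" phases: group pairs by word and sum the counts
    public static Map<String, Integer> run(List<String> lines) {
        return lines.stream()
                    .flatMap(WordCount::map)
                    .collect(Collectors.groupingBy(Map.Entry::getKey,
                             Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hadoop is a framework",
                                       "hadoop runs jobs")));
    }
}
```

In a real Hadoop job the map and reduce functions run on different machines and the framework handles the shuffle between them; the sketch only shows the data flow of the programming model.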
Hadoop Basic Components
Hadoop Architecture
The Hadoop framework includes the following four modules:
· Hadoop Common: These are Java libraries and utilities required by other Hadoop modules. They provide file-system and OS-level abstractions and contain the necessary Java files and scripts required to start Hadoop.
· Hadoop YARN: This is a framework for job scheduling and cluster resource management.
· Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
· Hadoop MapReduce: This is a YARN-based system for parallel processing of large data sets.
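To make the HDFS module above concrete, the small Java sketch below computes how a file is split into fixed-size blocks and how much raw storage its replicas consume, assuming HDFS's common defaults of a 128 MB block size and a replication factor of 3. The arithmetic and class name are illustrative only, not part of the Hadoop API:

```java
public class HdfsBlockMath {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // common HDFS default block size: 128 MB
    static final int REPLICATION = 3;                   // common HDFS default replication factor

    // Number of blocks needed to hold fileBytes (the last block may be partially filled)
    static long blockCount(long fileBytes) {
        return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    // Raw bytes consumed across the cluster, counting every replica
    // (HDFS stores only the actual bytes of a partial last block, so no padding is added)
    static long rawUsage(long fileBytes) {
        return fileBytes * REPLICATION;
    }

    public static void main(String[] args) {
        long file = 300L * 1024 * 1024; // a hypothetical 300 MB file
        System.out.println("blocks: " + blockCount(file));   // 300 MB over 128 MB blocks -> 3 blocks
        System.out.println("raw bytes: " + rawUsage(file));
    }
}
```

Splitting files into large blocks and replicating each block across machines is what gives HDFS both its high throughput (blocks can be read in parallel) and its fault tolerance (losing one machine loses no data).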