I'm sharing my experience with CDH. (This is purely a personal recommendation.)
The CDH source code basically comes from the Apache svn itself, but it does not mirror the Apache releases.
A CDH release corresponds to a particular (usually the latest) Apache release with a good number
of patches on top. The majority of these patches are available in the Hadoop svn but may not be
part of the current Apache Hadoop release.
The major advantages I saw with CDH are:
- Cloudera provides a tool, SCM, that more or less automatically sets up a Hadoop cluster for
you.
- Cloudera bundles the Hadoop-related projects, which are pretty easy to install on any standard
Linux box.
- Cloudera ensures that the CDH release and the Hadoop projects available for that release
are compatible. (For example, you don't have to go through the hassle of finding the HBase
release compatible with your Hadoop release, integrating the related projects, etc.)
- A good number of large enterprises use CDH with Cloudera support. (Cloudera provides
various support packages.)
- Since large enterprises depend on CDH, a bug would have a large impact, which in turn
speaks to how thoroughly CDH is tested. (In short, CDH is well tested.)
- Under Cloudera support you get help and suggestions from Cloudera's expert Hadoop engineers
on fine-tuning your Hadoop platform, tools, applications, etc.
- When you go in for an end-to-end enterprise solution with Hadoop, you can even get advice
from them on best practices at the code level. (You get the same from the Hadoop user
groups as well, but here there is a dedicated, timeline-based commitment when you are a customer
of Cloudera.)
- If you don't have the best Hadoop resources in-house, you may have a tough time handling
failures on your cluster, fine-tuning your cluster, updating your cluster, optimizing your
applications, etc. The Cloudera engineers can shed light on almost all critical issues and help
get them resolved under stringent SLAs.
None of these points says the Apache releases are not great. They are definitely the best, and the
backbone of Hadoop. They are well tested as well. But when expert Hadoop resources are not available
in-house, you can face a lot of unexpected hurdles that you need to handle in a time-bound
manner, and that is where you need Hadoop consultants.
You'd definitely get more valid points directly from the Cloudera engineers. (Some official
comments.)
Hope it helps!
Source: http://mail-archives.apache.org/mod_mbox/hive-user/201111.mbox/%3C1320337854.22398.YahooMailNeo@web121217.mail.ne1.yahoo.com%3E