I'm trying to get a development single-node cluster setup done on my MAC OS X 10.9.2 with hadoop. I've tried various online tutorials, with the most recent being this one. To summarize what I did:
1) $ brew install hadoop
This installed hadoop 2.2.0 in /usr/local/Cellar/hadoop/2.2.0
2) Configured Environment Variables. Here's what the relevant part of my .bash_profile looks like:
### Java_HOME
export JAVA_HOME="$(/usr/libexec/java_home)"
### HADOOP Environment variables
export HADOOP_PREFIX="/usr/local/Cellar/hadoop/2.2.0"
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/libexec/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export CLASSPATH=$CLASSPATH:.
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/libexec/share/hadoop/common/hadoop-common-2.2.0.jar
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/libexec/share/hadoop/hdfs/hadoop-hdfs-2.2.0.jar
3) Configured HDFS
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/Cellar/hadoop/2.2.0/hdfs/datanode</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/Cellar/hadoop/2.2.0/hdfs/namenode</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
</configuration>
3) Configured core-site.xml
<!-- Let Hadoop modules know where the HDFS NameNode is at! -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
<description>NameNode URI</description>
</property>
4) Configured yarn-site.xml
<configuration>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>128</value>
<description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
<description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value. </description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
</configuration>
5) Then I tried to format the namenode using:
$HADOOP_PREFIX/bin/hdfs namenode -format
This gives me the error: Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode.
I looked at the hdfs code, and the line that runs it basically amounts to calling
$java org.apache.hadoop.hdfs.server.namenode.NameNode.
So thinking this was a classpath issue, I tried a few things
a) adding hadoop-common-2.2.0.jar and hadoop-hdfs-2.2.0.jar to the classpath as you can see above in my .bash_profile script
b) adding the line
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
to my .bash_profile on the recommendation of this tutorial.(I later removed it because it didn't seem to help anything)
c) I also considered writing a shell script that adds every jar in $HADOOP_HOME/libexec/share/hadoop to the $HADOOP_CLASSPATH, but this just seemed unnecessary and prone to future problems.
Any idea why I keep getting the Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode ? Thanks in advance.
See Question&Answers more detail:
os