where COMMAND is one of:
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
datanode run a DFS datanode
dfsadmin run a DFS admin client
fsck run a DFS filesystem checking utility
fs run a generic filesystem user client
balancer run a cluster balancing utility
jobtracker run the MapReduce job Tracker node
pipes run a Pipes job
tasktracker run a MapReduce task Tracker node
job manipulate MapReduce jobs
version print the version
jar <jar> run a jar file
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME <src>* <dest> create a hadoop archive
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
-- Check the status of Kerberos tickets
klist
-- invoke kinit to get a ticket
kinit li@COM
-- sudo user login to get a ticket
sudo -s -h -u d_pbp
/usr/kerberos/bin/kinit -k -t /homes/dfsload/dfsload.prod.headless.keytab dfsload@COM
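-- Renew from the keytab only when no valid ticket is cached (a sketch; klist -s exits non-zero when the ticket is missing or expired)
klist -s || /usr/kerberos/bin/kinit -k -t /homes/dfsload/dfsload.prod.headless.keytab dfsload@COM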
-- Drop a ticket
kdestroy
Hadoop HDFS Commands
--View the hadoop commands
hadoop
--View the version of hadoop
hadoop version
--View FS shell commands
hadoop fs
-- List directory
hadoop fs -ls /user/li
-- List content of HDFS files:
hadoop fs -cat wordcount/output/*
hadoop fs -cat file.bz2 | bunzip2
hadoop fs -cat dir/*.bz2 | bzcat | cut -d ^A -f 125,126 | cat -v
hadoop fs -test -e dir/*.bz2; echo $?
-- Move files or directory
hadoop fs -mv /user/li/target /user/li/dest
--Copy file or directory
hadoop fs -cp file1 file2 file3 /user/li/dest/
hadoop fs -cp /user/li/file1.txt .
hadoop fs -cp /HRBlock/hrblock.data.reduced /HRBlock/part-r-00000
-- Create HDFS directory
hadoop fs -mkdir /user/li/input
-- Upload from the gateway host to the HDFS home directory
hadoop fs -copyFromLocal test-data/ch1/file1.txt /user/li
hadoop fs -put test-data/ch1/file1.txt /user/li
-- Download HDFS Files
hadoop fs -get /user/li/myfile .
hadoop fs -copyToLocal hdfs://dilithiumred-nn1.red.ygrid.com:8020/projects/prod/user/20130327/SIDEBID/user/part-00998.bz2 .
-- Delete a directory/file, or remove recursively (like rm -rf)
hadoop fs -rm /user/li/temp
hadoop fs -rmr /user/li/temp
-- Change permissions by chown, chgrp, chmod
hadoop fs -chgrp -R users /user/li
hadoop fs -chmod 755 /user/li
-- Viewing Data from HDFS
hadoop fs -text /user/li/tmp5/000000_0.deflate | tr '\001' ',' | head -1
hadoop fs -cat path_to_data/*.bz2 | bzcat | cut -d ^A -f 125,126 | cat -v
This selects columns 125 and 126 from Ctrl-A separated data.
To create the ^A, type: CTRL-v CTRL-a
Inside screen, type: CTRL-v CTRL-a a
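If typing the literal ^A is awkward, bash's ANSI-C quoting produces the same character; a sketch assuming a bash shell and the same Ctrl-A separated data:
hadoop fs -cat path_to_data/*.bz2 | bzcat | cut -d $'\001' -f 125,126 | cat -v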
--Transfer data between clusters
hadoop distcp -Dmapred.job.queue.name=adhoc -Ddfs.umaskmode=002 -i -m 40 -update webhdfs://ygrid.yahoo.com/user/li/search_20131211 hdfs://ygrid.yahoo.com/user/li/search_20131211
--Kill a job
mapred job -list | tail -n +3 | awk '{print $1" "$4}' | grep 'li'
mapred job -kill job_1399615563645_524816
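Putting the two together, a sketch that kills every running job owned by user 'li' (column positions are taken from the -list output above; double-check before running, since this is destructive):
mapred job -list | tail -n +3 | awk '$4 == "li" {print $1}' | xargs -n 1 mapred job -kill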
--View job logs
mapred job -logs job_1374774840603_3324664
mapred job -logs job_1387925060187_4840299
-- Show available queues
mapred queue -showacls
mapred queue -list
mapred queue -info apg_dailymedium_p5
The fields are percentages or fractions:
Capacity = % of the grid's total capacity this queue gets under normal usage. The Capacity values of all queues add up to 100%.
MaximumCapacity = the largest fraction of the grid's total capacity this queue is allowed to use. In this example, the p5 queue normally gets 5%, but it may go as high as 40% of the total grid capacity.
CurrentCapacity = current usage relative to Capacity. In this example, p5 is using 144% of its Capacity, or 7.2% of the total grid capacity.
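Spelling out that arithmetic: 144% of a 5% Capacity is 1.44 x 5% = 7.2% of the total grid, which a quick bc check confirms:
echo "scale=1; 144 * 5 / 100" | bc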
-- Check running job list/Show number of jobs per queue for each user
mapred job -list
mapred job -list | tail -n +3 | awk '{print $5" "$4}' | sort | uniq -c
-- Another way to list jobs, both running and completed:
mapred queue -info apg_d**_p3 -showJobs
-- Check Gateway Quota
quota -u apoqa
-- Check HDFS quotas, i.e. get a count of objects
hadoop fs -count -q /projects/DSP
hadoop fs -count -q /user/li
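The -count -q output is positional; on most releases the columns are QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME (verify the order on your cluster). A sketch that pulls out just the remaining name quota:
hadoop fs -count -q /user/li | awk '{print "remaining name quota:", $2}'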
-- Displays aggregate length of files contained in the directory.
hadoop fs -dus /user/li
hadoop fs -du hdfs://.com:8020/projects
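On newer Hadoop releases (an assumption; check hadoop version first), -du also accepts -s and -h for a human-readable summary:
hadoop fs -du -s -h /user/li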
-- Check the running processes
jps
-- Check whether a path exists / is zero length / is a directory
hadoop fs -test -e /user/li/
hadoop fs -test -z /user/li/
hadoop fs -test -d /user/li/
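-test prints nothing; the result comes back in the exit status (0 means true), so it is meant for scripting, e.g.:
if hadoop fs -test -d /user/li; then
  echo "/user/li is a directory"
else
  echo "/user/li is missing or not a directory"
fi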
-- Check group members
/gridtools/generic/bin/showmembers -n GROUPNAME
/gridtools/generic/bin/showmembers -n awrgroup
showmembers --netgroup cp_pnp_c_sudoers --type user --format comma
-- Launch R
echo USER=***
export INSTALL_ROOT=/homes/$USER/custom_root
/homes/$USER/custom_root/bin/R
-- Launch Pig
pig -Dmapred.job.queue.name=*** \
-Dmapreduce.reduce.memory.mb=3072 \
-Dmapreduce.map.memory.mb=3072 \
-Dmapreduce.map.java.opts="-Xmx2048M" \
-Dmapreduce.map.speculative=true \
-Dmapreduce.job.acl-view-job=* \
-Dmapreduce.task.timeout=1800000 \
-Dmapreduce.reduce.speculative=true \
-Dmapreduce.output.fileoutputformat.compress=true \
-param PARALLEL_ORDER=512 \
***_MB3.pig
-- Hadoop Streaming
hadoop jar $HADOOP_PREFIX/share/hadoop/tools/lib/hadoop-streaming.jar \
-input /ngrams \
-output /output-streaming \
-mapper mapper.py \
-combiner reducer.py \
-reducer reducer.py \
-jobconf stream.num.map.output.key.fields=3 \
-jobconf stream.num.reduce.output.key.fields=3 \
-jobconf mapred.reduce.tasks=10 \
-file mapper.py \
-file reducer.py
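A minimal smoke test of the same streaming jar, using standard shell utilities instead of the Python scripts (the output path here is made up; any non-existent HDFS path will do):
hadoop jar $HADOOP_PREFIX/share/hadoop/tools/lib/hadoop-streaming.jar \
-input /ngrams \
-output /output-streaming-wc \
-mapper /bin/cat \
-reducer /usr/bin/wc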