I have files in HDFS that need to be compared with other files in the Git repo, so I want to copy the HDFS files into the Git repo. The comparison will be done by another tool that can't talk to HDFS.
Is it doable or not?
If yes, please advise. Is there another way to get the files out?
Some ideas that come to my mind:
You can copy the files from HDFS to the local machine and then run the comparison tool on the local copies.
a) You can do it manually, using command line tools:
hadoop fs -copyToLocal <hdfs file> <local file>
b) You can compose an Oozie workflow that contains an action running your 'comparer' and fetches the files from HDFS using
c) If you do not have command line tools available you can fetch the files using
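Option (a) above could be wrapped in a small shell function, roughly as sketched below. This is only an illustration under some assumptions: the `hadoop` client is on the PATH, the function name `fetch_from_hdfs` and the `hdfs_snapshot` folder are made up for the example, and the paths are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: copy one HDFS file into a local Git working tree so that a
# local-only comparison tool can read it.
# Assumptions: `hadoop` is on PATH; function and folder names are hypothetical.

fetch_from_hdfs() {
    local hdfs_path="$1"                      # file in HDFS (placeholder path)
    local repo_dir="$2"                       # local Git working tree
    local dest_dir="$repo_dir/hdfs_snapshot"  # hypothetical folder inside the repo

    mkdir -p "$dest_dir" || return 1

    local dest
    dest="$dest_dir/$(basename "$hdfs_path")"

    # By default -copyToLocal fails if the target already exists,
    # so remove any stale copy from a previous run first.
    rm -f "$dest"

    hadoop fs -copyToLocal "$hdfs_path" "$dest"
}
```

It might be called as `fetch_from_hdfs /data/reports/part-00000 ~/projects/my-repo` (both paths are placeholders), after which the comparison tool can be pointed at `~/projects/my-repo/hdfs_snapshot/part-00000`. The copy only needs to exist in the working tree for the comparison; whether to commit it is a separate decision.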