Well friends, Hadoop as we all know is a wonderful framework for handling bulk data. For those of you who haven't heard about Hadoop, here's a head's up about it:
Hadoop is a distributed system framework for handling huge amounts of data and is highly scalable. The scalability is the result of a Self-Healing High Bandwith Clustered Storage , known by the acronym of HDFS (Hadoop Distributed File System) and a specific fault-tolerant Distributed Processing, known as MapReduce.
Enough said, I had a use case to extract huge amounts of data in the form of IP Addresses. I had to crawl large number of sites and extract some data out of them. I thought why not use a distributed framework like that of Hadoop for accomplishing this task. Thank you Hadoop. You solved my problem quite easily without much of an effort. Friends, I tell you - Hadoop is a great way of leveraging the power of Distributed Systems.
PS: I love you HADOOP!
Hadoop is a distributed system framework for handling huge amounts of data and is highly scalable. The scalability is the result of a Self-Healing High Bandwith Clustered Storage , known by the acronym of HDFS (Hadoop Distributed File System) and a specific fault-tolerant Distributed Processing, known as MapReduce.
Enough said, I had a use case to extract huge amounts of data in the form of IP Addresses. I had to crawl large number of sites and extract some data out of them. I thought why not use a distributed framework like that of Hadoop for accomplishing this task. Thank you Hadoop. You solved my problem quite easily without much of an effort. Friends, I tell you - Hadoop is a great way of leveraging the power of Distributed Systems.
PS: I love you HADOOP!
Thanks for the informative post. Can you please share some resources for a beginner?
ReplyDelete