Client-Slave and Client-Master Relationship

Gaurav Tank
4 min read · Oct 10, 2020
HDFS

In this blog, we are going to prove various concepts about the Hadoop cluster. I have divided each task into easy steps which, I guarantee, will prove the Client-Slave and Client-Master relationship.

Prerequisites

  1. Internet connectivity
  2. At least 5 Linux machines (instances) in a running state
  3. Hadoop cluster

Task 1.1: Whenever the client uploads a file (for example, f.txt) of size 50 MB, HDFS creates 3 replications of it by default. To prove this, we have to create a Hadoop topology and transfer a file from the client.

Step 1: First, we have to create an account on AWS, then launch 5 instances and start them. In our case, we have chosen the free-tier t2.micro instance type, which has 1 GB RAM and a 1-core CPU with internet connectivity. Then we configure the Master-Slave topology by editing the core-site.xml and hdfs-site.xml files on each node.

core-site.xml and hdfs-site.xml
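
For reference, here is a minimal sketch of the two files, assuming Hadoop 1.x property names to match the port numbers used below; the master IP, port, and storage directory are placeholders you would replace with your own values.

core-site.xml (on every node, pointing to the Master/NameNode):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://MASTER_PUBLIC_IP:9001</value>
  </property>
</configuration>

hdfs-site.xml (on a slave, naming the directory where the DataNode stores blocks):

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>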

Step 2: Our Hadoop cluster is set up. Now we have to upload a file to our cluster. We can upload a file using the following command:

hadoop fs -put kali-linux-2019.2-amd64.iso /
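
To confirm the upload from the terminal, we can also list the root of the distributed filesystem; the second column of the output is the replication factor of each file:

hadoop fs -ls /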

Now, our file is uploaded to our Hadoop cluster. We can verify this by entering the IP address of the master along with port number 50070 in your favorite web browser. Port 50070 is the NameNode's web UI port, which exists so the user can view the cluster directly in the browser.
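
The same summary that the 50070 page shows can also be printed in the terminal. Assuming the Hadoop 1.x command syntax that matches these port numbers (Hadoop 2.x would use hdfs dfsadmin -report):

hadoop dfsadmin -report

This prints the cluster's total and used capacity along with a per-DataNode report, so we can confirm that all the slaves have joined.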

Step 3: Now we have to click on the Browse the filesystem option, where we can see our file. If we look at the column just before the name of our file, we can see the number 3. This number 3 signifies the total number of replications created for our file.
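
We can double-check this number without the browser. fsck reports, block by block, how many replicas exist and on which DataNodes they are stored; the path below is the file uploaded in Step 2:

hadoop fsck /kali-linux-2019.2-amd64.iso -files -blocks -locations

Each block line should list three DataNode addresses, matching the replication factor of 3.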

With this, the first part of Task 1 is done.

Task 1.2: Who actually uploads the file to the slaves? Does the client first upload the file to the master, or does it upload directly to the slaves?

Step 1: First, we have to enter the following command on the slaves to find out from which IP they receive the payload.

tcpdump -i eth0 -n tcp port 50010

This command prints all the traffic arriving at the slave on port 50010. This is the special port through which data is transferred to the slave (DataNode).
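
As a reading aid for the next step: with -n, every TCP line tcpdump prints has the shape source-IP.port > destination-IP.port. A sketch with hypothetical addresses, 203.0.113.10 standing in for the client's public IP and 172.31.0.5 for the slave's private IP:

IP 203.0.113.10.49152 > 172.31.0.5.50010: Flags [P.], seq 1:1449, ack 1, win 512, length 1448

The address on the left of the > is the sender we want to identify.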

Step 2: Upload a file of a certain size from the client account.
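
Any upload works here. For example, we can re-run the put command from Task 1.1, or upload a small test file (f.txt is the example name from the task statement):

hadoop fs -put f.txt /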

Now, if you look at the terminal on the slave, you will see packet data printed on the screen. Look closely and you will find that Slave 1 is receiving the traffic from a public IP, and that public IP belongs to the Client, not the Master.

Packets captured from Slave1, Slave2, and Slave3

Hence, we can conclude that the Master does not send the data; it is the client who transfers the data to the slaves.

Task 2: Does the client go to the master and then read the file on the slave via the Master, or does the Client go to the slave directly and read the data?

Step 1: To prove this, we have to capture the traffic on port 50010 again. But this time, instead of uploading a file, we will download the file from the cluster to the client.

hadoop fs -copyToLocal /kali-linux-2019.2-amd64.iso /

This command copies the file from the cluster to the client. Once again, we can see the captured traffic on the slaves:

Packets captured from Slave1, Slave2, and Slave3

Here, we can see that the traffic is sent from the IP of the slave and the receiver is the client. Hence, we can conclude that the client directly reads the file stored on a slave.
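
As a side note, hadoop fs -get behaves the same as -copyToLocal for a local destination, so either command produces the same read pattern:

hadoop fs -get /kali-linux-2019.2-amd64.iso /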

Conclusion

  1. Whenever the client uploads a file, a total of 3 replications are made by default. But the client can change the replication factor according to their needs (see the example after this list).
  2. The client gets the DataNode's IP from the Master and uploads the file directly to the DataNode.
  3. Whenever the client wants to read its data, it goes directly to the slave and reads the data stored there.
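
Here is a minimal sketch of changing the replication factor mentioned in point 1. The path is the file uploaded earlier, and the -w flag makes the command wait until the new factor is actually reached:

hadoop fs -setrep -w 2 /kali-linux-2019.2-amd64.iso

Alternatively, the cluster-wide default can be changed by setting the dfs.replication property in hdfs-site.xml before uploading.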

That's all for this blog. Thank you for reading!
