With the continuous increase in the number of devices and users on the Internet, the volume of data in the network is growing tremendously. It has become essential to manage traffic efficiently through proper analysis of network flows. The collected traffic flow records are supplied to a Hadoop cloud, which pre-processes them and outputs a refined, reduced dataset. This refined data is analyzed using R to determine patterns in the network, which can be used to plan traffic flow and to understand user behavior.
Knowledge of this behavior helps us identify abnormal activity or intrusion risks. Traffic records of different sizes are taken and analyzed in two ways: in the first, the analysis is done directly in R; in the second, it involves both Hadoop and R. Finally, the two approaches are compared on execution time, and the results are tabulated.