An Enhanced Mechanism for Profiling and Searching the Internet Endpoints by Clustering the Endpoints Using Fuzzy C-Means Algorithm
Understanding how the Internet is used worldwide is a challenging problem that is typically addressed by analyzing network traces. However, obtaining such traces presents its own challenges, owing either to privacy concerns or to other operational difficulties. The key hypothesis of this research is that most of the information needed to profile Internet endpoints is already available on the web. We implement and deploy a Google-based profiling tool that accurately characterizes endpoint behaviour by collecting and strategically combining information freely available on the web. An unconstrained endpoint profiling approach is used to profile and classify the endpoints. Websites are classified and clustered based on the search hits, each of which contains the hit text and URL. On a query, the tool first tries to match the domain name and URL; if there is no match, it then checks the keywords. The keywords in the web cache are clustered using the Fuzzy C-means algorithm, which improves the speed of the search engine.
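As an illustration of the clustering step, the following is a minimal sketch of the standard Fuzzy C-means iteration in plain Python. It is not the tool's actual implementation: the two-dimensional keyword feature vectors, the fuzzifier m = 2, the Euclidean distance, and the names `fuzzy_c_means` and `points` are all assumptions made for the example, since the text above does not specify how keywords are represented.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Cluster `points` (equal-length tuples of floats) into `c` fuzzy clusters.

    Returns (centers, u), where u[i][j] is the degree to which point i
    belongs to cluster j. Each row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u[i][j] ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))

        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                for ctr in centers
            ]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_u.append([v / total for v in inv])

        # Stop once no membership changes by more than `tol`.
        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical keyword features: (weight of "gaming" terms, weight of "mail" terms)
# extracted from the hit text of six cached search hits.
points = [(5.0, 0.0), (4.0, 1.0), (6.0, 0.0),
          (0.0, 5.0), (1.0, 4.0), (0.0, 6.0)]
centers, memberships = fuzzy_c_means(points, c=2)

# Dominant cluster per hit; the full membership row keeps the soft assignment.
labels = [row.index(max(row)) for row in memberships]
print(labels)
```

Unlike hard clustering, each keyword vector keeps a degree of membership in every cluster, so a query whose keywords fall between two groups can still be checked against both; restricting the lookup to the clusters with the highest membership is one plausible way such clustering could reduce search time.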