Task 10

kundan singh
5 min read · Jul 19, 2021


K-means Clustering

The k-means algorithm searches for a predetermined number of clusters in an unlabelled multidimensional dataset. It does this using a simple conception of what an optimal clustering looks like.

Primarily, the concept rests on two assumptions:

  • Firstly, the cluster centre is the arithmetic mean (AM) of all the data points belonging to the cluster.
  • Secondly, each point is closer to its own cluster centre than to any other cluster centre.

These two assumptions are the foundation of the k-means clustering model.

You can think of the centre as a point that represents the mean of the cluster; it is not necessarily a member of the dataset.
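To make the idea of a centre concrete, here is a minimal Python sketch. The helper name `centroid` and the four sample points are made up for illustration. It computes the arithmetic mean of a small cluster and shows that the resulting centre is not one of the original points.

```python
def centroid(points):
    """Return the arithmetic mean of a list of equal-length points."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

# Four illustrative points at the corners of a unit square (made-up data).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
centre = centroid(cluster)

print(centre)             # (1.5, 1.5)
print(centre in cluster)  # False: the centre is not a member of the dataset
```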

In simple terms, k-means clustering groups the data into several clusters by discovering the distinct categories in an unlabelled dataset on its own, without the need for labelled training data.

This is a centroid-based algorithm: each cluster is associated with a centroid, and the objective is to minimize the sum of squared distances between the data points and the centroids of their corresponding clusters.
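The objective can be illustrated with a short sketch. It assumes the usual k-means form of the objective, the sum of squared Euclidean distances from each point to the centroid of its assigned cluster. The function names and the sample data are hypothetical. A sensible assignment gives a much smaller value than a poor one.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[label])
               for p, label in zip(points, labels))

# Made-up data: two pairs of points, far apart along the x axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

good_labels = [0, 0, 1, 1]  # each point assigned to its nearby centroid
bad_labels = [0, 1, 0, 1]   # two points assigned to the distant centroid

print(within_cluster_ss(points, good_labels, centroids))  # 4.0
print(within_cluster_ss(points, bad_labels, centroids))   # 204.0
```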

As input, the algorithm takes an unlabelled dataset and a predetermined value of k. It splits the dataset into k clusters and repeats the process until it finds the best clusters.

Specifically, the k-means algorithm performs two tasks:

  • Determines the best positions for the k centre points, or centroids, by an iterative method.
  • Assigns every data point to its nearest centre; the data points closest to a particular centre form a cluster. Therefore, the data points in each cluster share some similarities and are far apart from the other clusters.

How the K-means algorithm works

To process the training data, the k-means algorithm in data mining starts with a first group of randomly selected centroids, which are used as the starting points for every cluster, and then performs iterative (repetitive) calculations to optimize the positions of the centroids.

It halts creating and optimizing clusters when either:

  • The centroids have stabilized: their values no longer change because the clustering has converged.
  • The defined maximum number of iterations has been reached.
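The procedure and its two stopping conditions can be sketched in plain Python. This is a minimal illustration, not a production implementation. It assumes Euclidean distance and initial centroids drawn at random from the data points. The names `kmeans`, `max_iter` and `seed` are choices made for this example.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points]

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # randomly selected starting centroids
    for iteration in range(1, max_iter + 1):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # stop 1: the centroids have stabilized
        centroids = new_centroids
    # Otherwise the loop ends at stop 2: max_iter iterations were reached.
    return centroids, assign(points, centroids), iteration

# Made-up data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels, iterations = kmeans(points, k=2)

print(sorted((round(x, 3), round(y, 3)) for x, y in centroids))
# [(1.167, 1.5), (8.5, 8.5)]
```

On data this well separated the result does not depend on the starting centroids; on harder data, k-means can settle in different local optima, which is why it is often run several times with different seeds.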

Network Security Based on K-Means Clustering Algorithm in Data Mining

Abstract. Nowadays, the network has become the basis of everything. Meanwhile, network security has become one of today’s most urgent social problems. An intrusion detection system is a newer kind of network security device that monitors network traffic in real time and takes corresponding measures when it detects suspicious transmissions. Compared with traditional network security measures, intrusion detection systems have great advantages: they overcome the shortcomings of purely passive defence and can respond before damage occurs. With their emergence, intrusion detection systems have become an important part of network security.

Preface. In today’s society, computer network security has become the chief problem of the information society. With the continuous development of technology, network intrusions have become more concealed and their means of destruction more complex; they are not restricted by time or space, and they do great harm to network security. Network security is therefore one of the most important concerns of today’s society, and detecting and preventing intrusions has become the primary problem to solve, which makes research on intrusion detection systems extremely important. Based on the k-means clustering algorithm in data mining, this paper studies network security and discusses how to create a secure and harmonious network environment.

An intrusion detection system is a detection system made up of software and hardware, and its application value is high. At present it has already become a main network security management tool: it collects information from different points in the system and combines it with detection and automatic-response functions. An intrusion detection system is essentially a behavior classifier: it works by judging whether observed behavior is intrusive or non-intrusive. This is the core concept behind intrusion detection. In early work on intrusion detection, Denning proposed the general intrusion detection system model, which laid a solid foundation for later research on intrusion detection systems.

Data Mining Algorithm

Data mining algorithms include cluster analysis, association (correlation) analysis, and classification algorithms. A clustering algorithm divides the objects of a dataset into a number of classes of similar objects. Like a classification algorithm, it groups the data, but it does so by the algorithm’s own definition of similarity, so that objects in the same cluster are highly similar. Cluster analysis is a common method in data mining; it can be used for unsupervised anomaly detection and can solve problems that exist in traditional data mining methods. In an intrusion detection system, the method can be applied to new data without relying on predetermined data categories or labelled samples of each category. Cluster analysis therefore provides a good basis for building an intrusion detection system.

Establishment of Intrusion Detection Model

To set up a general intrusion detection model, a collection system is first used to gather connection records during operation, which yields the dataset for cluster analysis. A clustering algorithm then partitions the connection records in order to distinguish normal from abnormal ones. In this study, the k-means algorithm was used for the cluster analysis. The clustering algorithm produces several clusters, each containing some connection records. Based on the properties of the connection records in it, each cluster is designated as one of two kinds: abnormal or normal. An abnormal cluster is a cluster of abnormal connection records, and a normal cluster is a cluster of normal connection records. In practical applications where labelled data is unavailable, it is not possible to determine directly whether a connection record is normal or abnormal, so the clusters must be tagged by other means. Typically, a threshold is used: clusters whose share of connection records is above the threshold are tagged normal, and the rest are tagged abnormal. To detect intrusions with the clustering result, a new connection record is first standardized; then the cluster whose centre is closest to it is found, and the record is classified according to that cluster’s tag.
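The tagging and classification steps can be sketched as follows. This is only an illustration under stated assumptions, not the method of the paper. It assumes each connection record has two made-up numeric features (duration and bytes transferred), that the cluster labels come from an earlier k-means run with k = 2, and that the threshold rule means "a cluster holding more than a given share of all records is tagged normal". All function names and the threshold value are hypothetical.

```python
from statistics import mean, pstdev

def standardize(records):
    """Z-score every feature; also return the (mean, std) used per feature."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*records)]
    scaled = [tuple((v - m) / s for v, (m, s) in zip(r, stats))
              for r in records]
    return scaled, stats

def centroids_from_labels(points, labels, k):
    """Mean of the points in each cluster (assumes no cluster is empty)."""
    result = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        result.append(tuple(mean(col) for col in zip(*members)))
    return result

def tag_clusters(labels, k, threshold):
    """Assumed rule: a cluster with a large enough share of records is normal."""
    n = len(labels)
    return ["normal" if labels.count(j) / n > threshold else "abnormal"
            for j in range(k)]

def classify(record, stats, centroids, tags):
    """Standardize a new record, find the nearest centre, return its tag."""
    z = tuple((v - m) / s for v, (m, s) in zip(record, stats))
    nearest = min(range(len(centroids)),
                  key=lambda j: sum((a - b) ** 2
                                    for a, b in zip(z, centroids[j])))
    return tags[nearest]

# Made-up connection records: (duration in seconds, bytes transferred).
records = [(1.0, 100.0), (2.0, 120.0), (1.0, 110.0),
           (2.0, 90.0), (1.5, 105.0), (30.0, 5000.0)]
labels = [0, 0, 0, 0, 0, 1]  # assumed output of a prior k-means run, k = 2

scaled, stats = standardize(records)
centroids = centroids_from_labels(scaled, labels, k=2)
tags = tag_clusters(labels, k=2, threshold=0.2)  # hypothetical threshold

print(tags)                                               # ['normal', 'abnormal']
print(classify((1.2, 95.0), stats, centroids, tags))      # normal
print(classify((28.0, 4800.0), stats, centroids, tags))   # abnormal
```

The new record is standardized with the mean and standard deviation of the original data, so that it is compared with the cluster centres on the same scale.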
