Characteristics and segmentation of social problems with Kohonen self-organizing maps

Article history: Received November 3, 2016; Revised December 22, 2016; Accepted January 10, 2017.

Indonesia is a country with a low Human Development Index, which indicates that quality of life and health standards in Indonesia are still poor. Indonesia also faces various social problems such as overcrowding, poverty, unemployment, and low levels of education. These problems can have negative impacts on society, such as an increasing crime rate. When identifying social problems and crime, the Indonesian government does not integrate the social problems that have been identified as affecting crime, and it uses descriptive statistics only. Further diagnosis of social issues is therefore required. The purpose of this research is to determine the characteristics of social problems in Indonesia and to introduce and perform segmentation using the Kohonen Self-Organizing Map algorithm. The results of this analysis are expected to help the government formulate public policy in general, and future policy on social problems in Indonesia in particular. The Kohonen algorithm is effective for visualizing high-dimensional data because it reduces an n-dimensional input to a lower dimension while maintaining its original topological relations. Based on the clustering results, the provinces of Indonesia are divided into 5 groups, and each group has similar characteristics.

This is an open access article under the CC-BY-SA license.


Introduction
Social problems are phenomena and symptoms of reality in social life. The identification of social problems in society differs from one expert to another. Social problems are a mismatch between elements of culture or society that endangers social life [1]. A social problem is a situation that contradicts the values of society, where people agree to take action to change that situation [2].
Certainly there are social problems in our community or society, such as juvenile delinquency, population problems, and environmental pollution, as well as other social problems such as poverty and social inequality. The existence of these social problems has positive or negative impacts on the community itself. One of the impacts of social problems is an increasing crime rate [3], which is the impact discussed here. Although social problems are not the main factor, they affect the psychology of perpetrators. There are various theoretical reviews of the causes of crime which hold that economic issues change criminal behavior directly. These theories cover economic decline, comparative deterioration in socio-economic conditions, reduced opportunities in the formal sector, and urbanization and economic growth that potentially lead to integration with poorer groups. Poverty is correlated with crime [4]: research by psychologists and sociologists shows that someone who lives in poverty is vulnerable to committing a crime. In poor conditions, people experience anxiety, become very emotional, are easily frustrated, and may even commit suicide. Poverty also becomes a cause of crime when crime is seen as an alternative way of making a living [4].
For hundreds of millions of the world's population today, one of the reasons for the insecurity of their lives is not only terrorism but also extreme poverty. Population density therefore contributes to crime, and the situation becomes worse if socio-economic conditions do not support community needs.
Population density becomes a greater problem if it is not accompanied by new job opportunities, because it then causes unemployment. More unemployment leads to more crime [4], [5]. It is evident that the crime rate is higher in urban areas than in rural areas. Besides maintaining public security and safety, an effective approach is for the government to create more jobs; if unemployment decreases, the crime rate will decrease as well. Apart from density, poverty, and unemployment, another social problem is educational inequality. A low level of education has a significant influence on the crime rate. Nowadays, the improvement in the quality of human resources can be seen from the average education level in several regions of Indonesia. This improvement results from the growing demand for education as a route to a job with a better income, because gaining employment in the modern sector depends on education [6]. There is a negative correlation between the level of education and the level of crime. First, higher education can lead to legal employment. Second, a highly educated person would tend not to consider a criminal act, because the benefits are too small. Education therefore affects crime indirectly through increased wages. Someone who graduates from a lower level of education may have poorer skills than someone who has graduated from high school or university. The analysis conducted by Ehrlich notes that education is important for the population because it helps to determine the benefits of legal and illegal activities [7].
The Human Development Index (HDI) gives a general indication of the quality of a society and measures achievement in human development. The HDI is built on three basic dimensions: a long and healthy life, knowledge, and a decent life. The third dimension has a broad meaning because it is related to many factors. The health dimension is measured by life expectancy. The knowledge dimension is measured by the literacy rate and the average length of schooling. The decent life dimension is measured by the ability to purchase a number of basic needs, as seen from average spending per capita, which serves as an income approach representing the achievement of development for a decent life.
In 2013 Indonesia's HDI rank was unchanged from the previous year, at 108th out of 187 countries [8]. Apart from Singapore (9), Brunei (30), Malaysia (62), and Thailand (89), other ASEAN countries such as Myanmar (150), Laos (139), Cambodia (136), Vietnam (121), and the Philippines (117) are ranked low. Indonesia's HDI was 0.684 in 2013, a slight increase from 0.681 in the previous year. The report also highlights the lack of decent jobs, especially for young people, which is a major challenge in Asia and the Pacific. At 22%, the unemployment rate in Indonesia is relatively high. Education must also be made accessible in eastern Indonesia, for example in Papua. These areas often do not experience improvements in living standards due to limited access to basic social services.
In general, crime is behavior that violates the law and social norms, so people oppose it. In many cases crime occurs due to several factors. These include biological factors; sociological factors, which consist of economic factors (the economic system, population, changes in market prices, financial crises, lack of employment, and unemployment), mental factors (religion, literature, newspapers, movies), and physical factors such as climatic conditions; and personal factors (age, race and nationality, alcohol, war). All of these are part of a social problem that needs to be addressed by analyzing the social problems.
At the identification stage, the government of Indonesia, through BPS in Indonesian Crime Statistics 2013, focuses on the use of the Susenas Model and Statistik Potensi Desa and does not integrate the social problems that have been identified as affecting crime. BPS uses only descriptive statistics to determine the incidence and prevalence of crime.
The development of computers, software, and research measurement instruments has enabled research in various scientific fields to collect and analyze data sets whose size and dimensionality continue to increase.
The author considers that a method is needed that can address this problem, especially for social problems in Indonesia. The Kohonen algorithm is used because it is effective for visualizing high-dimensional data by reducing an n-dimensional input to a lower dimension while maintaining its original topological relations. In addition, the Kohonen SOM is a nonparametric approach that requires no assumptions about the distribution of the population. Therefore, this research identifies and clusters the social problems that particularly affect the crime rate.
Based on this background, the research problem is how to characterize and segment social problems in Indonesia by using the Kohonen algorithm to cluster the provinces of Indonesia. The purpose of this paper is to determine the characteristics of social problems in Indonesia and to introduce and apply segmentation using Kohonen's Self-Organizing Map, grouping provinces with similar characteristics in terms of the social issues that affect the crime rate.
The scope of this paper is as follows. The author uses data obtained from BPS Indonesia for the most recent available year, 2013. The variables used are those that have a significant effect on crime as a negative impact of social problems in Indonesia, according to news sources and previous research. These variables are population density, the school enrollment rate of children aged over 15 years, unemployment, poverty, and the Human Development Index in 2013. The analysis uses descriptive statistics and cluster analysis with Kohonen's Self-Organizing Map.

Literature Review
One variable used by the author is the Human Development Index (HDI). The HDI intends to shift the focus of development towards three factors. These three factors are difficult to measure precisely, so 'proxies' deemed to be the best indicators of the level of these targets are chosen to form the indices instead. Furthermore, the proxies chosen are relevant indicators that are available in every potential region of study, enabling international comparison. According to the Human Development Report, the HDI combines three dimensions:
• A long and healthy life: life expectancy at birth
• Education index: mean years of schooling and expected years of schooling
• A decent standard of living: GNI per capita (PPP US$)
The formula for calculating the HDI is shown in Equation 1. The index X(i, j) is HDI component index i for region j. The variable i takes the values 1, 2, 3, and the variable j takes the values 1, 2, ..., k regions.
The poverty rate is the percentage of the population living below the poverty line. The poverty line is the sum of the Food Poverty Line (FPL) and the Non-Food Poverty Line (NFPL) [9]. Residents whose average monthly per capita expenditure is below the poverty line are categorized as poor. The Food Poverty Line (FPL) is the minimum food expenditure, equivalent to 2100 kilocalories per capita per day. The Non-Food Poverty Line (NFPL) is the minimum requirement for housing, clothing, education, and health. The package of basic non-food commodities is represented by 51 types of commodities in urban areas and 47 in rural areas.

$$\text{Poverty Line} = \text{FPL} + \text{NFPL}$$
where FPL is the Food Poverty Line and NFPL is the Non-Food Poverty Line.
In Equation 3, the variable α is 0 and the variable z is the poverty line. The variable y_i is the average monthly per capita expenditure of the i-th person living below the poverty line (i = 1, 2, 3, ..., q), with y_i < z. The variable q is the number of people living below the poverty line, and the variable n is the total population.
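The variables listed here (α, z, y_i, q, n) match the Foster-Greer-Thorbecke family of poverty measures. Assuming that this is the form Equation 3 takes, it can be written as:

```latex
P_{\alpha} = \frac{1}{n} \sum_{i=1}^{q} \left( \frac{z - y_i}{z} \right)^{\alpha},
\qquad
P_{0} = \frac{1}{n} \sum_{i=1}^{q} 1 = \frac{q}{n}
```

Under this assumption, setting α = 0 makes every term of the sum equal to 1, so the measure reduces to q/n, the share of the population living below the poverty line, which agrees with the definition of the poverty rate as a percentage of the population.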
Population density is the number of inhabitants per unit area [9]. Crude population density shows the number of people per square kilometer. The total area is the total land area of an administrative region. School enrollment is the proportion of children in the age group corresponding to a given education level who are attending school.
The unemployment rate is the percentage of the population that is unemployed relative to the productive population of an area. An unemployed person is someone of productive age who, for a given period, does not work, is willing to accept a job, and is looking for work.
Segmentation is an attempt to classify objects of observation into several groups. Objects within one group are generally more homogeneous with each other than with objects in other groups.
Descriptive statistics are methods for collecting and presenting data so that they become useful information. Descriptive statistics give an overview of the object under study using a sample or a population, and can also be presented as Pareto diagrams and tables [10].
Clustering is used to analyze different groups of data. It is similar to classification, but the groups are not defined before the data mining tool is run. It usually relies on a neural network or on statistical methods. Clustering divides the items into several groups as determined by the data mining tool. The principle of clustering is to maximize the similarity between members of the same cluster and to minimize the similarity between clusters. Clustering can be performed on data that have several attributes. Many clustering algorithms require a distance function to measure the similarity between data points, and also require a method for normalizing the various attributes of the data. Clustering algorithms are used to segment a data set into subgroups of similar items [11].
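As a minimal sketch of the two ingredients mentioned here, a normalization step and a distance function, the following Python code rescales each attribute to [0, 1] with min-max scaling and measures similarity with the Euclidean distance. Min-max scaling is only one possible normalization and is an assumption of this sketch; the function names and the sample values are hypothetical and are not the BPS data.

```python
import numpy as np

def min_max_normalize(data):
    """Rescale each column (attribute) to the range [0, 1]."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Guard against constant columns, which would otherwise divide by zero.
    col_range[col_range == 0] = 1.0
    return (data - col_min) / col_range

def euclidean_distance(a, b):
    """Euclidean distance between two attribute vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Hypothetical records: (population density, unemployment rate in %).
records = np.array([
    [15000.0, 9.0],
    [700.0, 4.0],
    [10.0, 4.5],
])
scaled = min_max_normalize(records)
# After scaling, both attributes contribute on a comparable scale.
d_01 = euclidean_distance(scaled[0], scaled[1])
d_12 = euclidean_distance(scaled[1], scaled[2])
```

Without the normalization step, the density column (in the thousands) would dominate the distance and the unemployment column (single digits) would have almost no influence on the grouping.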
The Self-Organizing Map (SOM) is a form of unsupervised neural network that produces a low-dimensional (typically two-dimensional) representation of the input space of a set of training samples. The SOM network is a neural-network-based method for dimension reduction. A SOM can learn from complex, multi-dimensional data and transform them into a map of fewer dimensions, such as a two-dimensional plot. The two-dimensional plot provides an easy-to-use graphical user interface that helps the decision-maker visualize the similarities between consumer preference patterns. For example, the AT&T data set has 68 variables. It would be difficult to classify consumers visually based on all these attributes, because the grouping must be done in a 68-dimensional space. By mapping the information contained in the 68-variable set into a two-dimensional plot, one can visually group customers with similar preferences into clusters. These relationships can be translated into an appropriate type of structure that genuinely represents the underlying relationships between market segments. Hence, SOM networks can be used to build a decision support system for marketing management. The SOM is designed to capture the topologies and hierarchical structures of higher-dimensional input spaces. Unlike most neural network applications, the SOM performs unsupervised training: during the learning (training) stage, the SOM processes the input units in the network and adjusts their weights primarily based on the lateral feedback connections. The nodes in the network converge to form clusters that represent groups of nodes with similar properties. A two-dimensional map of the input data is created in such a way that the order of the interrelationships among objects is preserved. The number and composition of clusters can be determined visually from the output distribution generated by the training process [12].
Several previous studies are related to this paper, and reviewing them is important for understanding the relationship between previous research and this research. Some of these studies relate to social issues, and others to the analysis methods used. One study clustered crime locations as an impact of social problems, taking socio-economic factors into account [13]. Other research shows that population size and unemployment have an effect on crime, while the number of industries and poverty affect crime indirectly [4], [5].
Pinheiro used Kohonen's Self-Organizing Maps to divide a company into several segments and to set a policy for monitoring certain consumer groups [14]. Kiang concluded that segmentation with the SOM algorithm, and the resulting demographic profiles, are better than those obtained with K-means clustering [15].
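The SOM training procedure described in this section can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the implementation used in this research: the Gaussian neighborhood function, the linearly decaying learning rate and radius, the rectangular 3 x 3 map, and all function and parameter names are assumptions chosen for brevity.

```python
import numpy as np

def find_bmu(weights, x):
    """Return the (row, col) of the unit whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)  # Euclidean distance per unit
    return np.unravel_index(np.argmin(dists), dists.shape)

def train_som(data, rows=3, cols=3, epochs=100, lr0=0.5, radius0=1.5, seed=0):
    """Train a small rectangular SOM and return its weight array."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_samples, n_features = data.shape
    # Random initial weights, one vector per map unit.
    weights = rng.random((rows, cols, n_features))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)  # assumed linear decay
        for x in data[rng.permutation(n_samples)]:
            bmu = find_bmu(weights, x)
            # Squared grid distance of every unit to the winning unit.
            grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            # Gaussian neighborhood: the winner moves most, far units least.
            influence = np.exp(-grid_dist2 / (2.0 * radius ** 2))
            weights += lr * influence[..., None] * (x - weights)
    return weights

# Hypothetical, already normalized observations with two attributes.
sample = np.array([[0.1, 0.1], [0.15, 0.1], [0.9, 0.9], [0.85, 0.95]])
som_weights = train_som(sample, epochs=20)
```

After training, `find_bmu(som_weights, x)` returns the map unit to which an observation `x` (for example, one province's normalized indicators) is assigned; observations that share a unit or fall on neighboring units can be read as one segment.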

Method
This research used secondary data obtained from the website of Badan Pusat Statistik (BPS), accessible at http://bps.go.id. The data used are the factors that affect the crime rate in Indonesia: population density, the school enrollment rate of children aged over 15 years, the unemployment rate, poverty, and HDI [16]. The research was conducted in January 2015. The latest data available were for 2013 and cover the whole territory of Indonesia. The research uses descriptive statistics and Kohonen's Self-Organizing Map.

Result and Discussion
This research uses descriptive analysis to describe the characteristics of the provinces of Indonesia based on several variables that, according to previous studies, affect crime rates, and uses the Kohonen algorithm to perform segmentation.
Based on Fig. 2, Jakarta is in first place with 15,015 inhabitants/km², followed by West Java, Banten, the Special Region of Yogyakarta (DIY), and Central Java. All provinces in Java have a high population density, whereas the lowest density is in West Papua, with only 9 inhabitants/km². There is an inequality of population density between Java and the other islands. This can create further social, economic, or cultural problems if the government makes no effort to redistribute the population.
The lowest school enrollment rate (APS) is in Papua, followed by Bangka Belitung, West Kalimantan, Central Kalimantan, and West Sulawesi. Fig. 3 shows that a large proportion of children aged over 15 do not have access to education facilities. The Special Region of Yogyakarta has the highest percentage of children aged over 15 who are in school, followed by Aceh, West Sumatra, Bali, and East Kalimantan. On Java Island, Jakarta occupies the first position, with an APS of only 66.09.
The variables of this research relate to social conditions, education, population, and the economy, and the clustering results are obtained with the Kohonen algorithm.

Fig. 1 illustrates the neighborhood concept. The left figure shows the neighborhood with R = 1 surrounding point 13, and the right figure shows the neighborhood with R = 2. The neighborhood topology generally uses one of three types: grid, hexagonal, or random topology. The distance function is Euclidean.
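The radius-based neighborhood around a unit that Fig. 1 describes can be illustrated with a short Python sketch. The figure itself is not reproduced here, so the layout is an assumption: a 5 x 5 rectangular grid numbered 1 to 25 row by row (which places point 13 at the center), with a square neighborhood that includes every unit at most R grid steps away in each direction. The function name is hypothetical.

```python
def neighborhood(center, radius, rows=5, cols=5):
    """Return the 1-based unit numbers within `radius` grid steps of `center`."""
    # Convert the 1-based unit number to zero-based (row, col) coordinates.
    r0, c0 = divmod(center - 1, cols)
    units = []
    for r in range(rows):
        for c in range(cols):
            # Square neighborhood: compare the larger of the two axis offsets.
            if max(abs(r - r0), abs(c - c0)) <= radius:
                units.append(r * cols + c + 1)
    return units

print(neighborhood(13, 1))       # [7, 8, 9, 12, 13, 14, 17, 18, 19]
print(len(neighborhood(13, 2)))  # 25
```

During training, all units inside the current neighborhood of the winning unit have their weights updated, and the radius R is typically reduced as training proceeds.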

Fig. 2. Characteristics of Population Density in Indonesia

Fig. 3. School enrollment rate of children aged more than 15 years (APS)

Fig. 4 illustrates that the lowest unemployment rate is in Bali, while the highest is in Aceh. The five provinces with the highest unemployment rates include three provinces on Java Island (Banten, West Java, and Jakarta), which also have a high population density. The government should create new jobs to reduce the unemployment rate. If the unemployment reduction program succeeds, it can strengthen the economy of society.

Fig. 5. Poverty in Indonesia
Based on Fig. 5, the five provinces with the highest poverty rates are mostly in eastern Indonesia, such as Papua, West Papua, Maluku, and Gorontalo, while NTT is located in the central region. The capital city of Indonesia has the lowest poverty rate, followed by Bali, South Kalimantan, Bangka Belitung, and Banten.

Fig. 6. Human Development Index
Papua, West Nusa Tenggara, East Nusa Tenggara, West Papua, and North Maluku are provinces with a low HDI, while Jakarta, Yogyakarta, North Sulawesi, East Kalimantan, and Riau are provinces with a high HDI. Papua occupies the lowest position compared with the other provinces.