Advances in bio-inspired data analytics

What?

A set of advanced methods and algorithms, based on bio-inspired approaches, aimed at existing challenges of data analytics.

Problem

The development of information technology has brought us to a period of increasingly rapid creation, sharing and exchange of information. Aware of the importance of the knowledge hidden in this abundance of data, decision makers now face the challenging task of analyzing data and extracting relevant information from it. As the amount of data surpasses the ability of high-performance computing systems to process it adequately using traditional approaches, the solution lies in the creation of advanced data analytics algorithms, methods and tools.

Goals

To develop a set of advanced methods and algorithms aimed at existing challenges of data analytics and based on bio-inspired approaches, such as evolutionary computation, artificial neural networks and swarm intelligence:

An allocation method for wise partitioning of data, which significantly improves classification accuracy, precision and recall on large datasets.
A multi-population evolutionary algorithm, which provides significantly more balanced classification results.
A semi-supervised algorithm based on self-training for enlarging available training sets with the help of information from unlabeled and unstructured data.
A binary particle swarm optimization method for the selection of informative attributes in high-dimensional data.
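
To illustrate the last item, here is a minimal sketch of a generic sigmoid-based binary particle swarm optimizer for attribute selection. It is not the method developed in the project. Each particle is a 0/1 mask over the attributes, velocities are updated as in standard PSO, and a sigmoid turns each velocity into the probability that the corresponding bit is set. The function names, the parameter values and the toy fitness function (which rewards three hypothetical informative attributes and penalizes every other selected attribute) are assumptions made for this example. In practice the fitness would more likely be the validation performance of a classifier trained on the selected attributes.

```python
import math
import random

INFORMATIVE = {0, 3, 7}  # hypothetical indices of the informative attributes


def toy_fitness(mask):
    """Stand-in for classifier quality: reward informative bits, penalize extras."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.25 * extras


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize fitness(mask) over 0/1 masks of length n_features."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]

    # Personal bests and the global best found so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so the bit probability never saturates fully.
                vel[i][d] = max(-v_max, min(v_max, v))
                # The sigmoid of the velocity is the probability of selecting d.
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


best_mask, best_fit = binary_pso(toy_fitness, n_features=12, seed=1)
print("selected attributes:", [i for i, bit in enumerate(best_mask) if bit])
print("fitness:", best_fit)
```

The velocity clamp `v_max` keeps a small chance of flipping every bit, which preserves exploration in high-dimensional data. The penalty weight in the fitness function controls the trade-off between predictive power and the number of attributes kept.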

Solution

The methods were developed with the user’s (data analyst’s) benefit in mind. They provide transparent and balanced knowledge models with low levels of complexity, which makes it possible to validate the discovered knowledge. Additionally, the methods require almost no user interaction.

To apply the developed methods, we have designed and developed an intelligent data analytics system in the form of a web application for our industrial partner. The system collects data from a number of web sources, searches for relevant news articles and extracts the needed information from those articles. It was tested on the problem of gathering news about infrastructural business investments around the world.

The system utilizes the power of both traditional machine learning techniques (such as Random Forest) and modern deep learning methods (such as recurrent and convolutional neural networks) and combines them with our developed data analytics methods. By applying natural language processing methods for finding specific entities in the text, the system is able to learn to collect and extract relevant information automatically. The system is supplemented with a web application for managing the system, evaluating the results and tuning the settings accordingly.

Authors

Prof. Dr. Vili Podgorelec, Sašo Karakatič, Črtomir Majer, Jernej Flisar, Miha Pavlinek, Lucija Brezočnik, Prof. Dr. Marjan Heričko

Figure 1: Predictive power (overall accuracy).

Figure 2: Balance (average class accuracy).

Figure 3: Complexity (model size).

Figure 4: Data analytics tool for automatic collecting and extracting of information about infrastructural business investments.