METHOD OF AUTOMATED CONSTRUCTION AND EXPANSION OF THE KNOWLEDGE BASE OF THE BUSINESS PROCESS MANAGEMENT SYSTEM

The problem of constructing and using a knowledge representation in a process control system is studied. It is shown that implementing knowledge-intensive business process management requires automated construction and expansion of the knowledge base to support decision-making in accordance with the current state of the context in which business process actions are executed. The state of the context is specified as a set of weighted logical facts whose arguments are the values of the attributes of events in the business process log. The course of the process at each moment of time is represented as a probability distribution over the possible rules for executing business process actions in this context. A method of automated construction and updating of the knowledge base of the process control information system is proposed. The method includes the stages of forming knowledge representation templates, constructing context descriptions, constructing logical facts, constructing rules, and calculating the probability distribution over the rules. The method makes it possible to support decision-making on business process management when the current execution of the business process diverges from its model.


Introduction
Process control systems implement "horizontal" management of the enterprise by developing business process (BP) models and then managing BPs using these models [1,2].
Process management of an enterprise involves the construction, use, analysis and refinement of business process models [3,4]. The effectiveness of process management largely depends on the adequacy of BP models. For traditional business processes with an a priori given structure, the adequacy problem is solved at the analysis stage, after the process has been completed. Solving this problem for the class of knowledge-intensive business processes is considerably more difficult because of the key feature of such processes: the sequence of their actions can change based on personal decisions of knowledge workers [4,5]. Knowledge workers decide to change the course of the process taking into account the current state of the subject area, using both publicly available explicit knowledge and implicit personal knowledge. The latter usually takes the form of contextual causal dependencies obtained empirically [6,7]. Such knowledge is not included in the business process model [8,9] and cannot be obtained by traditional knowledge engineering methods.
Thus, managing knowledge-intensive business processes raises the problem of constructing and using a knowledge base in the process control system.
The behavior of business processes is recorded by the process management information system in the form of event logs [10]. Each event in the log contains information about the executed process action and the context of its execution, which makes it possible to determine the cause-effect relationships between actions and the conditions under which they occur.

Literature review
Traditional approaches to constructing knowledge bases (KB) require qualified specialists to spend considerable time formalizing expert experience as cause-effect dependencies. Therefore, such methods are unsuitable for constructing knowledge bases for control systems that operate in real time.
Over the last decade, methods, approaches and technologies for automated knowledge base construction have become widespread [11,12]. Such methods are designed to extract knowledge from large data sources available on the Internet [13,14]. However, these approaches have a significant drawback that narrows their scope: they focus on a static description of dependencies in the domain, so previous versions of the dependencies represented in the knowledge base have to be preserved [15].
At the same time, when the knowledge base is used to support decision-making in a process management system, the base of facts about the subject area must be updated synchronously with the progress of the relevant business processes. It is also necessary to represent the multiple variants of such facts and the connections between them.
To solve the latter problem, a probabilistic representation of knowledge based on Markov logic networks is used [16][17][18][19]. The basic idea of such networks is to use templates of logical dependencies to construct a set of weighted predicates that reflect knowledge of the subject domain [20]. However, when constructing such a representation for process control systems, the features of event logs as input data for knowledge identification must be taken into account. Taking these features into account simplifies the construction and expansion of the knowledge base.
The aim of this article is to develop a method for automated construction and updating of the knowledge base in a process management information system that ensures continuous updating of knowledge based on log analysis.

Method of construction and expansion of the knowledge base
The method uses a logical-probabilistic representation of knowledge based on Markov logic networks. This representation takes into account the following characteristics of the event log:
- the log reflects the properties of the business process artifacts, i.e. the objects with which the process interacts;
- each event in the log is associated with a timestamp and a set of attribute values;
- the attributes of the log events correspond to the attributes of the BP artifacts;
- the set of artifact attribute values describes the context in which the business process actions are executed.
The representation of knowledge is as follows:

$$KB = \left\langle Af,\ \{(f_i, w_i)\},\ \{(r_j, w_j)\},\ P\left(r_j \mid \alpha_k^{\varphi}\right),\ C \right\rangle,$$

where $Af$ is the set of artifacts; $F$ is the set of logical facts $f_i$; $w_i$ is the weight of the logical fact $f_i$; $r_j$ is a logical rule that operates on logical facts; $w_j$ is the weight of the logical rule $r_j$; $\alpha_k^{\varphi}$ is the value of the $k$-th artifact property at the time $\varphi$ when the event was recorded in the business process log; $P(r_j \mid \alpha_k^{\varphi})$ is the probability of the rules being executed given the currently known values of the artifact properties; $C$ is the set of a priori known causal dependencies acting as constraints of the subject domain.
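As an illustration only, the stored components of this representation can be held in a small Python container. This is a sketch under assumptions: the class and field names (`KnowledgeBase`, `facts`, `rules`, `constraints`) are hypothetical, facts and rules are written as plain strings, and the probability $P(r_j \mid \alpha_k^{\varphi})$ is treated as something computed from the weights rather than stored.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical container for Af, the weighted facts, the weighted rules and C."""
    artifacts: set[str] = field(default_factory=set)          # Af: artifact classes
    facts: dict[str, float] = field(default_factory=dict)     # fact f_i -> weight w_i
    rules: dict[str, float] = field(default_factory=dict)     # rule r_j -> weight w_j
    constraints: list[str] = field(default_factory=list)      # C: a priori causal dependencies

    def add_fact(self, fact: str, weight: float) -> None:
        self.facts[fact] = weight

    def add_rule(self, rule: str, weight: float) -> None:
        self.rules[rule] = weight


# Toy usage: one artifact class, one fact, one transition rule
kb = KnowledgeBase(artifacts={"Flow"})
kb.add_fact('Proto(Flow, "TCP")', 1.5)
kb.add_rule('Proto(Flow, "TCP") => Next(Proto(Flow, "UDP"))', 0.7)
```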
This knowledge representation reflects how the business process unfolds over time. The static aspect is given by facts and rules whose arguments are taken from the event log, and the dynamic aspect by the current probability distribution of rule execution in accordance with the logged events.

Computer Sciences
This makes it possible to predict the most likely behavior of a business process when its current state changes unpredictably because performers have changed the sequence of actions specified in the model.
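The probability distribution of possible realizations of the business process is determined by the weighted sum of the rules and has the form traditional for Markov networks:

$$P(X = x) = \frac{1}{Z} \exp\left(\sum_j w_j\, n_j(x)\right), \qquad Z = \sum_{x'} \exp\left(\sum_j w_j\, n_j(x')\right),$$

where $n_j(x)$ is the number of true groundings of the rule $r_j$ in the realization $x$, and $Z$ is the partition function used for normalization.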
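A small numerical sketch of this log-linear distribution, assuming the realizations are few enough to enumerate explicitly and that the counts $n_j(x)$ are already known; the world names and counts below are toy values, not data from the experiment.

```python
import math


def world_distribution(worlds, weights):
    """Return P(x) = exp(sum_j w_j * n_j(x)) / Z for every enumerated realization x.

    worlds:  dict mapping a realization name to its list of counts n_j(x)
    weights: list of rule weights w_j, in the same order as the counts
    """
    scores = {
        x: math.exp(sum(w * n for w, n in zip(weights, counts)))
        for x, counts in worlds.items()
    }
    z = sum(scores.values())  # partition function Z
    return {x: s / z for x, s in scores.items()}


# Two rules, three toy realizations with their true-grounding counts
probs = world_distribution({"a": [1, 0], "b": [0, 1], "c": [1, 1]}, [1.0, 0.5])
```

Exhaustive enumeration is only feasible for tiny examples; practical Markov logic network tools use approximate inference instead.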
The method includes the following stages.
Stage 1. Building a data and knowledge representation template by defining classes of artifacts (and their properties), typical facts $f_i(Af)$ and rules $r_j(Af)$ in accordance with the log structure. Logical fact templates reflect the attributes of events. Rule templates correspond to the sequence of events and reflect the relationship between the context and the actions (or the states of the actions) of the business process.
The result of this stage is (1) a subset of predicates that establish logical facts and (2) a subset of predicates that establish logical rules. The number of such predicates is limited because all events in the log have the same structure. We propose a minimal set of three predicates: a predicate that sets the value of an artifact property, a predicate that specifies the set of properties of an event, and a predicate that defines the rules of transition between events.
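The three predicate templates can be sketched as string-producing functions. The predicate names (`property_value`, `event_facts`, `transition`), the `Flow` artifact class and the textual syntax of the atoms are hypothetical choices for illustration, not the notation used by the authors.

```python
def property_value(artifact: str, prop: str, value) -> str:
    """Template 1: the value of an artifact property, e.g. Proto(Flow, "TCP")."""
    return f'{prop}({artifact}, "{value}")'


def event_facts(event_id: str, attrs: dict, artifact: str = "Flow") -> str:
    """Template 2: the set of property values that characterizes one event."""
    atoms = [property_value(artifact, k, v) for k, v in sorted(attrs.items())]
    return f"Event_{event_id} <=> " + " & ".join(atoms)


def transition(src_event: str, dst_event: str) -> str:
    """Template 3: a rule of transition between two events."""
    return f"Event_{src_event} => Next(Event_{dst_event})"
```

Because every log event has the same attribute structure, these three templates are enough to generate all facts and rules by substitution.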
Stage 2. Building or supplementing the description of the domain context. At this stage, the mapping of log event attributes to the classes of artifacts and their properties is determined. Note that when the knowledge base is built using a relational DBMS, the list of artifact properties and the unique values of these properties are extracted from the process log in tabular form by means of SQL queries.
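A minimal sketch of this step using Python's built-in `sqlite3` module, assuming the log has been loaded into a hypothetical table `events` with toy columns `ts`, `proto` and `dst_port`. The column list gives the artifact properties, and `SELECT DISTINCT` gives their unique values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT, proto TEXT, dst_port INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("t1", "TCP", 80), ("t2", "UDP", 53), ("t3", "TCP", 80)],  # toy rows
)

# Artifact properties = column names of the log table (row[1] is the column name)
properties = [row[1] for row in conn.execute("PRAGMA table_info(events)")]


def distinct_values(conn, column):
    """Unique values of one property; column names cannot be bound as parameters,
    so the name is checked against the table schema before building the query."""
    if column not in properties:
        raise ValueError(f"unknown column: {column}")
    query = f"SELECT DISTINCT {column} FROM events ORDER BY {column}"
    return [row[0] for row in conn.execute(query)]
```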
Stage 3. Building logical facts by substituting the values of event attributes as arguments into the predicates. Such arguments characterize the current state of the business process.
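Grounding can be sketched as substituting each event's attribute values into a property-value template. The event dictionaries and the `Flow` artifact name are toy assumptions; duplicate facts collapse because the result is a set.

```python
def ground_facts(events, artifact="Flow"):
    """Substitute each event's attribute values into the property-value template."""
    facts = set()
    for event in events:
        for prop, value in event.items():
            facts.add(f'{prop}({artifact}, "{value}")')
    return facts


log = [
    {"Proto": "TCP", "DstPort": 80},
    {"Proto": "UDP", "DstPort": 53},
    {"Proto": "TCP", "DstPort": 80},  # repeats the first event's facts
]
facts = ground_facts(log)
```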
The result of the stage is the knowledge base in the form of a set of logical facts that reflect the normal behavior of the business process.
Stage 4. Calculating the weights of logical facts in such a way that their values correspond to the probability of the rules being executed on the set of events of the business process log.
As a result of this stage, the rules are adapted to take into account additional information about new log events.
Traditionally, such weights in Markov logic networks are calculated by methods based on gradient descent.
Stage 5. Constructing rules whose arguments are the logical facts obtained in Stage 3. Rules are formed from a predicate template by substituting argument values.
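Rule construction from a trace of facts can be sketched by pairing each event's fact with the fact of the event that follows it, which gives rules of the form $r(f_a \wedge f_b)$. Using consecutive pairs only is an assumption of this sketch; facts are represented by toy labels.

```python
from collections import Counter


def build_transition_rules(trace):
    """Form a rule (f_a, f_b) for every pair of consecutive facts and count occurrences."""
    rules = Counter()
    for current, following in zip(trace, trace[1:]):
        rules[(current, following)] += 1
    return rules


trace = ["f1", "f2", "f1", "f2", "f3"]
rules = build_transition_rules(trace)
```

The occurrence counts are kept so that they can later serve as a basis for rule weights.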
Stage 6. Calculating the probabilities of the rules based on the logical facts whose arguments are the attributes of business process log events.
As a result of this stage, the decision maker can use information on the most likely actions of the business process together with context-bound information. Based on the obtained probability values, information on abnormal behavior of the business process can also be generated.
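One way to sketch this stage: normalize the exponentiated weights of the rules that leave the current fact, in the same log-linear spirit as the distribution above, then flag an observed transition as abnormal when its probability falls below a threshold. The normalization per source fact and the `threshold` value are assumptions, not parameters given by the method.

```python
import math


def rule_probabilities(rule_weights, current_fact):
    """P(f_a -> f_b | f_a): softmax over the weights of rules leaving current_fact."""
    outgoing = {dst: w for (src, dst), w in rule_weights.items() if src == current_fact}
    z = sum(math.exp(w) for w in outgoing.values())
    return {dst: math.exp(w) / z for dst, w in outgoing.items()}


def is_abnormal(observed_next, probs, threshold=0.05):
    """Hypothetical anomaly test: the observed transition is unlikely under the KB."""
    return probs.get(observed_next, 0.0) < threshold


weights = {("f1", "f2"): 2.0, ("f1", "f3"): 0.0, ("f2", "f1"): 1.0}
probs = rule_probabilities(weights, "f1")
likely_next = max(probs, key=probs.get)
```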
Stages 2-6 are repeated as the business process runs and new events appear in the log.
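The cyclic repetition can be sketched as an incremental store that updates fact and transition counts each time a new event is logged, so that frequency-based weights can be refreshed without reprocessing the whole log. The class `IncrementalKB` and its methods are hypothetical.

```python
from collections import Counter


class IncrementalKB:
    """Hypothetical incremental store: counts are updated as new events arrive."""

    def __init__(self):
        self.fact_counts = Counter()
        self.rule_counts = Counter()
        self.last_fact = None

    def observe(self, fact):
        """Register one new event's fact and the transition from the previous one."""
        self.fact_counts[fact] += 1
        if self.last_fact is not None:
            self.rule_counts[(self.last_fact, fact)] += 1
        self.last_fact = fact

    def fact_weight(self, fact):
        """Relative frequency of a fact among all events observed so far."""
        total = sum(self.fact_counts.values())
        return self.fact_counts[fact] / total if total else 0.0


kb = IncrementalKB()
for f in ["f1", "f2", "f1"]:
    kb.observe(f)
```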

Experimental procedures
The aim of the experiment is to test the applicability of the proposed method to decision support tasks in detecting intrusions for processes in computer systems.
During the experiment, we used the CIDDS-002 computer system logs [21]. CIDDS-002 contains logs with weekly sequences of events. Each event in the log is characterized by a set of attributes (Source IP Address, Source Port, Destination IP Address, Destination Port, Transport Protocol, Duration of the flow, Number of transmitted bytes, etc.). The format of the input data is shown in Fig. 1.
Logical facts can be generalized. Generalization consists in choosing a number of attributes that achieves the required generality of a fact without oversimplifying it. In other words, irrelevant attributes are removed when constructing a pattern of logical facts; in particular, timestamps are not used in the example.
Logical rules correspond to transitions between events. For example, for the facts $f_1$ and $f_2$ from the given fragment of the log, the general transition rule has the form $r_1(f_1 \wedge f_2)$.
Sequences of events corresponding to normal operation and to external intrusions are filtered from the log and represented as separate traces. The normal operation trace contains 1011427 events, and the external intrusion trace contains 37148 events.
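Generalization and trace separation can be sketched as two small functions: one projects an event onto the chosen attributes (dropping, for example, the timestamp), the other splits events into traces by a label. The `label` key and the toy events are assumptions; this sketch does not reproduce the actual CIDDS-002 column names.

```python
def generalize(event, keep):
    """Keep only the chosen attributes of an event (e.g. drop the timestamp)."""
    return tuple((k, event[k]) for k in keep if k in event)


def split_traces(events, label_key="label"):
    """Separate events into traces by a (hypothetical) label attribute."""
    traces = {}
    for event in events:
        traces.setdefault(event[label_key], []).append(event)
    return traces


events = [
    {"ts": "t1", "Proto": "TCP", "DstPort": 80, "label": "normal"},
    {"ts": "t2", "Proto": "ICMP", "DstPort": 0, "label": "attack"},
    {"ts": "t3", "Proto": "TCP", "DstPort": 443, "label": "normal"},
]
traces = split_traces(events)
facts = [generalize(e, keep=("Proto", "DstPort")) for e in traces["normal"]]
```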
Weights of logical facts are calculated separately for both traces. This makes it possible to compare the likelihoods of the rules for normal operation and for attacks. First, abnormal behavior of the process is identified. However, abnormal behavior does not always indicate errors or external intrusions: this behavior of the process may simply not yet be included in the knowledge base. Therefore, the behavior is then compared with the template behavior of the process during an intrusion, and explicit intrusions are filtered out. The remaining dependencies require consideration by the decision maker.
We simplified the calculation of the weights of logical facts compared with traditional methods because the method must work in real time, and traditional methods are extremely resource-intensive.
Therefore, the weights of facts and rules are determined based on the frequency of the appearance of attribute values.Table 1 contains a subset of weights for some attribute values for both traces.
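A minimal sketch of frequency-based weights, assuming the weight of an attribute value is its relative frequency within a trace; the source states only that weights are based on frequency of appearance, so the exact normalization is an assumption. The two toy traces are illustrative and are not the values of Table 1.

```python
from collections import Counter


def attribute_weights(trace, attributes):
    """Relative frequency of each (attribute, value) pair within one trace."""
    counts = Counter((a, e[a]) for e in trace for a in attributes if a in e)
    n = len(trace)
    return {pair: c / n for pair, c in counts.items()}


normal = [{"Proto": "TCP"}, {"Proto": "TCP"}, {"Proto": "UDP"}, {"Proto": "ICMP"}]
attack = [{"Proto": "ICMP"}, {"Proto": "ICMP"}]
w_normal = attribute_weights(normal, ["Proto"])
w_attack = attribute_weights(attack, ["Proto"])
```

Computing the weights separately per trace is what allows the same attribute value to be compared under normal operation and during an intrusion.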
Table 1 shows that the weights of the same attribute values differ radically between normal operation and intrusion. For example, for the attribute Proto = "ICMP", the weights for normal and abnormal behavior differ by a factor of 1000. Such significant differences between attribute weights make it possible to distinguish the states of normal operation from those that occur during an attack.
The weights of transitions between event attributes are also calculated for each pair of attributes, separately for normal operation and for intrusion.
The weights of attributes are summed to determine the weights of a logical fact and of a rule. Table 2 compares subsets of the transition weights.
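The summation can be sketched as follows, assuming a fact is a tuple of (attribute, value) pairs and that a rule's weight sums the transition weights of the same attribute between the source and target events. Pairing values of the same attribute is an assumption of this sketch; unknown pairs contribute zero.

```python
def fact_weight(fact, attr_weights):
    """Weight of a fact = sum of the weights of its (attribute, value) pairs."""
    return sum(attr_weights.get(pair, 0.0) for pair in fact)


def rule_weight(src_fact, dst_fact, transition_weights):
    """Weight of a transition rule = sum of per-attribute value-transition weights."""
    dst = dict(dst_fact)
    return sum(
        transition_weights.get(((attr, v), (attr, dst[attr])), 0.0)
        for attr, v in src_fact
        if attr in dst
    )


attr_w = {("Proto", "TCP"): 0.5, ("DstPort", 80): 0.25}
trans_w = {
    (("Proto", "TCP"), ("Proto", "UDP")): 0.5,
    (("DstPort", 80), ("DstPort", 53)): 0.25,
}
f1 = (("Proto", "TCP"), ("DstPort", 80))
f2 = (("Proto", "UDP"), ("DstPort", 53))
```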
Table 2 shows that the weights of the rules specifying transitions between attributes differ for normal operation and for intrusion.
To decide whether an attack is occurring, the probabilities of the rules of transition between events are calculated taking into account the probability of the logical fact that describes the current state. The probabilities obtained for the normal operation process and for the intrusion process are then compared, and an attack on the computer system is detected based on the result of this comparison.
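The comparison step can be sketched as a log-likelihood comparison of an observed sequence of facts under the transition probabilities learned from each trace. Using summed log-probabilities and a small `floor` for unseen transitions are assumptions made for this sketch, not details given in the source.

```python
import math


def log_likelihood(sequence, transition_probs, floor=1e-6):
    """Sum of log P(next | current) over consecutive facts; unseen transitions get `floor`."""
    return sum(
        math.log(transition_probs.get((a, b), floor))
        for a, b in zip(sequence, sequence[1:])
    )


def classify(sequence, normal_probs, attack_probs):
    """Return 'attack' if the intrusion model explains the sequence better."""
    ll_normal = log_likelihood(sequence, normal_probs)
    ll_attack = log_likelihood(sequence, attack_probs)
    return "attack" if ll_attack > ll_normal else "normal"


normal_probs = {("f1", "f2"): 0.9, ("f2", "f1"): 0.8}  # toy values
attack_probs = {("f1", "f3"): 0.7, ("f3", "f3"): 0.9}  # toy values
```

Working in log space avoids numerical underflow when long sequences are scored.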

Results and Discussion
The result of this work is a method of automated construction and expansion of the knowledge base of a process control system.
The method uses process logs as source data and creates the following components of the knowledge base.
First, a description of the normal behavior of processes in the domain is formed as a set of logical facts. These facts reflect the relationships between the attributes of the log events. This set of facts can be used to identify abnormal behavior of business processes and to adjust the management of such processes accordingly.
Second, rules for executing the business process are formed, reflecting the links between the logical facts. This set of rules can be used to identify the reasons for deviations in business process behavior. The identified reasons are represented as a subset of attributes that specifies the conditions of abnormal business process operation.
The advantage of the proposed method is the ability to combine two strategies to support decision-making in process management: identifying abnormal behavior of a business process based on knowledge of its normal behavior, and identifying misuse behavior based on the corresponding rules.
A disadvantage of the method is that the completeness and consistency of the knowledge base depend largely on the quality of the log: the number of event attributes and the correctness of event timestamps.
The method can also be used to analyze the activity of arbitrary process-oriented systems that produce logs. In particular, the proposed method can be used to detect intrusions in a computer system. In this case, a knowledge base is generated about the behavior of processes (running programs), including dependencies related to known intrusions.

Conclusions
The problem of formalizing and using knowledge in a business process management system is considered. The advisability of automated construction and adaptation of the knowledge base for its subsequent use in supporting business process management decisions is substantiated.
The proposed method differs from traditional approaches in two respects. First, the log structure is taken into account when constructing facts and rules: the state of the context for executing business process actions is expressed through event attributes, and the actions through causal dependencies between events. Because the number of event attributes is fixed, this simplifies the calculation of the weights of logical dependencies. Second, the method provides cyclic refinement of the weights of facts and rules as the process progresses and new events appear in the log, which makes it possible to continuously adapt the knowledge base to real-time management needs.
In practical terms, the method makes it possible to improve the management of knowledge-intensive business processes and to identify abnormal and misuse process behavior. The effectiveness of process control is improved by using causal dependencies between the context and the actions of the process to build new paths for its execution. Anomalous and misuse behavior of the process is detected by comparing the probabilities of newly formed rules as new events occur.

Table 1
Weights of the Event Attributes

Table 2
Weights of Attribute Values for Transitions Between Events