USING INFORMATION ABOUT INFLUENCING FACTORS TO SPLIT DATA SAMPLES IN MACHINE LEARNING METHODS FOR ASSESSING THE INFORMATION SECURITY STATE
I. S. Lebedev
St. Petersburg Federal Research Center of the Russian Academy of Sciences (SPC RAS)
M. E. Sukhoparov
Russian State Hydrometeorological University
Abstract: Improving the quality of identifying the information security state of individual segments of cyber-physical systems involves processing large data arrays. A method of splitting data samples is proposed to improve the quality of algorithms that classify information security states. Classification models are trained on sets of examples that may contain outliers, noisy data, and an imbalance among the observed objects, all of which affect the quality of the results. Under the influence of the external environment, the frequency of observed events and the ranges of recorded values may change at certain points in time, which significantly affects the quality indicators. It is shown that a number of events in the samples occur as a result of the action of internal and external factors.
Keywords: information security, machine learning, dataset, influencing factors, data sample formation
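
The idea of splitting a sample by an influencing factor can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the observations are invented, the "day"/"night" factor is hypothetical, and a nearest-centroid rule stands in for whatever classifier is actually used. It only shows how a value that looks normal in the pooled sample can be recognized as an attack once the sample is split by factor.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (traffic_rate, influencing_factor, state).
# The values are invented for illustration, not taken from the paper.
SAMPLES = [
    (100, "day", "normal"), (110, "day", "normal"), (90, "day", "normal"),
    (200, "day", "attack"), (210, "day", "attack"),
    (10, "night", "normal"), (12, "night", "normal"), (8, "night", "normal"),
    (100, "night", "attack"), (105, "night", "attack"),
]


def split_by_factor(samples):
    """Group (rate, state) pairs into subsamples keyed by the factor value."""
    segments = defaultdict(list)
    for rate, factor, state in samples:
        segments[factor].append((rate, state))
    return dict(segments)


def fit_centroids(pairs):
    """Fit a nearest-centroid model: the mean rate for each state."""
    by_state = defaultdict(list)
    for rate, state in pairs:
        by_state[state].append(rate)
    return {state: mean(rates) for state, rates in by_state.items()}


def predict(centroids, rate):
    """Return the state whose centroid is closest to the given rate."""
    return min(centroids, key=lambda state: abs(rate - centroids[state]))


# One model on the pooled sample versus one model per factor value.
global_model = fit_centroids([(rate, state) for rate, _, state in SAMPLES])
segment_models = {
    factor: fit_centroids(pairs)
    for factor, pairs in split_by_factor(SAMPLES).items()
}

print("pooled model, rate 100:", predict(global_model, 100))
print("night model, rate 100:", predict(segment_models["night"], 100))
```

In this toy data a rate of 100 is typical daytime traffic but anomalous at night, so the pooled model and the night-specific model disagree on it. A real pipeline would need a way to determine the factor for each observation and enough examples per subsample to train on.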