Download the book Empirical Approach to Machine Learning
by Plamen P. Angelov, Xiaowei Gu
Persian title: رویکرد تجربی به یادگیری ماشین
Book details
This book presents in one place the fundamentals of a new methodological approach to machine learning that is centered entirely on the actual data. We call this approach "empirical" to distinguish it from the traditional approach, which is heavily restricted and driven by prior assumptions about the data distribution and the data generation model. The new approach is not only liberating (there is no need for prior assumptions about the type of data distribution, the amount of data, or even their nature, random or deterministic), but also fundamentally different: it places the mutual position of each data sample in the data space at the center of the analysis. It is closely related to the concept of data density and bears a close resemblance to centrality (known from network theory) and the inverse-square distance law (known from physics/astronomy).
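To make the density idea concrete, here is a minimal sketch (our own illustration, not the book's exact formulation) in which each sample's density is the inverse of its average squared distance to all samples, echoing the inverse-square intuition above:

```python
import numpy as np

def data_density(X):
    # Pairwise squared Euclidean distances, shape (N, N).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Density: high when a sample sits close to the bulk of the data,
    # low when it is far away (inverse-square flavour).
    return 1.0 / (1.0 + sq.mean(axis=1))

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
print(data_density(X))  # the distant sample [5, 5] gets a much lower density
```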
Furthermore, this approach has anthropomorphic characteristics. For example, unlike the vast majority of existing machine learning methods, which require a large amount of training data, the proposed approach can learn from a handful of examples, or even a single one; that is, it can start "from scratch" and continue to learn even after training/deployment. In other words, the machine can learn lifelong, with little or no human intervention or supervision.
Critically, the proposed approach is not a "black box", unlike many of its competitors, e.g., most neural networks (NNs) and the celebrated deep learning. On the contrary, it is fully interpretable and transparent, has a clear and logical internal model structure, can carry semantic meaning, and is thus much more human-like.
Traditional machine learning is statistical and is based on classical probability theory, which, thanks to its solid mathematical foundation, guarantees the properties of these learning algorithms, but only when the amount of data tends to infinity and all the data come from the same distribution. The randomness and identical distribution presumed by such data generation models are, however, too strong to hold true in real situations. In addition, the predefined parameters of machine learning algorithms usually require a certain amount of prior knowledge of the problem, which, in reality, is often unavailable. These parameters therefore cannot be defined correctly in real applications, and an improper choice can strongly degrade the performance of the algorithms.
Importantly, even though the newly proposed concept is centered on experimental data, it leads to a theoretically sound closed-form model of the data distribution and has theoretically proven convergence (in the mean), stability, and local optimality. Despite the apparent similarity of the end result to the traditional approach, this data distribution is extracted from the data, not assumed a priori. The new quantity that represents the likelihood also integrates to 1 and is a continuous function; unlike the traditional pdf (probability density function), however, it does not suffer from the latter's well-known paradoxes. We call this new quantity "typicality". We also introduce another new quantity, called "eccentricity", which is the inverse of the data density and is very convenient for the analysis of anomalies/outliers/faults, as it simplifies the Chebyshev inequality and the associated analysis. Eccentricity is a new measure of the tail of the distribution, introduced for the first time by the lead author in his recent works.
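As a rough illustration of how the two quantities relate (a sketch based only on the properties stated above, not the book's exact formulas), one can compute a discrete typicality that sums to 1 and an eccentricity that is the inverse of density:

```python
import numpy as np

def typicality_and_eccentricity(X):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    density = 1.0 / (1.0 + sq.mean(axis=1))
    typicality = density / density.sum()  # non-negative, sums to 1
    eccentricity = 1.0 / density          # inverse of density: large for outliers
    return typicality, eccentricity

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((50, 2)), [[8.0, 8.0]]])
tau, ecc = typicality_and_eccentricity(X)
print(ecc.argmax())  # 50: the injected outlier is the most eccentric sample
```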
Based on typicality (used instead of the probability density function) and eccentricity, as new measures derived directly and entirely from the experimental data, we develop a new methodological basis for data analysis, which we call empirical. We also redefine and simplify the definition of fuzzy sets and systems. Traditionally, fuzzy sets are defined through their membership functions. This is often a problem, because defining a suitable membership function may be neither easy nor convenient, and it rests on prior assumptions and approximations. Instead, we propose to select only prototypes. These prototypes can be actual data samples selected autonomously for their high descriptive/representative power (high local typicality and density) or designated by an expert (for fuzzy sets, the role of the human expert and user is desirable and natural). Even when the prototypes are identified by a human expert rather than autonomously from the experimental data, the benefit is significant, because the cumbersome, possibly prohibitive, and potentially controversial problem of defining a potentially huge number of membership functions is circumvented and tremendously simplified.
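The following sketch shows one plausible way a prototype alone can induce a degree of membership, with no handcrafted membership function (our illustration; the Cauchy-type similarity kernel is an assumption, not the book's prescribed choice):

```python
import numpy as np

def membership(x, prototypes):
    # Degree of membership of x in each fuzzy set, induced purely by the
    # distance to that set's prototype: 1 at the prototype, decaying smoothly.
    d2 = np.sum((prototypes - x) ** 2, axis=1)
    return 1.0 / (1.0 + d2)

prototypes = np.array([[0.0, 0.0], [4.0, 4.0]])  # picked from data or by an expert
print(membership(np.array([0.5, 0.2]), prototypes))
```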
Based on this new, empirical methodology (grounded in the actual/real data rather than in assumptions about the data generation model), we further analyze and redefine the main elements of machine learning, pattern recognition, data mining, and deep learning, as well as anomaly detection and fault detection and identification. We start with data pre-processing and anomaly detection. This problem underpins many and varied applications: fault detection in engineering systems, intruder and insider detection in cybersecurity, outlier detection in data mining, etc. Eccentricity offers a new, more convenient form for analyzing such properties of the data. Data density and, especially, its recursive form of update, which we call RDE (recursive density estimation), make the analysis of anomalies very convenient, as the book illustrates.
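A minimal sketch of the recursive idea follows (the exact update formulas are derived in the book; the scheme below, keeping only a running mean and a running mean of squared norms so each sample costs constant time and memory, should be read as illustrative):

```python
import numpy as np

class RDE:
    """Recursive density estimation sketch: O(1) memory and time per sample."""
    def __init__(self):
        self.k = 0          # number of samples seen
        self.mu = None      # running mean of the samples
        self.x2 = 0.0       # running mean of the squared norms

    def update(self, x):
        self.k += 1
        self.mu = x.copy() if self.mu is None else self.mu + (x - self.mu) / self.k
        self.x2 += (float(x @ x) - self.x2) / self.k
        scatter = self.x2 - float(self.mu @ self.mu)
        # Cauchy-type density of the current sample w.r.t. everything seen so far.
        return 1.0 / (1.0 + float((x - self.mu) @ (x - self.mu)) + scatter)

rde = RDE()
stream = np.vstack([np.random.default_rng(1).standard_normal((100, 2)),
                    [[10.0, 10.0]]])
for x in stream:
    d = rde.update(x)
print(d)  # the final, anomalous sample receives a very low density
```

Because nothing but two running statistics is stored, the same object can keep updating on a live data stream indefinitely, which is what makes the recursive form attractive for online anomaly detection.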
We further introduce a new method for fully autonomous data partitioning, i.e., one that does not rely on handcrafted thresholds, parameters, or coefficients selected by the user or "tailored" to the problem. In essence, this is a new method for clustering which is fully data-driven. It combines rank-ordering (in terms of data density) with the distance between each point and the point with the maximum typicality. We also introduce a number of autonomous clustering methods (online, evolving, taking local anomalies into account, etc.) and compare them with the currently existing alternatives. In this sense, this book builds upon the previous research monograph by the lead author, Autonomous Learning Systems: From Data Streams to Knowledge in Real Time, Wiley, 2012, ISBN 978-1-119-95152-0.
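In the spirit of that description, a density-rank-ordered partitioning might look like the sketch below. This is a deliberately simplified stand-in, not the book's ADP algorithm: the `min_dist` radius is a hypothetical knob introduced purely for illustration, whereas ADP derives such quantities from the data itself.

```python
import numpy as np

def simple_partitioning(X, min_dist):
    # Rank all samples by density (highest first), starting from the global peak.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    density = 1.0 / (1.0 + sq.mean(axis=1))
    order = np.argsort(-density)
    prototypes = [X[order[0]]]
    labels = np.empty(len(X), dtype=int)
    for i in order:
        d = np.linalg.norm(np.asarray(prototypes) - X[i], axis=1)
        if d.min() > min_dist:          # far from every prototype: new cluster
            prototypes.append(X[i])
            labels[i] = len(prototypes) - 1
        else:                           # otherwise join the nearest prototype
            labels[i] = int(d.argmin())
    return np.asarray(prototypes), labels

X = np.vstack([np.random.default_rng(2).standard_normal((30, 2)),
               np.random.default_rng(3).standard_normal((30, 2)) + 6.0])
protos, labels = simple_partitioning(X, min_dist=3.0)
print(len(protos))  # two well-separated clouds yield roughly two prototypes
```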
We then move to supervised learning, starting with classifiers. We focus on fuzzy rule-based (FRB) systems as classifiers, but it is important to stress that, since FRBs and artificial neural networks (ANNs) have been demonstrated to be dual (the term neuro-fuzzy is widely used to indicate their close relation), everything presented in this book can also be interpreted in terms of NNs. Using FRBs and, respectively, ANNs as classifiers is not a new concept. In this book, we introduce interpretable deep rule-based (DRB) classifiers as a new and powerful form of machine learning, particularly effective for image classification, with the anthropomorphic characteristics described earlier. The importance of DRB is multifaceted. It concerns not only efficiency (very low training time and low computing requirements: no graphics processing units (GPUs), for example) and high precision (classification rates competing with, and surpassing, the best published results and human abilities), but also high interpretability/transparency, repeatability, proven convergence and optimality, a non-parametric and non-iterative nature, and a self-evolving capability. This new method is compared thoroughly with the best existing alternatives. It can start learning from the very first image presented (very much like humans can).
The DRB method can be considered neuro-fuzzy. We pioneer deep FRBs as highly parallel multi-layer classifiers that offer the high interpretability/transparency typical of FRBs. Indeed, until now the so-called deep learning method had proven its efficiency and high potential as a type of artificial/computational NN, but it had not been combined with fuzzy rules to benefit from their semantic clarity.
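The decision layer of such a classifier can be pictured as below. This is our sketch only: per-class prototypes with a "winner takes all" rule; the actual DRB architecture adds feature-extraction layers and autonomous prototype selection before this step.

```python
import numpy as np

class PrototypeRuleClassifier:
    def __init__(self):
        self.protos = {}                      # class label -> prototype array

    def fit(self, X, y):
        for c in np.unique(y):
            self.protos[c] = X[y == c]        # here every sample is a prototype

    def predict(self, X):
        labels = sorted(self.protos)
        # Per class: similarity of each sample to its closest prototype.
        scores = np.stack([
            (1.0 / (1.0 + np.sum((X[:, None, :] - self.protos[c][None]) ** 2,
                                 axis=-1))).max(axis=1)
            for c in labels
        ])
        return np.array(labels)[scores.argmax(axis=0)]   # winner takes all

clf = PrototypeRuleClassifier()
clf.fit(np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]]), np.array([0, 0, 1]))
print(clf.predict(np.array([[0.1, 0.1], [4.5, 5.2]])))   # -> [0 1]
```

Note how fit() is non-iterative and can absorb a single labelled example per class, which mirrors the "learning from the very first image" claim above.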
Another important class of supervised learning constructs is predictive models, which can be of the regression or time-series type. These have traditionally been approached in the same way: one starts with prior assumptions about the inputs/features, cause-effect relations, data generation model, and density distributions, and the actual experimental data are used only to confirm or correct these a priori assumptions. The proposed empirical approach, on the contrary, starts with the data and their mutual position in the data space and extracts all internal dependencies from them in a convenient form. From the data, it self-evolves complex non-linear predictive models. These can be interpreted as IF…THEN FRBs of a particular type, called AnYa, or, equally, as self-evolving computational ANNs. In this sense, this book builds upon the first research monograph by the lead author, Evolving Rule-based Models: A Tool for Design of Flexible Adaptive Systems, Springer, 2002, ISBN 978-3-7908-1794-2.
In this book, we use the fully autonomous data partitioning (ADP) method introduced in earlier chapters to form the model structure (the premise/IF part). Its building blocks are the local peaks (modes) of the multi-modal (mountain-like) typicality distribution, which is automatically extracted from the actual/observable data. We offer a locally optimal method for ADP (satisfying the Karush-Kuhn-Tucker conditions). The consequent/THEN part of the self-evolving FRB-based predictive models is linear and fuzzily weighted. We provide a theoretical proof of the convergence of the error (in the mean) using Lyapunov functions. This yields the first self-evolving FRB with theoretically proven convergence (in the mean) during training (including online, during use), stability, and local optimality of the premise (IF) part of the model structure. These properties of local optimality, convergence, and stability are illustrated on a set of benchmark experimental data sets and streams.
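To show how such a rule base produces a prediction, here is a sketch of an AnYa-type inference step. The rule form "IF x is like prototype_i THEN y is linear in x" follows the description above; the density kernel and the fixed consequent matrix are illustrative, since in the book the consequent parameters are learned (fuzzily weighted), not hand-set.

```python
import numpy as np

def anya_predict(x, prototypes, A):
    # Firing level of each rule: density of x w.r.t. that rule's prototype,
    # normalised so the levels sum to 1 (the "fuzzily weighted" part).
    lam = 1.0 / (1.0 + np.sum((prototypes - x) ** 2, axis=1))
    lam /= lam.sum()
    xe = np.concatenate(([1.0], x))     # extended input [1, x] for the bias term
    return float(lam @ (A @ xe))        # weighted sum of linear consequents

prototypes = np.array([[0.0, 0.0], [4.0, 4.0]])
A = np.array([[0.0, 1.0, 0.0],          # rule 1: IF x ~ [0,0] THEN y = x1
              [2.0, 0.0, 1.0]])         # rule 2: IF x ~ [4,4] THEN y = 2 + x2
print(anya_predict(np.array([1.0, 0.5]), prototypes, A))
```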
Last but not least, the authors would like to express their gratitude for close collaboration on some aspects of this new concept with Prof. Jose Principe (University of Florida, USA, in the framework of The Royal Society grant "Novel Machine Learning Methods for Big Data Streams"), Dr. Dmitry Kangin (former Ph.D. student of the lead author at Lancaster University, currently a postdoctoral researcher at Exeter University, UK), Dr. Bruno Sielly Jales Costa (visiting Ph.D. student of the lead author at Lancaster University, now with Ford R&D, Palo Alto, USA), Dr. Dimitar Filev (Ph.D. advisor of the lead author in the early 1990s, now Henry Ford Technical Fellow at Ford R&D, Dearborn, MI, USA), Dr. Hai-Jun Rong (visiting scholar at Lancaster University with the lead author, Associate Professor at Xi'an Jiaotong University, Xi'an, China), and Prof. Ronald Yager (Iona College, NY, USA).