ω = ωmax - (ωmax - ωmin) · N/Nmax   (13)
where ωmax is the maximum inertia weight, which is 0.9, ωmin is the minimum inertia weight, which is 0.3, Nmax is the maximum number of iterations and N is the current iteration number. Formula (13) shows that the ω value is largest at the beginning of the iterations, which enables the particles to search globally in a wide range. As the number of iterations increases, the particle gradually approaches the global
optimal solution. Meanwhile, the ω value decreases, which enables the particle to search locally in a small range and ultimately achieve the global optimal solution. The inertia weight ω varies with the number of iterations, so it is called the adaptive inertia weight.
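For illustration, Formula (13) can be written as a short Python helper (a minimal sketch under the values given above, not the authors' implementation; the function and argument names are ours):

    def adaptive_inertia_weight(n, n_max, w_max=0.9, w_min=0.3):
        # Formula (13): w decreases linearly from w_max to w_min
        # as the iteration count n runs from 0 to n_max.
        return w_max - (w_max - w_min) * n / n_max

At n = 0 this yields 0.9 (wide global search); at n = n_max it yields 0.3 (fine local search).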
c1 reflects the information exchange between individual particles,
and c2 reflects the information exchange between the particle popula-
tion and the historical optimal trajectory. This paper introduced the
asynchronous learning formula to dynamically adjust c1 and c2. The
adjusted formulae can be defined as follows:
c1 = c1max - (c1max - c1min) · N/Nmax
c2 = c2min + (c2max - c2min) · N/Nmax   (14)
where c1max is the maximum of the c1 learning factor, and its value is 2; c1min is the minimum of the c1 learning factor, and its value is 1; c2max is the maximum of the c2 learning factor, and its value is 2; and c2min is the minimum of the c2 learning factor, and its value is 1.
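Likewise, the two schedules of Formula (14) can be sketched in Python (the default values follow the text; the names are ours):

    def asynchronous_learning_factors(n, n_max, c1_max=2.0, c1_min=1.0,
                                      c2_max=2.0, c2_min=1.0):
        # Formula (14): c1 decays from c1_max to c1_min while
        # c2 grows from c2_min to c2_max over n_max iterations.
        c1 = c1_max - (c1_max - c1_min) * n / n_max
        c2 = c2_min + (c2_max - c2_min) * n / n_max
        return c1, c2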
It can be seen from Formula (14) that, as the number of iterations increases, the learning factor c1 is largest at the beginning of the iterations and then decreases, while c2 is smallest at the beginning and then increases. In this way, the asynchronous learning characteristic allows particles to exchange information and achieves an effective balance between global detection and local mining. In this paper, the compression factor was introduced, and its formula can be defined as follows:
φ = 2 / |2 - c - √(c² - 4c)|   (15)
where c = c1+c2.
Finally, the adaptive particle swarm optimization algorithm with compression factor and asynchronous learning factor (CAAPSO) was proposed. The particle velocity and position update formulae can be defined as follows:
v(t + 1) = φ · (ω · v(t) + c1 · rand() · (qbest(t) - q(t)) + c2 · rand() · (pbest(t) - q(t)))
q(t + 1) = q(t) + v(t + 1)   (16)
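Putting Formulas (13)-(16) together, one CAAPSO iteration can be sketched with NumPy, reusing the two helper functions above (a minimal sketch only: the fitness evaluation and the bookkeeping of qbest and pbest that a full optimizer needs are omitted, and the placement of ω inside the compressed update follows the reconstruction of Formula (16) above):

    import numpy as np

    def caapso_step(q, v, qbest, pbest, n, n_max):
        # One CAAPSO update (Formulas (13)-(16)) for a swarm whose
        # positions q and velocities v are (n_particles, dim) arrays.
        # qbest holds each particle's own best position; pbest is the
        # population's historical best position.
        w = adaptive_inertia_weight(n, n_max)              # Formula (13)
        c1, c2 = asynchronous_learning_factors(n, n_max)   # Formula (14)
        c = c1 + c2
        # Formula (15): np.emath.sqrt returns a complex value when c < 4;
        # abs() takes the modulus, keeping phi real (the classical
        # constriction-factor derivation assumes c > 4).
        phi = 2.0 / abs(2.0 - c - np.emath.sqrt(c**2 - 4.0 * c))
        r1 = np.random.rand(*q.shape)
        r2 = np.random.rand(*q.shape)
        # Formula (16): compressed velocity update, then position update.
        v_new = phi * (w * v + c1 * r1 * (qbest - q) + c2 * r2 * (pbest - q))
        return q + v_new, v_new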
3. Results and discussion
3.1. Data analysis
A radar plot was used to illustrate the relationships and trends of the sensor response data. To visualize the data, one sample was randomly selected from each of the five different beer samples. Fig. 5 shows the radar plot of the sensor responses at 90 s for the five different beers. The radar response forms of the five beers were similar, which may mean that the distinction was difficult. While the W5S, W1S, W1W, W2S, W2W and W3S responses were larger, the W1C, W3C, W6S and W5C responses were smaller. However, for beer identification, we are not sure whether a large-response sensor is highly important, or a small-response sensor is less important (Men et al.); hence it is necessary to mine the important features within the sensor data.
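A plot in the style of Fig. 5 can be reproduced with matplotlib as follows (the ten sensor names are those listed above; the response values are hypothetical placeholders, since the underlying data are not reproduced here):

    import numpy as np
    import matplotlib.pyplot as plt

    sensors = ['W1C', 'W5S', 'W3C', 'W6S', 'W5C',
               'W1S', 'W1W', 'W2S', 'W2W', 'W3S']
    responses = [0.4, 20.1, 0.3, 0.9, 0.5, 9.8, 11.2, 3.1, 2.4, 2.9]  # hypothetical

    angles = np.linspace(0, 2 * np.pi, len(sensors), endpoint=False)
    angles = np.concatenate([angles, angles[:1]])   # close the polygon
    values = responses + responses[:1]

    ax = plt.subplot(polar=True)
    ax.plot(angles, values)
    ax.fill(angles, values, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(sensors)
    plt.show()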
3.2. CNN structure

The matrix form of the beer olfactory information was 100*10, where 100 was the number of sampling points for each sensor and 10 was the number of sensors. In this paper, the first 90 sampling points were selected for each sensor, so the sample matrix became 90*10, which was then converted to 30*30 as the input of the CNN.

Fig. 6 shows the structure schematic diagram of the CNN. The structure of the CNN consisted of 4 convolution layers, 3 pooling layers and 2 full connection layers. After the last pooling operation, all feature matrices were connected into a vector as the input to the first full connection layer. Table 3 shows the network parameters of the CNN. In all convolution operations, the convolution kernel size was 3*3, the stride was 1 and ReLU was selected as the activation function. Padding was 'same', which meant that 0 was added around the periphery of the matrix data to preserve and extract edge features; in contrast, 'valid' added no padding. In all downsampling operations, the stride was 2 and the filter was 2*2. In the first full connection operation, ReLU was selected as the activation function, and the number of neurons was 32 according to the number of the pooling3 feature matrices. In the second full connection operation, Sigmoid was selected as the activation function, and the number of neurons was 5 according to the number of categories. The design process of each layer was as follows (a code sketch of the resulting architecture is given after the list):

(1) The original E-nose data input matrix was 90*10, which was converted into 30*30. In principle, more features can be acquired by means of convolution kernels, but too many features can lead to overfitting of the recognition model. Therefore, 4 convolution kernels were set to convolve the original data after adding padding items. Here, 4 feature matrices were obtained in the same form, and the matrix size of each feature was still 30*30.

(2) Eight convolution kernels were set to convolve the input matrices. Here, 8 feature matrices were obtained in the same form, and each feature matrix size was changed to 28*28.

(3) The data were compressed by means of the pooling operation. In this paper, the average pooling operation (2*2 filter, stride 2) was applied. Here, the number of feature matrices remained constant, and each feature matrix size was changed to 14*14.

(4) According to the parameters in Table 3, the calculation process of the remaining convolution and pooling layers was analogous. Finally, 32 feature matrices were obtained in the same form, and each feature matrix size was changed to 2*2.

(5) Before the full connection operation, the 32 feature matrices with sizes of 2*2 were converted into a feature vector as the input to the first full connection layer.
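Under these parameters, the architecture of steps (1)-(5) can be sketched with TensorFlow/Keras (not the authors' code; the filter count of the third convolution layer is an assumption, since Table 3 is not reproduced here, while the fourth must be 32 to yield the 32 feature matrices of size 2*2):

    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Conv2D(4, 3, padding='same', activation='relu',
                      input_shape=(30, 30, 1)),                   # conv1 -> 30*30*4
        layers.Conv2D(8, 3, padding='valid', activation='relu'),  # conv2 -> 28*28*8
        layers.AveragePooling2D(pool_size=2, strides=2),          # pool1 -> 14*14*8
        layers.Conv2D(16, 3, padding='valid', activation='relu'), # conv3 -> 12*12*16 (assumed)
        layers.AveragePooling2D(pool_size=2, strides=2),          # pool2 -> 6*6*16
        layers.Conv2D(32, 3, padding='valid', activation='relu'), # conv4 -> 4*4*32
        layers.AveragePooling2D(pool_size=2, strides=2),          # pool3 -> 2*2*32
        layers.Flatten(),                        # 32 matrices of 2*2 -> 128-vector
        layers.Dense(32, activation='relu'),     # full connection 1
        layers.Dense(5, activation='sigmoid'),   # full connection 2: 5 beer categories
    ])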
3.3. CNN performance evaluation

The original 90 groups of beer data were divided into two groups randomly: 2/3 were used to train the CNN as the training set (containing the validation set), and 1/3 were used as the testing set. Data were processed based on Section 3.2, CNN structure.

The batch training mode was applied to train the CNN. The batch size was initialized to 20 on account of the small number of beer samples. The BP algorithm was used to train the CNN by means of the gradient descent algorithm. In the iterative process of updating the weights and biases, only the learning rate needed to be set, and it was set to 0.1. Xavier initialization was applied to make the information flow better through the network: to keep the variance consistent across layers, the weight matrices of the convolution kernels were initialized according to the following uniform distribution:

W ~ U[-√6/√(nj + nj+1), √6/√(nj + nj+1)]   (17)

where nj and nj+1 denote the numbers of units in layers j and j + 1.
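These training settings map onto the earlier Keras sketch as follows (the loss function, epoch count and validation fraction are not stated in the text and are assumptions; batch size 20, learning rate 0.1, plain gradient descent and Xavier initialization follow the text, and Keras' default kernel_initializer 'glorot_uniform' implements exactly the uniform distribution of Formula (17)):

    from tensorflow.keras import optimizers

    model.compile(optimizer=optimizers.SGD(learning_rate=0.1),  # gradient descent, lr = 0.1
                  loss='categorical_crossentropy',              # assumed; not stated in text
                  metrics=['accuracy'])

    # x_train: (n_samples, 30, 30, 1) reshaped E-nose matrices (placeholder name);
    # y_train: one-hot labels for the 5 beer categories (placeholder name).
    model.fit(x_train, y_train, batch_size=20,
              epochs=100, validation_split=0.2)  # epochs and split assumed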