2019-05-08 "Is this learning method so good?"

During the holidays, I managed to get a simple machine-learning program running: "Q-learning", a type of reinforcement learning.

It is fun to watch the learning progress, but to the question

"Is this learning method so good?"

I cannot give an answer.

-----

The characteristic of Q-learning is that it builds up a chain of "states" for a target whose rules are completely unknown. Its merit is that it can automatically find the most efficient "chain of states".
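
As a minimal sketch of what that looks like (the one-dimensional corridor environment, its size, and the hyperparameters below are all my own illustrative assumptions, written in Python):

    import random

    # Toy environment (my assumption): a corridor of N cells; the agent
    # starts at cell 0 and is rewarded only at the rightmost cell.
    N_STATES = 6
    ACTIONS = [0, 1]                 # 0: move left, 1: move right
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    # The Q-table: one row per state, one column per action.
    Q = [[0.0, 0.0] for _ in range(N_STATES)]

    def step(state, action):
        """Rules unknown to the agent: deterministic corridor dynamics."""
        nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
        return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

    for episode in range(200):
        state = 0
        while state != N_STATES - 1:
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if random.random() < EPSILON:
                action = random.choice(ACTIONS)
            else:
                action = 1 if Q[state][1] >= Q[state][0] else 0
            nxt, reward = step(state, action)
            # Q-learning update: pull Q(s,a) toward r + gamma * max Q(s',.)
            Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][action])
            state = nxt

    print(Q)  # the greedy policy read off this table is "always move right"

The "chain of states" the method finds is simply the greedy path through this table; nobody told it the corridor's rules.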

This means that even if I leave it alone, the method can find the solution by itself.

What we should not forget is that "it can make our lives easier".

In reality, however, reinforcement learning cannot be applied easily unless the target (such as a game) has a terribly simple state.

If I express the state "as is", without any ingenuity, the solution space becomes too big and no learning effect is obtained at all. The state list gets too long and memory is exhausted; the reward calculation takes too much time and the learning does not progress.
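
To put rough numbers on this blowup (the figures are purely my own illustrative assumptions):

    # Back-of-the-envelope state-count blowup (illustrative numbers only).
    n_variables = 10          # features describing "the world"
    values_each = 100         # discretization levels per feature
    n_states = values_each ** n_variables
    print(f"{n_states:.2e} states")                    # 1.00e+20
    bytes_per_entry = 8                                # one float per (s, a)
    print(f"{n_states * 2 * bytes_per_entry / 1e18:.0f} exabytes for 2 actions")

Ten modest features, modestly discretized, already need a Q-table that no machine can hold.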

In other words, it goes without saying that the amount of information in the world as a time continuum is too large. But even a single state, cut out as one time slice of the world, is already too much for a computer to handle.

For example, I am afraid that even if we could use all the supercomputers in the world, it would be impossible to make them understand just ten seconds of my real world as ten time slices.

Therefore, we have to compress each time slice of the world. This is "modeling of the state".
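
For example (the features chosen here are purely my own assumption), compressing a continuous observation into a few coarse buckets is exactly this kind of hand-made modeling:

    # Hand-made "modeling of the state" (the buckets are my assumption):
    # compress a rich observation into a tiny tuple a Q-table can hold.
    def model_state(x, v):
        """Map a continuous (position, velocity) pair to coarse buckets."""
        x_bucket = min(int(x * 10), 9)   # position in [0, 1) -> 10 buckets
        v_sign = 0 if v < 0 else 1       # velocity -> just its sign
        return (x_bucket, v_sign)        # 20 states instead of infinitely many

    print(model_state(0.37, -0.2))       # -> (3, 0)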

In addition, no computer in the world has the capability to carry out this "modeling of the state" automatically.

Above all, if we can do the "modeling of the state" by ourselves, we don't need to use reinforcement learning at all; we can feed the model into an existing method (e.g. fuzzy reasoning).

If we can succeed at the modeling, we simply don't need reinforcement learning.

In order to avoid this hand-made "modeling of the state", there seems to be a way to let a "neural network" compress the time slice instead.
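
The idea, as I understand it, is the DQN style: a network maps the raw state directly to Q-values, so the compression is learned inside the weights rather than designed by hand. A minimal forward-pass sketch (the layer sizes and the use of numpy are my assumptions; the training loop is omitted):

    import numpy as np

    rng = np.random.default_rng(0)
    OBS_DIM, HIDDEN, N_ACTIONS = 84, 32, 4      # sizes: my assumption
    W1 = rng.normal(0, 0.1, (OBS_DIM, HIDDEN))
    W2 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS))

    def q_values(obs):
        """Raw observation -> one Q-value per action."""
        h = np.maximum(0.0, obs @ W1)   # ReLU layer = the learned state model
        return h @ W2

    obs = rng.normal(size=OBS_DIM)      # one "time slice" of the world
    print(q_values(obs))                # act greedily on the largest value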

However, if I have to build a neural network just to be able to use reinforcement learning, I want to shout, "What am I even doing this for?"

-----

Well, whenever I write a run of stiff, kanji-heavy sentences like the above, it usually means I am worked up about something.

I have been asking myself, the whole time I was writing this Q-learning code:

"The merit of reinforcement learning is supposed to be that it makes my life easier, isn't it? Am I right?"