Overfitting means the training error is low while the test error is high. In my case it is the opposite: the training loss (0.00062) is slightly higher than the test loss (0.00040), and the two are close together. This pattern is plausible because of the dropout layer, which adds noise during training but is disabled at evaluation time. Either way, it is not overfitting.
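A minimal NumPy sketch (not the original model, just an illustration) of why dropout can push the training loss above the evaluation loss: the dropout mask injects noise while training, but at evaluation the full network is used, so the same weights score better.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, w, drop_rate=0.0, training=False):
    """One linear layer with inverted dropout applied to its input."""
    if training and drop_rate > 0:
        mask = rng.random(x.shape) >= drop_rate
        x = x * mask / (1.0 - drop_rate)  # rescale to keep the expectation
    return x @ w

# Toy data the (fixed) weights fit exactly when dropout is off.
x = rng.normal(size=(256, 8))
w = rng.normal(size=(8, 1))
y = x @ w

pred_train = forward(x, w, drop_rate=0.5, training=True)   # dropout active
pred_eval = forward(x, w, drop_rate=0.5, training=False)   # dropout disabled

mse_train = float(np.mean((pred_train - y) ** 2))
mse_eval = float(np.mean((pred_eval - y) ** 2))
print(mse_train > mse_eval)  # dropout noise inflates the training loss
```

The same effect shows up in frameworks like Keras, where `Dropout` layers are automatically bypassed during `evaluate` and `predict`.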

Changing the world, one post at a time. Sr Data Scientist, Toronto Canada. https://www.linkedin.com/in/susanli/