The early training dynamics: effect of learning rate, depth and width