[2404.01204] The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis