From training to inference: The new role of web data in LLMs