Breaking the Speed Limit: Strategies for 17k Tokens/Sec Local Inference