Q-Learning in Python