Jane W., curious where you landed on this. At Google we'd lean more on encoder-style models rather than generative decoders, to get much more stable scores that we could tune with human ratings.