Hi Soubhik, yeah, lines 5-7 are just checking that your data is in the right format. I'm wondering whether you've checked if both lists contain the same elements, just in a different order. One way to do this is to compare a Counter of each list:
from collections import Counter
Counter(eval_data_df.question.to_list()) == Counter(spans_dataframe.input.to_list())
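To illustrate why a Counter comparison is useful here, a small sketch with made-up lists (the names questions and inputs are hypothetical stand-ins for the two dataframe columns). Plain list equality fails on a reordering, while Counter equality only cares about which elements appear and how many times. Unlike comparing sets, it also catches a mismatch in duplicate counts:

```python
from collections import Counter

# Hypothetical toy data standing in for eval_data_df.question and spans_dataframe.input
questions = ["What is RAG?", "What is an LLM?", "How do embeddings work?"]
inputs = list(reversed(questions))  # same elements, opposite order

same_order = questions == inputs                       # order-sensitive comparison
same_elements = Counter(questions) == Counter(inputs)  # order-insensitive comparison

# Sets ignore how many times each element appears; Counters do not
a = ["q1", "q1", "q2"]
b = ["q1", "q2", "q2"]
sets_equal = set(a) == set(b)               # True, even though the counts differ
counters_equal = Counter(a) == Counter(b)   # False, because the counts differ

print(same_order)      # False
print(same_elements)   # True
print(sets_equal)      # True
print(counters_equal)  # False
```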
If this is True, try removing reversed. If the difference is more complex than that, please let us know how the datasets differ and I can investigate.
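If the Counters don't match, something like the following might help pin down how the datasets differ. This is a rough sketch; the helper names describe_difference and first_mismatch are hypothetical, not part of any library. Counter subtraction keeps only positive counts, so each result lists the elements (with how many extra copies) that appear in one list but not the other:

```python
from collections import Counter

def describe_difference(left, right):
    """Return (only_in_left, only_in_right) as Counters, respecting duplicate counts."""
    left_counts, right_counts = Counter(left), Counter(right)
    return left_counts - right_counts, right_counts - left_counts

def first_mismatch(left, right):
    """Return the first index where the two sequences differ, or None if they are identical."""
    for i, (a, b) in enumerate(zip(left, right)):
        if a != b:
            return i
    if len(left) != len(right):
        # One list is a prefix of the other; they diverge where the shorter one ends
        return min(len(left), len(right))
    return None

# Toy example; with real data you would pass
# eval_data_df.question.to_list() and spans_dataframe.input.to_list()
only_left, only_right = describe_difference(["q1", "q2", "q2"], ["q2", "q1", "q3"])
print(only_left)   # Counter({'q2': 1})
print(only_right)  # Counter({'q3': 1})
```

Sharing the output of describe_difference (or a few entries from it) would make it easier to see whether the mismatch is ordering, missing rows, or differently formatted strings.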