Multilingual Dialogue Dataset Project for XDailyDialog

Reverse engineered prompt

Build me a Python research project for a multilingual dialogue dataset called XDailyDialog. It should include the dialogue data for English, Chinese, German, and Italian, with matching topic, emotion, and action labels, and keep the format close to DailyDialog so researchers can use it easily.

I want scripts that turn the raw text files into monolingual, multilingual, and cross-lingual versions, split them into train, dev, and test sets, and preprocess them into a format suitable for model training. Add a simple way to run baseline experiments for dialogue response generation, covering training, evaluation, prediction, and saving results.
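To make the data preparation step concrete, here is a minimal sketch of splitting parallel dialogues and deriving the three variants. It is not the repository's actual script: the function names `split_ids` and `build_variants`, the language codes, the 80/10/10 split, and the reading of "cross-lingual" as context in one language with the response in another are all assumptions made for illustration.

```python
import random

# Assumed language codes for English, Chinese, German, and Italian.
LANGS = ["en", "zh", "de", "it"]


def split_ids(dialogue_ids, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle dialogue ids deterministically and split into train/dev/test."""
    ids = sorted(dialogue_ids)
    random.Random(seed).shuffle(ids)
    n_dev = int(len(ids) * dev_frac)
    n_test = int(len(ids) * test_frac)
    return {
        "dev": ids[:n_dev],
        "test": ids[n_dev:n_dev + n_test],
        "train": ids[n_dev + n_test:],
    }


def build_variants(parallel, ids):
    """Build monolingual, multilingual, and cross-lingual views.

    parallel maps language -> dialogue id -> list of utterances, where the
    same id refers to the same dialogue in every language (hypothetical layout).
    """
    # Monolingual: one list of dialogues per language.
    mono = {lang: [parallel[lang][i] for i in ids] for lang in LANGS}
    # Multilingual: all languages pooled into one list.
    multi = [dialogue for lang in LANGS for dialogue in mono[lang]]
    # Cross-lingual (assumed definition): context in the source language,
    # response taken from the aligned turn in the target language.
    cross = []
    for src in LANGS:
        for tgt in LANGS:
            if src == tgt:
                continue
            for i in ids:
                s, t = parallel[src][i], parallel[tgt][i]
                for k in range(1, min(len(s), len(t))):
                    cross.append(
                        {"src": src, "tgt": tgt, "context": s[:k], "response": t[k]}
                    )
    return mono, multi, cross
```

Splitting by dialogue id before building the variants keeps every translation of a dialogue in the same partition, so no test dialogue appears in training under another language.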

Also include the kNN Chat workflow, from preprocessing through fine-tuning, datastore creation, index building, and final training. Please make the commands easy to run and document the steps clearly in the README, including setup requirements and license notes, in particular that the dataset is not for commercial use.
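The datastore and index steps can be illustrated with a small sketch of nearest-neighbour retrieval in the general style of kNN-augmented generation. This is not the kNN Chat implementation: the functions `build_datastore` and `knn_distribution` are hypothetical, the vectors stand in for model hidden states, and a brute-force search replaces whatever index the project actually builds.

```python
import math


def build_datastore(examples):
    """Store (key vector, next token) pairs as two parallel lists."""
    keys = [[float(x) for x in vector] for vector, _ in examples]
    values = [token for _, token in examples]
    return keys, values


def knn_distribution(query, keys, values, k=2, temperature=1.0):
    """Return a token distribution from the k nearest keys.

    Uses squared Euclidean distance and softmax weights exp(-d / temperature).
    The linear scan here is a stand-in for a real vector index.
    """
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), idx)
        for idx, key in enumerate(keys)
    )[:k]
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    probs = {}
    for weight, (_, idx) in zip(weights, scored):
        token = values[idx]
        probs[token] = probs.get(token, 0.0) + weight / total
    return probs
```

In a full pipeline the resulting distribution would typically be interpolated with the fine-tuned model's own next-token probabilities; that step is omitted here.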


liuzeming01/XDailyDialog — reverse-engineered prompt
