We do not include the dataset we used to train the models, since we did not collect it ourselves and it was freely available on the project page (at the time we worked on twitpersonality).
However, we describe the data format that the scripts expect to read. Two files are required: the first contains a list of status updates and the second a list of personality trait scores. In both files, entries are separated by newlines. Entries at the same index in the two files refer to the same user (note that some users are repeated in the Gold Standard).
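
As an illustration of the index-based alignment described above, here is a minimal sketch of how two such files could be read and paired. It is not the code used by the twitpersonality scripts: the function name `load_parallel_files` and the file paths are hypothetical, and because the layout of a score line is not specified here, each score entry is kept as a raw string.

```python
def read_entries(path):
    """Return the entries of a file, one per line, without the trailing newline."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_parallel_files(statuses_path, scores_path):
    """Pair entry i of the status file with entry i of the scores file.

    Hypothetical helper, not part of twitpersonality. Score lines are
    returned unparsed, since their internal format is an unknown here.
    """
    statuses = read_entries(statuses_path)
    scores = read_entries(scores_path)
    # The two files are aligned by index, so a length mismatch means
    # at least one status update has no matching scores (or vice versa).
    if len(statuses) != len(scores):
        raise ValueError(
            f"{len(statuses)} status updates but {len(scores)} score entries"
        )
    return list(zip(statuses, scores))
```

Calling `load_parallel_files("statuses.txt", "scores.txt")` (placeholder file names) would return a list of `(status, scores)` tuples. Repeated users simply show up as several pairs, since the files carry no explicit user identifier in this sketch.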