Software Alternatives & Reviews

What tools are you using to clean and prep data for fine-tuning?

  1. Kaggle offers innovative business results and solutions to companies.
    I use datasets from huggingface.co and kaggle.com. These datasets come in various formats, some of which you can train your model on directly to produce a LoRA. You can also download a book in plain-text (.txt) format, such as those available on archive.org, and clean it up into a reasonable training dataset for a LoRA. Clean it by deleting stray characters, HTML code, and blank lines to give your model a fighting chance at learning the data.

    #Data Collaboration #Data Dashboard #Databases 99 social mentions

  2. Internet Archive is a non-profit digital library offering free universal access to books, movies...

    #Ebooks #Productivity #Bookmark Manager 8506 social mentions
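
The cleanup described in the comment above (deleting stray characters, HTML code, and blank lines from a downloaded .txt book) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the commenter's actual workflow: the function name `clean_text` is hypothetical, "weird characters" is interpreted here as Unicode control and formatting characters, and the regex-based tag removal is a crude heuristic that a real HTML parser would do better.

```python
import html
import re
import unicodedata

# Crude heuristic: anything between "<" and ">" is treated as an HTML tag.
# Assumption: the text is mostly prose, so literal "<...>" spans are rare.
TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw: str) -> str:
    """Return raw text with HTML remnants, control characters, and blank lines removed."""
    text = TAG_RE.sub("", raw)                  # drop HTML tags
    text = html.unescape(text)                  # turn &amp;, &nbsp;, ... into characters
    text = unicodedata.normalize("NFKC", text)  # fold compatibility forms (e.g. non-breaking space)

    cleaned_lines = []
    for line in text.splitlines():
        # Remove control/format characters (Unicode categories starting with "C"),
        # keeping tabs so they can be collapsed as whitespace below.
        line = "".join(
            ch for ch in line
            if ch == "\t" or not unicodedata.category(ch).startswith("C")
        )
        line = re.sub(r"\s+", " ", line).strip()  # collapse runs of whitespace
        if line:                                  # skip blank lines
            cleaned_lines.append(line)
    return "\n".join(cleaned_lines)


if __name__ == "__main__":
    sample = "<p>Hello&nbsp;&amp; welcome</p>\r\n\r\n\x00Chapter\u200b One  \n   \n<br/>End"
    print(clean_text(sample))
```

For a whole book you would read the file into a string, pass it through `clean_text`, and write the result back out. What counts as a "weird" character depends on the source, so the filter is worth adjusting after inspecting a sample of the data.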
