Software Alternatives & Reviews

All public GitHub code was used for Codex/Copilot, regardless of license

  1. Git and Mercurial hosting, mailing lists, bug tracking, continuous integration, and more

    #Code Collaboration #Git #VCS 55 social mentions

  2. Content creation using state-of-the-art artificial intelligence. Test it now, no registration required!
    It’s already fairly commonplace for news agencies to generate articles using ML solutions such as https://ai-writer.com/. So are you claiming ABC, CBS, Fox, and NBC have all been plagiarizing and violating copyright by doing so?

    #Writing Tools #AI Writing #Ai Article Generator 7 social mentions

  3. Common Crawl
    > Just like how people are allowed to read websites, but scraping is often disallowed. Hosting code on GitHub explicitly allows this type of usage (scraping) according to their TOS, so I have to ask again: why the sudden complaints? Are we still talking about a shortcoming of the ML model, which very occasionally spits out a few lines of copied code, or should we include search engines in this, because they do the exact same thing by design? robots.txt, for example, has a non-binding, purely advisory character as well, and Common Crawl [0] (also used for training GPT-3) publishes a dataset that by definition contains GPL'ed code as well, no matter where it's hosted. So is that off-limits now, too? [0] http://commoncrawl.org

    #Search Engine #Web Scraping #Data Extraction 90 social mentions

