Snippets (cartoons)

From Gnomon Chronicles
Revision as of 05:33, 3 July 2019 by Admin

Things to use or delete. See Snippets.

https://en.wikipedia.org/wiki/Cartoon

https://en.wikipedia.org/wiki/Comic_book

Krazy Kat

Boing Boing reports:

Joël Franusic became obsessed with Krazy Kat, but was frustrated by the limited availability and high cost of the books anthologizing the strip (some of which were going for $600 or more on Amazon); so he wrote a scraper that would pull down thumbnails from massive archives of pre-1923 newspapers and then identified 100 pages containing Krazy Kat strips to use as training data for a machine-learning model.

After a couple of false starts, which Franusic documents, he was able to train a model by feeding the 100 "krazy"-containing thumbnails and a set without Krazy Kat thumbs that he labeled as "negative" to a Microsoft Custom Vision algorithm. He shelled out $180 for Microsoft's "Advanced Training" to be applied to his data, then set the model it produced loose on the remaining thumbnails.

The model crunched through the remaining thumbnails, then Franusic automated the download of full-sized scans from pages identified as likely to contain a Krazy Kat comic. When the dust settled, he had hundreds of Krazy Kat comics in a folder, including one strip that does not appear in any published book that Franusic was able to find.

Franusic has done an excellent job of summarizing his process notes, including source code, and has offered to share a complete set of notes with anyone who wants to build on his work. He's also produced a set of recommendations for people trying this kind of work in future, as well as a wishlist for newspaper archivists who are hoping that projects like this will surface interesting things in their archives.

Using machine learning to pull Krazy Kat comics out of giant public domain newspaper archives @ Boing Boing
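
The last step in the report above (keep only the thumbnails the model scores as likely Krazy Kat pages, then fetch the full-sized scans for those) might be sketched roughly as follows. This is not Franusic's code: the Prediction record, the 0.5 threshold, the archive.example URLs, and the "/thumb/" to "/full/" naming scheme are all assumptions made for illustration, and the classifier scores are taken as already computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One classifier result for one thumbnail (hypothetical record)."""
    thumbnail_url: str   # where the thumbnail lives (assumed URL layout)
    probability: float   # model score for the "krazy" label, 0.0 to 1.0


def likely_hits(predictions, threshold=0.5):
    """Return predictions scoring at or above the threshold, best first."""
    hits = [p for p in predictions if p.probability >= threshold]
    return sorted(hits, key=lambda p: p.probability, reverse=True)


def full_size_url(thumbnail_url):
    """Map a thumbnail URL to its full-sized scan (assumed naming scheme)."""
    return thumbnail_url.replace("/thumb/", "/full/")


def download_queue(predictions, threshold=0.5):
    """List the full-sized scan URLs worth downloading, best candidates first."""
    return [full_size_url(p.thumbnail_url) for p in likely_hits(predictions, threshold)]


if __name__ == "__main__":
    sample = [
        Prediction("https://archive.example/thumb/page-001.jpg", 0.91),
        Prediction("https://archive.example/thumb/page-002.jpg", 0.12),
        Prediction("https://archive.example/thumb/page-003.jpg", 0.97),
    ]
    for url in download_queue(sample):
        print(url)
```

Scoring cheap thumbnails first and downloading full-sized scans only for the likely hits keeps the expensive transfers to a small fraction of the archive, which matches the order of operations the report describes.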