Crawling Goodreads books data in Node.js


Since Goodreads no longer supports fetching users' book data via their API, I've decided to crawl/scrape it myself. There are two primary ways to do this: using the RSS feed or exporting your library as a CSV file.


No matter which method you choose, we'll parse everything into a clean, consistent format. Here's what our final GoodreadsBook type looks like:
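A minimal sketch (field names are illustrative and cover what the two methods below can provide):

```ts
export interface GoodreadsBook {
  id: string;
  title: string;
  author: string;
  isbn: string | null;
  userRating: number;          // your own rating, 0–5
  averageRating: number;       // community average
  dateRead: Date | null;
  dateAdded: Date | null;
  bookImageUrl: string | null; // only available via the RSS feed
  publisher?: string;          // only available via the CSV export
  numberOfPages?: number;      // only available via the CSV export
  binding?: string;            // only available via the CSV export
}
```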

Using the RSS Feed

The first method uses the rss-parser package to parse your shelf's RSS feed and extract the book data. Goodreads links to the feed at the bottom of each shelf page.
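A setup sketch: the custom field names below are what Goodreads feeds use at the time of writing, so verify them against your own feed's XML.

```ts
import Parser from 'rss-parser';

// Goodreads puts most book data in custom RSS elements, so we tell
// rss-parser to copy them onto each parsed item.
const parser = new Parser({
  customFields: {
    item: [
      'book_id',
      'book_image_url',
      'author_name',
      'isbn',
      'user_rating',
      'average_rating',
      'user_read_at',
      'user_date_added',
    ],
  },
});
```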

Then you can fetch the feed with the parser object and process the items as needed.
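For example (the user id below is a placeholder; copy the exact RSS link from your own shelf page, which may also include a key parameter):

```ts
// Each shelf has its own feed, selected via the `shelf` query parameter.
const FEED_URL =
  'https://www.goodreads.com/review/list_rss/12345678?shelf=read';

async function fetchFeedItems() {
  const feed = await parser.parseURL(FEED_URL);
  return feed.items; // one raw item per book on the shelf
}
```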

The feed returns everything in a raw format (ratings and dates arrive as strings), so you'll likely want to normalize each item before storing it or using it in your application.
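A normalization sketch that maps the raw item fields (same assumed names as above) onto GoodreadsBook:

```ts
function toGoodreadsBook(item: Record<string, string>): GoodreadsBook {
  return {
    id: item.book_id,
    title: item.title,
    author: item.author_name,
    isbn: item.isbn || null,
    userRating: Number(item.user_rating) || 0,
    averageRating: Number(item.average_rating) || 0,
    dateRead: item.user_read_at ? new Date(item.user_read_at) : null,
    dateAdded: item.user_date_added ? new Date(item.user_date_added) : null,
    bookImageUrl: item.book_image_url || null,
  };
}

async function getBooksFromFeed(): Promise<GoodreadsBook[]> {
  const items = await fetchFeedItems();
  // rss-parser types items loosely, so cast to reach the custom fields.
  return items.map((item) => toGoodreadsBook(item as Record<string, string>));
}
```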

Using a CSV Export

The second method involves exporting your Goodreads library as a CSV file and parsing it. This method gives you more data fields than the RSS feed.

First, you need to export your data from the Goodreads import/export page.

Once you have the goodreads_library_export.csv file, you can use the csv-parser package to parse it.
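A small sketch that streams the file and collects one object per row, keyed by the CSV's header names:

```ts
import fs from 'node:fs';
import csv from 'csv-parser';

function readLibraryExport(path: string): Promise<Record<string, string>[]> {
  return new Promise((resolve, reject) => {
    const rows: Record<string, string>[] = [];
    fs.createReadStream(path)
      .pipe(csv())
      .on('data', (row: Record<string, string>) => rows.push(row))
      .on('end', () => resolve(rows))
      .on('error', reject);
  });
}
```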

The CSV rows then need to be transformed into the same consistent format used for the RSS feed data. Notice that some fields, like bookImageUrl, are not available in the CSV export.
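A mapping sketch: the column names come from a recent export, so double-check them against your own file.

```ts
function csvRowToBook(row: Record<string, string>): GoodreadsBook {
  return {
    id: row['Book Id'],
    title: row['Title'],
    author: row['Author'],
    // Goodreads wraps ISBNs as ="..." so spreadsheets keep leading zeros.
    isbn: (row['ISBN'] ?? '').replace(/[="]/g, '') || null,
    userRating: Number(row['My Rating']) || 0,
    averageRating: Number(row['Average Rating']) || 0,
    dateRead: row['Date Read'] ? new Date(row['Date Read']) : null,
    dateAdded: row['Date Added'] ? new Date(row['Date Added']) : null,
    bookImageUrl: null, // not included in the CSV export
    publisher: row['Publisher'] || undefined,
    numberOfPages: Number(row['Number of Pages']) || undefined,
    binding: row['Binding'] || undefined,
  };
}

// Usage:
// const books = (await readLibraryExport('goodreads_library_export.csv'))
//   .map(csvRowToBook);
```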

So, which method should you choose?

RSS Feed

  • Pros: Can be automated to fetch data periodically.
  • Cons: Data refresh is not instant (it can take hours), and the feed provides fewer data fields than the CSV export.

CSV Export

  • Pros: Contains more detailed information about each book (e.g., publisher, number of pages, binding). The data is available immediately after export.
  • Cons: No image fields, so you'll need to fetch book covers separately, and the export itself is a manual step.

Choose your preferred method and happy crawling!