How I wrote a scraper to get an RTX 3080 (OSS)

As many are aware, Nvidia didn't come close to meeting the enormous demand for the RTX 3080 generated by gamers, crypto-miners and others. It was the beginning of October when I realized that I was going to need a more powerful GPU for my soon-to-arrive HP Reverb G2 (a VR headset with a 4320x2160 resolution). Also, Cyberpunk.

As an experienced developer I thought, "Well Bengin, can't be that hard to write a scraper that periodically calls a few sites and tells me what changed." I would only need to quickly write something that checks a list of sites every 2 minutes and emails me if the result differs from the previously fetched version. Simple.

Boy was I wrong.

Of course, I wanted something more customizable and scalable (duh). Maybe it's just me, but I am not really satisfied with hacky solutions. I want at least some UI (even if it's just a command line) and the possibility to let my friends use it too, without them having to install 10 tools.

In the next sections, I walk you through the basic structure of the app and some of the challenges. Warning: it's going to be technical (though at a rather high level). You can also always check the source code on my GitHub.

First, I thought about a React frontend and a NodeJS backend paired with a DB like MongoDB to build a multi-tenant web app. Then I realized that I don't need a React frontend. In the frontend I only want to enter some data (links, scrape interval, etc.), and I want the frontend to be able to notify me when something changes.

I had picked up Telegram earlier that year, and I decided to just use a Telegram bot as the frontend - the notifications would be reliable and really fast (as opposed to web push or email). Bonus: I could learn to write Telegram bots. I mean, which developer doesn't like to play with new, shiny stuff 😍?

Frontend

As already mentioned, I used Telegram as the frontend for webwatcher. I decided to try out Telegraf, a NodeJS framework/wrapper, because of its dialogue modelling capabilities. In the end, creating a new target for webwatcher is a dialogue:

Creating a webwatcher target that notifies when a new blog post on this site is available

As you can see, the whole usage and configuration of the bot is done via Telegram. This way I can send the bot link to friends, and they can use it instantly without installing anything.

I also added the bot to a group with some friends who also needed a new GPU. Everyone can chat with the bot privately for private notifications, and it's also possible to set it up in a group so that many people get notified about the same things.

In the current state you can only delete targets, not modify them (which would be really useful, but yeah, I kind of don't need the webwatcher at the moment). When deleting a target, I use the inline buttons Telegram offers to list the targets the user can delete:

Deletion of a target

The inline buttons are removed upon selection and a confirmation message is sent.
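
A rough sketch of how this delete flow can look with Telegraf's inline keyboard helpers - loadTargets and removeTarget are hypothetical stand-ins for the real database calls, not the project's actual code:

```js
const { Telegraf, Markup } = require('telegraf');
const bot = new Telegraf(process.env.BOT_TOKEN);

// Hypothetical placeholders for the real MongoDB calls.
async function loadTargets(chatId) { return []; }
async function removeTarget(targetId) {}

bot.command('delete', async (ctx) => {
  // one inline button per target the user is allowed to delete
  const targets = await loadTargets(ctx.chat.id);
  await ctx.reply(
    'Which target should I delete?',
    Markup.inlineKeyboard(
      targets.map((t) => [Markup.button.callback(t.name, `delete:${t.id}`)])
    )
  );
});

bot.action(/delete:(.+)/, async (ctx) => {
  await removeTarget(ctx.match[1]);       // delete the selected target
  await ctx.answerCbQuery();              // acknowledge the button press
  await ctx.editMessageReplyMarkup();     // remove the inline buttons
  await ctx.reply('Target deleted ✅');    // send the confirmation message
});

bot.launch();
```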

I don't want to go into detail on how I modeled the dialogue (mostly because I didn't do a great job at making it readable or maintainable). But I want to show how one part of the dialogue looks, because the documentation was really not that helpful.
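
Roughly, such a part looks like this with Telegraf's WizardScene (a simplified sketch using Telegraf 4-style imports - the scene id, question texts and state fields are placeholders, not the project's actual code):

```js
const { Telegraf, Scenes, session } = require('telegraf');

// Each function is one step of the dialogue; ctx.wizard.state carries the answers along.
const addTargetWizard = new Scenes.WizardScene(
  'add-target', // the scene's id
  async (ctx) => {
    await ctx.reply('Which URL should I watch?');
    return ctx.wizard.next();
  },
  async (ctx) => {
    ctx.wizard.state.url = ctx.message.text;
    await ctx.reply('How often should I check (in minutes)?');
    return ctx.wizard.next();
  },
  async (ctx) => {
    ctx.wizard.state.interval = parseInt(ctx.message.text, 10);
    await ctx.reply(`Watching ${ctx.wizard.state.url} every ${ctx.wizard.state.interval} minutes.`);
    // here the target would be saved to the DB and scheduled
    return ctx.scene.leave();
  }
);

const stage = new Scenes.Stage([addTargetWizard]);
const bot = new Telegraf(process.env.BOT_TOKEN);
bot.use(session());
bot.use(stage.middleware());
bot.command('add', (ctx) => ctx.scene.enter('add-target'));
bot.launch();
```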

As you can see, there is a WizardScene which takes an id and then the different steps of the dialogue. Pretty neat, but it took a lot of fumbling to get right.

Backend Infrastructure

To store the targets and their configuration persistently, I use MongoDB - it is pretty much only used for that.
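
Just to make the data concrete, a stored target might look roughly like this - the field names are illustrative guesses, not the project's actual schema:

```js
// Hypothetical shape of a stored target document.
const exampleTarget = {
  chatId: 123456789,          // Telegram chat to notify
  url: 'https://example.com/news',
  verb: 'GET',                // or 'POST'
  mode: 'HTML',               // RAW | CONTAINS | JSON | HTML (explained in the Backend Logic section)
  argument: '#news-wrapper',  // selector / search string / JSON path, depending on the mode
  intervalMinutes: 2,         // how often to check
  lastResult: null,           // result of the previous check, for comparison
};
```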

I had difficulties finding a way to properly set up a Telegram bot development environment. Even when developing on your local machine, you have to connect to Telegram's API servers to test your changes.

I created two bots: one for development and one for the live version running on my server. The live one always runs the latest master version of the project's repository - a Jenkins webhook builds and deploys a Docker container (yes, yes, I know, overkill, but I love CI/CD).
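
A minimal sketch of how the two bots can be kept apart, assuming the token is injected via an environment variable (the variable names here are placeholders):

```js
const { Telegraf } = require('telegraf');

// The local dev environment and the deployed container simply get different tokens injected.
const token =
  process.env.NODE_ENV === 'production'
    ? process.env.BOT_TOKEN_PROD
    : process.env.BOT_TOKEN_DEV;

const bot = new Telegraf(token);
bot.launch();
```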

This way I can implement a feature or fix a bug in my local environment and, if that works, simply commit and push, and it is live a minute or so later. DevOps magic is really cool.

Backend Logic

When the NodeJS project starts, all targets are read from the database and each of them is set up. The same setup routine is executed for a target newly created via the dialogue, and it works like this:

  1. Fetch the website and execute the check
  2. Call setInterval to repeat the check periodically
  3. Put the return value of setInterval into a map so this specific target can be stopped later
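
In code, that routine can be sketched roughly like this (the function and field names are placeholders, not the project's actual code):

```js
// Maps a target's id (e.g. its MongoDB _id) to the handle returned by setInterval,
// so a specific target's checks can be stopped later (e.g. on deletion).
const intervalHandles = new Map();

async function checkTarget(target) {
  // fetch the target and compare against the previous result (sketched further below)
}

async function setupTarget(target) {
  await checkTarget(target);                    // 1. fetch & check once right away
  const handle = setInterval(
    () => checkTarget(target),                  // 2. repeat the check periodically
    target.intervalMinutes * 60 * 1000
  );
  intervalHandles.set(target.id, handle);       // 3. remember the handle
}

function stopTarget(targetId) {
  clearInterval(intervalHandles.get(targetId)); // used when a target is deleted
  intervalHandles.delete(targetId);
}
```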

For the actual fetching, I use axios, my favorite HTTP library. Depending on the target's HTTP verb (POST or GET), a POST or GET request is made. The response is then interpreted differently depending on the target's mode. I made 4 modes; each mode applies a different transformation to the response. After the transformation, the result is compared against the target's previous result. The 4 modes are:

  • RAW - simply returns the response
  • CONTAINS - takes a string argument and returns, as a boolean, whether the response contains the argument (for example a word)
  • JSON - also takes a string argument, interprets the response as a JSON object and queries it with the argument (for example [0]["title"])
  • HTML - also takes a string argument, interprets the response as HTML, queries it with basic CSS selectors (I use node-html-parser for that) like .title or #news-wrapper, and returns the content of the matching HTML element

This means that via the mode, the user can choose to get notified when anything changes (RAW), when the occurrence of a word flips from yes to no or vice versa (CONTAINS), when an API's JSON response changes at a specific location (JSON), or when a specific HTML element's content on a site changes (HTML). After that I simply notify the user if necessary (via the Telegram chat id saved in the target).
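
Put together, a check can be sketched roughly like this - queryJson is a hypothetical helper (the real project may resolve the JSON path differently), and the actual notification call is omitted:

```js
const axios = require('axios');
const { parse } = require('node-html-parser');

// Hypothetical helper: resolves a path like '[0]["title"]' against parsed JSON data.
function queryJson(data, path) {
  return path
    .replace(/\[(["']?)(.+?)\1\]/g, '.$2')   // normalize [0]["title"] -> .0.title
    .split('.')
    .filter(Boolean)
    .reduce((value, key) => (value == null ? undefined : value[key]), data);
}

async function checkTarget(target) {
  const response =
    target.verb === 'POST'
      ? await axios.post(target.url)
      : await axios.get(target.url);

  let result;
  switch (target.mode) {
    case 'RAW':       // notify on any change in the response
      result = response.data;
      break;
    case 'CONTAINS':  // boolean: does the response contain the argument?
      result = String(response.data).includes(target.argument);
      break;
    case 'JSON':      // value at a specific location in a JSON response
      result = queryJson(response.data, target.argument);
      break;
    case 'HTML':      // content of the element matched by a CSS selector
      result = parse(String(response.data)).querySelector(target.argument)?.text;
      break;
  }

  if (JSON.stringify(result) !== JSON.stringify(target.lastResult)) {
    target.lastResult = result;
    // notify the user via the Telegram chat id saved in the target
  }
}
```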

The hardest thing in the backend was actually the dialogue modelling - not only because the documentation lacks some details, but also because I built it with the intent of having a linear dialogue that just asks for some values. Due to the GET-or-POST question and the different modes, some question texts depend on the answers to previous questions. My implementation of that is a bit hacky, which I don't like, but it works 🙌. Telegraf surely has better ways to deal with this - if someone knows a good tutorial on that, please comment.

Conclusion

In the end, the webwatcher was a fun side project for learning, and hey, it allowed me to buy an RTX 3080 in covid-October. And now that the Reverb G2 has arrived, I have to say that Half-Life: Alyx looks dauntingly realistic o.O

Anyways, thanks for reading! If you have any questions, please feel free to comment.