I used the rvest package, created by the great Hadley Wickham. If you're already comfortable with the tidyverse, then web scraping with rvest is pretty easy to pick up. Just make sure you read the tutorial on using SelectorGadget! And thank goodness WordPress is so easy to scrape: pages are individuated by the base URL (http://... etc.) and then a sequence of numbers.
The method here is pretty simple. I scraped LR for tags and dates. There are three questions I'm interested in answering:
1. what are blogging patterns like? i.e. how frequently does BL post on LR and how much each day?
2. what tags get used most frequently?
3. what's the frequency for co-occurring tags?
In this post and the next three, I'll answer these three questions. Then I'll attempt a wrap-up.
To me, the first question (what are blogging patterns like?) is probably the least interesting, but it's one that can be answered easily.
UPDATE: This was not as easy as I expected (but it was still a useful exercise). Scraping LR gave me the dates on which BL posted, but it doesn't fill in the dates on which he didn't post. Here's a good set of instructions for how to fill those in.
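For anyone curious what "filling in" the missing dates amounts to, here's a minimal sketch. It's written in plain Python rather than the R I actually used, and the function name `daily_post_counts` and the sample dates are made up for illustration. The idea is just to walk every calendar day in the range and record a zero wherever the scrape found nothing.

```python
from collections import Counter
from datetime import date, timedelta


def daily_post_counts(post_dates, start, end):
    """Return (day, count) for every day from start to end inclusive.

    Days with no posts get a count of 0 instead of being left out.
    """
    counts = Counter(post_dates)  # one entry per post, so repeats are counted
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [(day, counts[day]) for day in days]


# Hypothetical scraped dates (not real data): two posts on 1 Jan, one on 3 Jan.
scraped = [date(2019, 1, 1), date(2019, 1, 1), date(2019, 1, 3)]
series = daily_post_counts(scraped, date(2019, 1, 1), date(2019, 1, 4))

for day, n in series:
    print(day.isoformat(), n)
```

The resulting list has one row per day, which is the shape a line graph of daily volume needs.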
Here's a line graph showing the volume of blogging from 1 January 2019 to 31 March 2019.
A quick methodological note: I often shortened the tag to something that captured the spirit of the tag but was much more readable on a plot. E.g. "The less they know, the less they know it" was shortened to "know-nothings." Also, all guest bloggers were collapsed into "guest" (but BL hasn't had many guest bloggers on in the last few years, so that's not a worry for right now).

What tags got used the most during the 1st quarter? Here's the histogram.
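The tag clean-up and counting can be sketched roughly like this. Again, this is Python rather than my R code, and it's an illustration under assumptions: the only mapping taken from this post is the "know-nothings" one, while `GUEST_TAGS`, the guest name in it, and the sample posts are hypothetical.

```python
from collections import Counter

# Full tag -> shorter plot label. Only this one mapping comes from the post.
SHORT_LABELS = {
    "The less they know, the less they know it": "know-nothings",
}

# Hypothetical guest-blogger tag names; all of them collapse into "guest".
GUEST_TAGS = {"Guest Blogger: Example Name"}


def normalize_tag(tag):
    """Shorten a tag for plotting and collapse guest bloggers into one label."""
    tag = tag.strip()
    if tag in GUEST_TAGS:
        return "guest"
    return SHORT_LABELS.get(tag, tag)


def tag_counts(posts):
    """Count how many posts carry each (normalized) tag."""
    counts = Counter()
    for tags in posts:
        # A set ensures a tag is counted at most once per post.
        counts.update({normalize_tag(t) for t in tags})
    return counts


# Made-up posts, each represented by its list of tags.
posts = [
    ["Phil in the News", "The less they know, the less they know it"],
    ["Phil in the News", "Guest Blogger: Example Name"],
    ["The Academy"],
]
counts = tag_counts(posts)
print(counts.most_common(1))
```

Sorting the counts from largest to smallest gives the bar heights for the plot.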
However! The "Phil in the News" and "The Academy" tags are often paired with "Justin Weinberg" and "The New Infantilism." So at least some of the professional-issue tags are also about the subculture in our discipline that prefers Daily Nous to LR. What can help clear this up is a plot looking at tag co-occurrence.
3. Tag co-occurrence
This plot shows how often pairs of tags show up together. (This is a super helpful set of directions for computing the co-occurrence matrix.)
Take "Academic Freedom" and "Justin Weinberg" to start. The plot tells us the co-occurrence of these tags relative to all occurrences of each. To find their frequency relative to "Academic Freedom", find the intersection of both but with "Academic Freedom" appearing as the row value. Their co-occurrence relative to all instances of "Academic Freedom" is rather low (~.07). But the co-occurrence of "Academic Freedom" and "Justin Weinberg" relative to all instances of "Justin Weinberg" is rather large (it's about .67). So Justin is one concern about academic freedom on BL's blog, but he's far from being the only one. But whenever BL is talking about Justin, it's often about academic freedom.
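To make the "relative to the row tag" idea concrete, here's a small sketch of how such values could be computed. It's Python rather than the R I used, the function name `relative_cooccurrence` is my own invention, and the tags "A" and "B" and their counts are toy data, not numbers from LR.

```python
from collections import Counter
from itertools import permutations


def relative_cooccurrence(posts):
    """Map (topic1, topic2) to the share of topic1's posts that also carry topic2."""
    tag_totals = Counter()
    pair_totals = Counter()
    for tags in posts:
        unique = set(tags)
        tag_totals.update(unique)
        # Ordered pairs, so (A, B) and (B, A) are tracked separately.
        pair_totals.update(permutations(unique, 2))
    return {(a, b): n / tag_totals[a] for (a, b), n in pair_totals.items()}


# Toy data: "A" appears in 4 posts, "B" in 2, and they appear together once.
posts = [["A", "B"], ["A"], ["A"], ["A"], ["B"]]
rel = relative_cooccurrence(posts)

print(rel[("A", "B")])  # relative to all posts tagged "A": 0.25
print(rel[("B", "A")])  # relative to all posts tagged "B": 0.5
```

The same pair gets two different values depending on which tag is treated as the row, which is why the plot isn't symmetrical.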
Why do it this way? It doesn't make sense to relativize everything to the topic with the greatest number of tags. It just swamps everything. "Fascism Alerts" and "Cultural Interest" co-occurred 10 times, but that's a blip against the total number of times "Phil in the News" showed up (which is 454). I also tried relativizing to whichever of Topic 1 and Topic 2 was larger. This makes a symmetrical plot, but it papers over important info. If A and B have a co-occurrence value of .5, it's not clear whether that's relative to all instances of A or all instances of B.
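Here's a quick arithmetic sketch of the options I compared, in Python. The 10 co-occurrences and the 454 total come from the numbers above; the per-tag totals of 40 and 20 are made up purely so there is something to divide by, and the function name `normalizations` is hypothetical.

```python
def normalizations(co, n_a, n_b, n_max):
    """Compare ways of scaling a raw co-occurrence count.

    co    -- number of posts carrying both tags
    n_a   -- number of posts carrying tag A
    n_b   -- number of posts carrying tag B
    n_max -- number of posts carrying the most-used tag overall
    """
    return {
        "relative_to_most_used_tag": co / n_max,
        "relative_to_larger_of_pair": co / max(n_a, n_b),
        "relative_to_a": co / n_a,
        "relative_to_b": co / n_b,
    }


# co=10 and n_max=454 are from the post; n_a=40 and n_b=20 are hypothetical.
result = normalizations(co=10, n_a=40, n_b=20, n_max=454)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Dividing by the most-used tag squashes the value to about .02, and dividing by the larger of the pair gives a single number that hides the difference between the two tag-relative values.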
So what does the plot tell us? One thing that stands out is the rather light-colored column for "Phil in the News". This tells us that "Phil in the News" is a rather promiscuous tag, relative to how often other tags are used. This is confirmed by the relatively dark shading of "Phil in the News" for Topic 1: no single tag stands out relative to the total number of tokens of "Phil in the News."
A few other bright-colored spots:
- notice that "PGR" (Philosophical Gourmet Report) & "Job Search Advice" relative to "Job Search Advice" is a bright yellow. So whenever BL is talking about job search advice, it's in connection with PGR.
- But given all posts about the PGR, BL might be talking about different things: job search advice, but also philosophy in the news, issues in the profession, and the nature of philosophy. This last thing is kinda interesting: given the total number of PGR tags, you get "What is Philosophy?" roughly 20% of the time (PGR as Topic 1 and "What is Philosophy?" as Topic 2). Looking at the histogram above, the PGR didn't get tagged much in the 1st quarter of 2019, but it's an interesting insight that it's tied to tags about the nature of philosophy as much as it is.
- given all instances of "Academic Freedom" (which has the 4th greatest number of tag-tokens), you'll find "Fascism Alerts" with it roughly 1/4 of the time and "The New Infantilism" a little less than that.
- BL doesn't often use the "Justin Weinberg" tag (see histogram) but when he does, you better believe it's coming with "The New Infantilism".
- And "wankers" is rare, but it's almost always news.
Let's look at the co-occurrence plot.