neville park @nev

oh, maybe you all would know this.

any tips for scraping a WordPress-based site for the urls of all posts by a particular author? I tried a few combinations of lynx -dump, wget, & grep but don't know enough about any of them.

i.e. https://site.tld/author/authorsname, https://site.tld/author/authorsname/page/2, page/3, etc., where the posts are like https://site.tld/1970/01/01/title-of-post

(please boost for visibility, just unlisted so it doesn't crosspost)
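For readers following along, here is a minimal sketch of the page-by-page approach described in the question, using only the Python standard library. It assumes the URL layout given above (`/author/authorsname/page/N` for archives, `/YYYY/MM/DD/title` for posts); `site.tld`, `authorsname`, and the function names are placeholders, not anything from a real site. Note that archive pages often carry sidebar links to other authors' posts, so the result may need a manual check.

```python
import re
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.request import urlopen

# Matches dated permalinks like https://site.tld/1970/01/01/title-of-post
# (assumed layout; links with ?query or #fragment parts are skipped).
POST_URL = re.compile(r"^https?://[^/]+/\d{4}/\d{2}/\d{2}/[^/?#]+/?$")


class LinkCollector(HTMLParser):
    """Collect every href found on <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def post_urls_in(html):
    """Return the dated post URLs in one page of HTML, de-duplicated, in order."""
    parser = LinkCollector()
    parser.feed(html)
    seen = []
    for href in parser.hrefs:
        if POST_URL.match(href) and href not in seen:
            seen.append(href)
    return seen


def author_post_urls(base, author, max_pages=50):
    """Walk /author/<author>/page/N until a page is missing or has no posts."""
    found = []
    for page in range(1, max_pages + 1):
        suffix = "" if page == 1 else f"page/{page}/"
        url = f"{base}/author/{author}/{suffix}"
        try:
            with urlopen(url) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except HTTPError:
            break  # assumption: the site returns an error past the last page
        urls = post_urls_in(html)
        if not urls:
            break
        found.extend(u for u in urls if u not in found)
    return found
```

Something like `print("\n".join(author_post_urls("https://site.tld", "authorsname")))` would then list the links, one per line.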

@nev there's no standard. It's all customisable, unfortunately, and the default settings only link to a post by its internal ID number.

codex.wordpress.org/Using_Perm

Unless whoever runs the site is intentionally publishing permalinks by author, or using a plugin that does so, you're hosed.

@wohali that's what I suspected, but...hmm wait I got a brain wave. Will have to see if this idea works...

@nev wget them all and let God sort them out!

...looks like it takes a wordpress plugin to list posts by author, unfortunately.

@nev Does the site in question have the API enabled? You might be able to vacuum up the entire site without needing credentials.

@drwho hmm, I don't know. How would I find out/tap into it?

@drwho i'm not really a coder and don't know how to even REST or whatever but perhaps this is a good time to start learning!

@nev I have an article on my website about it that might be enough to get you started. Search for "rest API" and it should be in the top five hits.
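As a rough illustration of the API route discussed above: on sites where the WordPress REST API is enabled, opening `https://site.tld/wp-json/` in a browser returns JSON rather than an error page, which answers the "how would I find out" part. The sketch below assumes the standard `wp/v2` routes are exposed and that the site allows anonymous lookups of users by slug (some sites block that); `site.tld`, `authorsname`, and the helper names are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def api_url(base, route, **params):
    """Build a URL under the site's /wp-json/wp/v2/ namespace."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{base.rstrip('/')}/wp-json/wp/v2/{route}{query}"


def get_json(url):
    """Fetch a URL and return (parsed JSON body, response headers)."""
    with urlopen(url) as resp:
        return json.load(resp), resp.headers


def links_from(posts):
    """Pull the permalink out of each post object returned by the API."""
    return [post["link"] for post in posts]


def author_links(base, slug):
    """List permalinks of all posts by the author with the given URL slug."""
    users, _ = get_json(api_url(base, "users", slug=slug))
    if not users:
        return []  # no such author, or user lookups are disabled
    author_id = users[0]["id"]

    links, page, total_pages = [], 1, 1
    while page <= total_pages:
        posts, headers = get_json(
            api_url(base, "posts", author=author_id,
                    per_page=100, page=page, _fields="link")
        )
        # WordPress reports the page count in this response header.
        total_pages = int(headers.get("X-WP-TotalPages", 1))
        links.extend(links_from(posts))
        page += 1
    return links
```

Calling `author_links("https://site.tld", "authorsname")` would return the list of post URLs if the assumptions hold; no credentials are needed for public posts.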

@nev What you need is a spider. There was a tool that allowed you to download all the data from a site, or at least list all the links...

This list might help you.

en.wikipedia.org/wiki/Web_craw

@nev HTTRACK! That was it! I used that to back up old websites of mine.

@nev You have options to download external links, and whether or not to download pages outside a given path. It needs a bit of trial and error, but it's excellent for backups.

@nev Have you looked at the RSS feed for the author? Hopefully it will provide you the links in a more parsing-friendly format.
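A small sketch of the RSS idea, assuming the site serves an author feed at `/author/authorsname/feed/` (the usual WordPress location, but themes and plugins can change or disable it). The function names and `site.tld` are placeholders. Feeds typically carry only the most recent posts, so this may not reach the full archive.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def links_from_rss(xml_text):
    """Return the <link> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [item.findtext("link", "").strip() for item in root.iter("item")]


def author_feed_links(base, author):
    """Fetch the author's feed (assumed URL pattern) and list its post links."""
    with urlopen(f"{base}/author/{author}/feed/") as resp:
        return links_from_rss(resp.read())
```

Because each post sits in its own `<item>` element, there is no need to guess which links on the page are posts, which is what makes the feed easier to parse than the HTML archive.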

@noemi oh! i didn't think of that. thanks for the tip

@nev No problem! Let me know if you need any help. :)

@nev use a library like beautiful soup and pull back all pages, but only keep the one that has a match in the byline tag?
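The byline filter suggested above could look roughly like the sketch below. To keep it runnable without installing anything, it swaps Beautiful Soup for the standard library's `html.parser`; with Beautiful Soup the same check would be shorter. It assumes the theme wraps the author name in an element whose class list includes `byline`, which varies from theme to theme, so the class name is a parameter. All names here are illustrative.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class BylineText(HTMLParser):
    """Collect the text inside elements carrying a given CSS class."""

    def __init__(self, css_class="byline"):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # greater than 0 while inside a byline element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.css_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def is_by_author(html, author_name, css_class="byline"):
    """True if the author's name appears in the page's byline element."""
    parser = BylineText(css_class)
    parser.feed(html)
    byline = " ".join("".join(parser.chunks).split())
    return author_name.lower() in byline.lower()
```

The idea would be to download every post page, run `is_by_author(page_html, "Authors Name")` on each, and keep only the URLs where it returns true.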