I really wish I had a local proxy for my browser that cached all page hits (maybe with a file size limit so it doesn't cache big videos) that I could easily #search against, e.g.:
"find the word 'lawn' in pages I read in the last month"
That way I could get that one article I'm thinking about right now. I have no clue where it was or what network shared it.
Does #searx do this?
The rest of the first page of results in https://duckduckgo.com/?q=search+personal+web+history&t=ffsb&ia=web is all about deleting your search history from Google et al :P
https://en.wikipedia.org/wiki/Web_browsing_history suggests the #Google Toolbar does this but I don't want that :)
I just want a #firefox extension that saves to disk for use by a local #xapian (or whatever) search engine.
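A minimal sketch of the search half of that wish, assuming something else (the extension or proxy) already hands over each page's URL and HTML. It uses a plain in-memory inverted index from the Python standard library rather than #xapian, and every name in it (`PageIndex`, `MAX_BYTES`, the example URLs) is hypothetical:

```python
import re
import time
from collections import defaultdict
from html.parser import HTMLParser

MAX_BYTES = 2_000_000  # hypothetical size cap so huge responses are skipped
DAY = 86_400           # seconds per day


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


class PageIndex:
    """Toy stand-in for a real engine: word -> URLs, plus visit times."""

    def __init__(self):
        self.visits = {}               # url -> timestamp of last visit
        self.words = defaultdict(set)  # lowercased word -> set of urls

    def add(self, url, html, visited_at=None):
        """Index one page; returns False if it exceeds the size cap."""
        if len(html.encode("utf-8")) > MAX_BYTES:
            return False
        extractor = _TextExtractor()
        extractor.feed(html)
        extractor.close()
        text = " ".join(extractor.parts).lower()
        # Note: re-adding a URL does not remove words from its old content.
        for word in set(re.findall(r"\w+", text)):
            self.words[word].add(url)
        self.visits[url] = time.time() if visited_at is None else visited_at
        return True

    def search(self, word, days=30, now=None):
        """URLs containing `word`, visited in the last `days`, newest first."""
        now = time.time() if now is None else now
        cutoff = now - days * DAY
        hits = [u for u in self.words.get(word.lower(), ())
                if self.visits[u] >= cutoff]
        return sorted(hits, key=lambda u: self.visits[u], reverse=True)


if __name__ == "__main__":
    index = PageIndex()
    now = time.time()
    index.add("https://example.org/garden", "<p>Mow the lawn on Sunday</p>",
              visited_at=now - 3 * DAY)
    index.add("https://example.org/old", "<p>Lawn care in 1999</p>",
              visited_at=now - 90 * DAY)
    # "find the word 'lawn' in pages I read in the last month"
    print(index.search("lawn", days=30, now=now))
```

This only matches whole words and keeps everything in memory; a real setup would persist to disk and probably lean on a proper engine for stemming and ranking.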
@Greg Hey, yeah, I want that too! It is part of that thing I've been poorly describing for years, the research, log and web history browser thing.
My reason is that I learn via deep dives, meaning 50+ tabs and lots of pacing, and then I have to go think away from the browser. So I'd like to capture all that ambient metadata to go back and find any page I downloaded. I downloaded it because that is what web browsers do, and I should be able to keep it.
Is there such a plugin? ^_^
@Greg Also! I want to use a sensible filesystem-based backup system for my browser.
Companies are gonna do their offerings for sync-accounts and whatnot, but it is important to make configs something I can rsync between devices. Then we have fewer privacy and data-mining issues.
Sorry, amending this as #reference, I am kinda spec'ing this out now. ^_^
@Greg hmmm ... the Zotero citation manager stores a local "snapshot" of a web page when you save a citation for it. The Zotero dataset is then searchable, but I haven't played with how fully it indexes the content of those snapshots. https://www.zotero.org/
@Greg I just looked and it does seem to index the content of HTML snapshots (and also of PDFs).
@Greg you'd have to do your searching in the Zotero client UI, then launch the snapshot (or the online original) into the browser.
@paregorios and I'd have to actively "cite"/click some button for each page :( I want it automatically for all pages. But maybe that's a good start!
@Greg ah, yeah I see
This article mentions #infoaxe https://www.guidingtech.com/8434/search-personal-browsing-history-from-anywhere-infoaxe/
But based on the archive.org history they "pivoted" a few times to doing different things: https://web.archive.org/web/*/www.infoaxe.com