Finally wrote a post that's been stewing for a while: What You Miss By Only Checking GitHub
Many researchers, entrepreneurs, open source sustainability commentators, et al. assume that GitHub activity is a reasonable proxy for FLOSS as a whole. It's not.
Goes over some examples, a marketing graphic that made my eyebrows go up, research about how unrepresentative GitHub can be, and some tools to try.
Discoverability is a major issue in FLOSS.
Those of us who remember "Freshmeat" are now old.
But I do think we need something akin to it, or better.
@emacsen I think discoverability, in general, got easier when the web became more searchable... but there are different kinds of discoverability, e.g.,
1. finding something when you go actively seeking things like that
2. having a feed you can constantly visit so you become ambiently aware of potentially useful tools
3. getting notified/suggested when a friend, writer, speaker, or marketer wants you to know of a particular tool
I love when smart people disagree with me, it's such an opportunity to learn :)
I was thinking of #1, #2, and another one, which is "Where do I go for support on this?", which could be community support channels (mailing lists, IRC, Matrix, Discord, Git* issues), or paid commercial support.
@emacsen "Where do I go for support on this" is interesting partly because the answer often crosses "open source project"-specific boundaries. The example that comes to mind most easily for me is Python packaging, but any toolchain with multiple permutations of interoperable parts will also run into this. Considering....
The amount of friction required to get commercial support for FLOSS software is astronomical.
After decades, I finally have resources to pay people for work I want done, and I see tons of people looking for work online, and yet finding people to support tools that I want to support is challenging.
It's not even primarily a money issue.
If someone fixed this well (not just Fiverr or Upwork), I think they'd do the community a huge service and strike it rich.
(they'd also be hated)
Maybe this is amazing software (I don't know), but it certainly makes for a challenging first impression.
The project description just says "This is the source code for [this website]." The website doesn't exist AFAICT and the installation file mentioned doesn't exist either.
> We have not been able to prioritize this work, but we have developed enough of it that we can work with potential contributors. If you’re interested in helping us bring this service to life, familiarize yourself with the existing code and see what you can do.
@brainwane I'm happy to see that Software Heritage is now accepting small cgit instances at https://archive.softwareheritage.org/add-forge/request/list/
Much easier than submitting and resubmitting individual repos.
I think there is still a missing discoverability piece here; I'll bet 90% of self-hosted repos have not been archived by them. But I also think you're right to highlight them as a good example.
@brainwane I have to say, I'm highly skeptical of the supposed value of "searchability" GitHub brings; I find DuckDuckGo or Google to be just as good... I prefer package repos like Debian, or, depending on how well managed they are (i.e. not NPM!), language-specific ones like Hackage. There I know some quality control is present!
Btw I specifically choose to self-host my hobby projects so as not to feed into the silo-mindset which is actively hurting them.
@alcinnz I presume that, when you search, you're looking for projects to use?
I think most of the people and institutions I am speaking to/about in my post are using GitHub search, and data from the GitHub API, for other reasons.
What tools do you use to self-host and do you like them?
@brainwane I'm using CGit with a rudimentary issue tracker. Currently all issues are forwarded to my feedreader for me to copy anything non-spammy to .ISSUES directories in my repos. I found this very easy to set up once I had a homeserver!
@brainwane Okay, it looks like the existing search on their site is limited to just URLs and some limited metadata. Someone could conceivably download their data here: https://docs.softwareheritage.org/devel/swh-dataset/graph/index.html#swh-graph-dataset and use it to create a more robust search. But as the archive is 11 TB, this would be non-trivial.
Full-text search is part of our technical roadmap, but the resources to deploy that at our scale are very significant and we don't have them right now.
The archive size is, in fact, 1 PiB (the 11 TiB mentioned above are just for the graph structure of the archive, not the actual source code files) and a decent fulltext index will be roughly the same size.
People and/or companies interested in helping us out with this are welcome!
See https://mastodon.xyz/web/@zacchiro/109038129028072927 for a more detailed answer on full text search plans.
@brainwane @zacchiro sure. For my purposes, just the URL plus the README from the main branch would catch 95% of my use cases, and I wouldn't mind falling back to GitHub or the less comprehensive/open code search engines when I need to go deeper. Most of the time my question is "is there a tiny open source project that does this one very particular thing before I spend a day making it?" Web search doesn't work well for that, and GitHub search is [see original thread]
@Natris1979 @brainwane Full text indexing of all content is certainly needed down the line. But "only" full text indexing of READMEs is a very interesting idea that I don't think we have explored in the past. Maybe it's something that could just fit in our current indexing infrastructure for metadata. Thanks for raising it!
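To make the "URL plus README" idea concrete, here is a rough sketch of what such a small index could look like. It is only an illustration under stated assumptions, not how Software Heritage indexes anything: the repo URLs and README texts are made-up sample data, and `tokenize`, `build_index`, and `search` are hypothetical helper names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(repos):
    """Map each token to the set of repo URLs whose URL or README contains it."""
    index = defaultdict(set)
    for url, readme in repos.items():
        for token in tokenize(url + " " + readme):
            index[token].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical sample data: origin URL -> README text from the main branch.
REPOS = {
    "https://git.example.org/tiny-csv-diff":
        "Compare two CSV files and print the rows that differ.",
    "https://code.example.net/feed2issues":
        "Turn an RSS feed of bug reports into plain-text issue files.",
}

INDEX = build_index(REPOS)

if __name__ == "__main__":
    print(search(INDEX, "csv diff"))
```

A query only matches repos containing all of its words, which suits the "does a tiny project for this one thing already exist?" question; ranking, stemming, and scale are deliberately left out.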
@brainwane @Natris1979 on metadata search, for sure. Right now we index "package metadata" from a relatively small subset of package formats out there. We're working to extend coverage to other formats, as well as to crawl metadata from forges (e.g., GitHub descriptions, tags, etc.). Search for "metadata" here: https://docs.softwareheritage.org/devel/roadmap/roadmap-2022.html Code contributions are welcome on that front (and the tech entry barrier should be fairly low).