OpenAlex Domains

OpenAlex is a database of metadata about scholarly publishing that had its beta release yesterday. It replaces the discontinued Microsoft Academic Graph (MAG) and is made available by the OurResearch project as a 500 GB dataset of tabular (TSV) files that appear to have been exported from an Amazon Redshift database.

500 GB is a lot to download. I guess the download size could be reduced significantly by compressing the data first. Fortunately OurRe

@humanetech Wonderful to see that the metadata of Microsoft Academic has been rescued.

As long as it is a hard-to-handle heap of data, I would treat it as news to spread in case anyone wants to help make it useful. I will schedule it for the Open Science Feed tomorrow.

Once it is an easy-to-use resource, it would fit in the current Delightful Open Science list. We could think about a new list of promising projects/ideas that could benefit from a helping hand.


Are you sure it's 500 GB? If so, perhaps the data could be reduced by a factor of around 10 by converting from TSV to a more efficient format like Parquet.

@ashwinvis I'm just relaying what it says on the download page. At the moment the data is not compressed at all, so yeah, it could definitely be improved. Apparently that's how Microsoft made it available, and they are keeping parity with that for now.
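
For anyone who wants to check how much plain compression would help, here is a minimal sketch using only Python's standard-library gzip module. The helper names, file names, and sample rows are made up for illustration and are not taken from the OpenAlex dump; the actual ratio will depend on the real data, so run it on a real TSV file before drawing conclusions.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(src_path, dst_path, level=6):
    """Stream-compress src_path into dst_path.

    Returns (original_size, compressed_size) in bytes. The file is copied
    in 1 MiB chunks, so it never has to fit in memory.
    """
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)
    return os.path.getsize(src_path), os.path.getsize(dst_path)


def make_sample_tsv(path, rows=10000):
    """Write a small, made-up TSV file (hypothetical columns, not the OpenAlex schema)."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("work_id\ttitle\tyear\n")
        for i in range(rows):
            f.write(f"W{i}\tExample title {i % 50}\t{1990 + i % 30}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tsv_path = os.path.join(tmp, "sample.tsv")
        gz_path = tsv_path + ".gz"
        make_sample_tsv(tsv_path)
        original, compressed = gzip_file(tsv_path, gz_path)
        print(f"original: {original} bytes, gzip: {compressed} bytes")
```

To try it on a real file, call `gzip_file("some_table.tsv", "some_table.tsv.gz")` with your own paths. A columnar format such as Parquet, as suggested above, would need a third-party library and is not shown here.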
