After a few nightly build hiccups we've got the final 2021 EIA 860/923 #data deployed as of this morning (with the last 20 years of data too!). Hundreds of pages of spreadsheets were shredded with #Python to make this #SQLite DB:
https://data.catalyst.coop/pudl
Many thanks to @simon for all his hard work on #datasette, which we use to publish the data.
The 2021 #FERC Form 1 data has been more recalcitrant, since they've switched to using #XBRL for reporting (after 27 years of Visual FoxPro...), but we're close!
It looks like there's enough structured information in the XBRL taxonomies that we can reproduce all the calculations and tag the data with the relevant FERC accounting categories.
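For anyone curious what "reproduce the calculations" could look like: XBRL calculation relationships link a parent concept to weighted children (typically +1 or -1). Here's a minimal sketch of checking reported totals against them. The concept names, the `CALC_ARCS` structure, and the tolerance are made up for illustration and are not taken from the FERC taxonomy or our code.

```python
# Hypothetical calculation arcs: parent concept -> [(child concept, weight)].
# Real arcs would be parsed from the taxonomy; these names are invented.
CALC_ARCS = {
    "NetUtilityPlant": [
        ("UtilityPlant", 1.0),
        ("AccumulatedDepreciation", -1.0),
    ],
}


def check_calculations(facts, calc_arcs, tol=0.5):
    """Return {parent: bool} saying whether each reported parent value
    matches the weighted sum of its reported children, within `tol`."""
    results = {}
    for parent, children in calc_arcs.items():
        # Skip calculations where any needed fact was not reported.
        if parent not in facts or any(c not in facts for c, _ in children):
            continue
        computed = sum(weight * facts[child] for child, weight in children)
        results[parent] = abs(computed - facts[parent]) <= tol
    return results


# Illustrative facts for a single filing (not real data).
facts = {
    "UtilityPlant": 1000.0,
    "AccumulatedDepreciation": 300.0,
    "NetUtilityPlant": 700.0,
}
checks = check_calculations(facts, CALC_ARCS)
```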
The last hard part is generalizing how we reshape the FERC data, which typically comes in a wide format (like... 500 columns sometimes), into #TidyData that's more relational.
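A rough sketch of the wide-to-tidy step using pandas `melt`, in case it helps picture the problem. The column names and the `to_tidy` helper are hypothetical, and the real tables are far wider and messier than this.

```python
import pandas as pd

# Hypothetical wide table: one row per utility-year, one column per account.
wide = pd.DataFrame(
    {
        "utility_id": [1, 2],
        "report_year": [2021, 2021],
        "steam_plant_usd": [100.0, 250.0],
        "hydro_plant_usd": [40.0, None],
    }
)


def to_tidy(df, id_cols, var_name="account", value_name="amount_usd"):
    """Melt every non-id column into (account, amount) pairs, one per row,
    and drop cells that were empty in the wide table."""
    long = df.melt(id_vars=id_cols, var_name=var_name, value_name=value_name)
    return long.dropna(subset=[value_name]).reset_index(drop=True)


tidy = to_tidy(wide, ["utility_id", "report_year"])
```

Each row of `tidy` is then a single observation keyed by utility, year, and account, which is much easier to join against other tables than 500 separate columns.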
We do have a nice way to concatenate the old DBF and new XBRL data, which also aligns all of the old data, whose row numbers changed meaning from year to year as new fields were added, split, or removed.
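The general idea, as a toy sketch: map each (year, row number) pair in the old data to a stable label, then stack the result with the XBRL records, which are assumed here to already carry labels. `ROW_MAP`, the labels, and the record layout are all invented for illustration and this is not our actual implementation.

```python
# Hypothetical mapping: (report_year, row_number) -> stable row label.
# Row 2 means something different in 2020 because a row was inserted.
ROW_MAP = {
    (2019, 1): "steam_production",
    (2019, 2): "hydro_production",
    (2020, 1): "steam_production",
    (2020, 2): "nuclear_production",
    (2020, 3): "hydro_production",
}


def align_dbf_rows(records, row_map):
    """Replace year-specific row numbers with stable labels."""
    aligned = []
    for rec in records:
        key = (rec["report_year"], rec["row_number"])
        if key not in row_map:
            raise KeyError(f"no row mapping for year/row {key}")
        aligned.append(
            {
                "report_year": rec["report_year"],
                "row_label": row_map[key],
                "amount": rec["amount"],
            }
        )
    return aligned


# Illustrative records (not real data).
dbf_records = [
    {"report_year": 2019, "row_number": 2, "amount": 10.0},
    {"report_year": 2020, "row_number": 3, "amount": 11.0},
]
xbrl_records = [
    {"report_year": 2021, "row_label": "hydro_production", "amount": 12.0},
]

# Once both sources share the same keys, concatenation is trivial.
combined = align_dbf_rows(dbf_records, ROW_MAP) + xbrl_records
hydro = [r["amount"] for r in combined if r["row_label"] == "hydro_production"]
```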