
There are a few efforts afoot that are archiving the Ukrainian web. This one was started initially by members of the Slavic Digital Humanities community and is focused on cultural heritage websites:

sucho.org/

They are using some of the webrecorder.net tools (ArchiveWeb.page, Browsertrix Crawler), which means they can host each website as a viewable WACZ file. You can see these starting to emerge here:

sucho.org/archives

WACZ files are essentially ZIP files that contain web archive data (WARC), an index, and collection metadata.
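
Because a WACZ is a ZIP file, ordinary ZIP tooling can look inside one. Here is a minimal sketch, assuming a layout with a datapackage.json at the top level plus archive/, indexes/ and pages/ entries. The sample file is built in memory for illustration and is not a real crawl, and the helper names are made up.

```python
import io
import zipfile
from collections import defaultdict


def summarize_wacz(fileobj):
    """Group the member names of a WACZ (ZIP) file by top-level directory."""
    groups = defaultdict(list)
    with zipfile.ZipFile(fileobj) as zf:
        for name in zf.namelist():
            top, sep, _rest = name.partition("/")
            # Members without a directory part are grouped under "."
            groups[top if sep else "."].append(name)
    return dict(groups)


def make_sample_wacz():
    """Build an empty stand-in WACZ in memory (assumed layout, no real data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("datapackage.json", "{}")       # collection metadata
        zf.writestr("archive/data.warc.gz", b"")    # web archive data (WARC)
        zf.writestr("indexes/index.cdx", "")        # index
        zf.writestr("pages/pages.jsonl", "")        # page list
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for top, names in summarize_wacz(make_sample_wacz()).items():
        print(top, names)
```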

The magic of ZIP on the web means these files can be BIG. Portions of the file can be loaded on demand (HTTP Range requests) by the ReplayWeb.page web component, which can be placed anywhere you can host HTML and a bit of JavaScript.
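
To make the on-demand loading concrete: a ZIP keeps its table of contents (the central directory) at the end of the file, so a client can fetch just the tail, find out where everything lives, and then request only the byte ranges it needs. The sketch below simulates that with an in-memory blob. `read_range` is a hypothetical stand-in for an HTTP request with a `Range: bytes=start-end` header, and ZIP64 archives are not handled. This is not how ReplayWeb.page is implemented, only the general idea.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"          # end-of-central-directory marker
EOCD_FORMAT = "<4sHHHHIIH"              # fixed 22-byte record, little-endian
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)


def make_range_reader(blob):
    """Return a fake range fetcher over `blob` plus a log of the ranges asked for."""
    requests = []

    def read_range(start, end):
        # Inclusive byte range, like an HTTP Range header.
        requests.append((start, end))
        return blob[start:end + 1]

    return read_range, requests


def locate_central_directory(read_range, total_size, tail=65535 + EOCD_SIZE):
    """Fetch only the tail of the file and parse the end-of-central-directory record."""
    start = max(0, total_size - tail)
    chunk = read_range(start, total_size - 1)
    pos = chunk.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack(EOCD_FORMAT, chunk[pos:pos + EOCD_SIZE])
    _sig, _disk, _cd_disk, _n_disk, total_entries, cd_size, cd_offset, _clen = fields
    return total_entries, cd_size, cd_offset


def build_demo_zip():
    """A stand-in archive: one big stored member and two small ones."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data.warc", b"x" * 200_000)
        zf.writestr("indexes/index.cdx", b"")
        zf.writestr("datapackage.json", b"{}")
    return buf.getvalue()


if __name__ == "__main__":
    blob = build_demo_zip()
    read_range, requests = make_range_reader(blob)
    print(locate_central_directory(read_range, len(blob)), requests)
```

In this demo a single request for the last 64 KB or so is enough to learn how many members the archive has and where its directory starts, without touching the large WARC payload.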

This is a radical departure for web archives, since the archive is constituted by static files on the web, wherever they may be, and a web browser, rather than by complex, difficult-to-maintain server-side applications.

But this radical departure does raise many important questions about what it means to view archived content outside of the familiar virtual walls of the Internet Archive or an institution like the British Library. How do you know who archived the content and when? Can you prove it? Was it tampered with after they created it? These are design questions that the Webrecorder team and New Design Congress are currently working on.

@edsu if you keep a Merkle Tree with the data and send the root around you can prove that any bit belongs to the root. And the people receiving the root can vouch for the date.
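
A rough sketch of the Merkle tree idea, assuming the archive is split into byte chunks and hashed with SHA-256. The function names are made up for illustration, and duplicating the last node on odd-sized levels is a simplification rather than a recommendation. The root is the small value you would send around; a proof is the list of sibling hashes needed to recompute it from one chunk.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(chunks):
    """Return every level of the tree, leaves first, root level last."""
    if not chunks:
        raise ValueError("need at least one chunk")
    # Prefix bytes keep leaf hashes and interior hashes distinct.
    level = [_h(b"\x00" + c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]   # pad odd levels by repeating the last node
            levels[-1] = level
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(chunks):
    return merkle_levels(chunks)[-1][0]


def merkle_proof(chunks, index):
    """Sibling hashes from leaf to root, each with a flag: is the sibling on the left?"""
    proof = []
    for level in merkle_levels(chunks)[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(chunk, proof, root):
    """Recompute the root from one chunk and its proof."""
    node = _h(b"\x00" + chunk)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root


if __name__ == "__main__":
    chunks = [b"page one", b"page two", b"page three"]
    root = merkle_root(chunks)
    print(verify(chunks[2], merkle_proof(chunks, 2), root))
```

Someone holding only the root can check any single chunk this way, without needing the rest of the archive.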

@jasper but how will you know the people receiving it are not just a bunch of sock puppets?

@jasper I think you're right: some notion of consensus is important for trust, and questions about it inevitably turn to DAGs.

I think the WACZ spec webrecorder.github.io/wacz-spe is starting out with some optional ideas about trust and authenticity. The initial approach is to have a manifest and sign it with a private key, so that people can look up the matching public key and verify the signature.

github.com/webrecorder/wacz-au
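
The manifest idea can be sketched roughly like this: record a hash for every file, serialize the manifest in a stable way, and treat the digest of those bytes as the thing that gets signed. The field names here are illustrative assumptions, not copied from the spec. The signing step itself is left out because it needs a public-key library that the Python standard library does not include.

```python
import hashlib
import json


def _sha256_tag(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    """List each file with its hash and size (hypothetical field names)."""
    return {
        "resources": [
            {"path": path, "hash": _sha256_tag(data), "bytes": len(data)}
            for path, data in sorted(files.items())
        ]
    }


def manifest_digest(manifest: dict) -> str:
    """Digest of a canonical serialization; this is what a signer would sign."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_paths(manifest: dict, files: dict) -> list:
    """Paths whose current bytes are missing or no longer match the manifest."""
    bad = []
    for entry in manifest["resources"]:
        data = files.get(entry["path"])
        if data is None or _sha256_tag(data) != entry["hash"]:
            bad.append(entry["path"])
    return bad


if __name__ == "__main__":
    files = {"archive/data.warc.gz": b"warc bytes", "pages/pages.jsonl": b"{}"}
    manifest = build_manifest(files)
    print(manifest_digest(manifest))
    files["pages/pages.jsonl"] = b"{ }"      # simulate tampering after the fact
    print(changed_paths(manifest, files))
```

A valid signature over the manifest digest then covers every file listed in it, since changing any file changes its hash and so fails the check against the signed manifest.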

@edsu actually I only meant that you can prove that a copy was a particular way at a particular time.

Since the reasons and ability to fake the data may only come after that time, it might be significant proof of authenticity in such cases...

But yes, HTTPS signatures proving it was signed on the server would be very useful too (assuming no security breaks there).
