Graham Brown Field Diary
10 August, 2016

Big data challenges flagged at Digital Humanities Australasia

Rethinking research access to cultural data was a hot topic at the Digital Humanities Australasia conference held in Hobart recently. Representatives from the GLAM (galleries, libraries, archives and museums) sector participated in a lively panel discussion, convened by AARNet, that examined a range of current issues around digital tools and resources and access to data for research purposes.

Opportunities for collaboration

Digital humanities research is an emerging field. With many of the nation’s GLAMs now connected to the AARNet research network infrastructure, along with the universities, there are huge opportunities for collaboration between the humanities research and cultural collecting communities that did not previously exist. The panel discussion proved most useful for bringing the issues to light and kick-starting the process of finding new ways to work together.

Data Bridges

Using research network infrastructure to build ‘data bridges’ between GLAMs and the universities where humanities researchers are located is an approach currently being explored by the GLAM community in collaboration with AARNet. The aim is to find ways to build capacity for digital humanities research and sustain the value of GLAM data repositories into the future.

Key takeaways from the panellists:

Challenges ahead for the research community

International trends and growing interest in the use of selective and whole domain web archives for contemporary historical research are propelling efforts to find sustainable ways of making cultural data more easily accessible to researchers.

PANDORA (Preserving and Accessing Networked Documentary Resources of Australia), Australia’s Web Archive, is a growing collection of selected Australian web publications, established initially by the National Library of Australia in 1996, and now built in collaboration with nine other Australian libraries and cultural collecting organisations. The Library has also, with the assistance of the Internet Archive, been regularly harvesting Australia’s web domain for nearly seven years.

Dr Marie-Louise Ayres (Assistant Director-General, National Collections Access, National Library of Australia) laid out the data access challenge ahead for the research community:

“This web data for the Library’s archive is shipped to us in big red boxes, we’re talking Petabytes of big content that (researchers) are going to need in the future.” Ayres outlined the problem: Australian web data needs to be made easily accessible to researchers so that a wide range of research opportunities can be unleashed.

JISC, the company that operates JANET, AARNet’s sister research and education network in the UK, is funding a Big Data project, which is being undertaken by the Oxford Internet Institute. The project is a world-leading exploration into the research value of web archives for UK social science research.

Intelligent search

With the State Library of New South Wales Mitchell Collection, comprising 40,000 volumes of 19th century printed Australian history, slated to be digitized, Richard Neville, the Library’s Mitchell Librarian and Director Education and Scholarship, raised an interesting question.

“If you are going to digitize a collection, how do you actually provide an intelligent search across this huge corpus of fairly unstructured data to unlock all the research potential?” he asked.

Researchers may still want to read the digitized items in the traditional way, page by page, but they may also want to look at them en masse, as a corpus. Computational methods are now being used to explore elements in the text, such as how language usage changes and when a word appears or disappears over time. Research conducted through the Prosecution Project, led by Mark Finnane at Griffith University, for example, draws upon a corpus of unstructured text from digitized archives and employs both traditional and computational methods.
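As a rough illustration of the kind of computational method described above, the minimal Python sketch below counts words per year in a dated corpus and reports the first and last year in which a word appears. It is not drawn from the Prosecution Project or any library system: the function names and the three-sentence sample corpus are invented for this example, and it assumes each document comes with a known publication year.

```python
import re
from collections import Counter, defaultdict


def word_counts_by_year(documents):
    """Build {year: Counter of words} from (year, text) pairs."""
    counts = defaultdict(Counter)
    for year, text in documents:
        # Lower-case the text and keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts[year].update(words)
    return dict(counts)


def first_and_last_year(counts, word):
    """Return (first_year, last_year) in which `word` occurs, or None."""
    years = sorted(year for year, c in counts.items() if c[word] > 0)
    return (years[0], years[-1]) if years else None


# Hypothetical sample corpus, made up for illustration only.
sample = [
    (1850, "The squatter drove his bullock dray to the diggings."),
    (1870, "The squatter sold wool; the diggings were quiet."),
    (1890, "The selector fenced his paddock near the railway."),
]

counts = word_counts_by_year(sample)
print(first_and_last_year(counts, "squatter"))
print(first_and_last_year(counts, "railway"))
```

A real corpus of digitized text would also need handling for OCR errors, spelling variants and uneven numbers of documents per year, which this sketch leaves out.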

State, territory and national libraries are digitizing large corpuses of unstructured data, and the issue of data access for research is looming large.

Citizen humanities

Janet Carding, Director, Tasmanian Museum and Art Gallery, suggests that the process of making collections available to a global audience through web portals and other digital mechanisms should be conceived as a form of “citizen humanities”.

If you think of citizen science, which draws the community into participating in science by helping to track data, the same approach can apply to humanities through transcription and hacking competitions. Cultural institutions have a lot of material in their collections that can help society unlock the past, reflect on the future and understand diverse cultures and societies. By transcribing and encoding handwritten World War I diaries or naturalists’ field diaries, for example, citizen volunteers enable political, social and scientific historical research.

The Powerhouse Museum’s Hack the Collection program is a great example of a citizen humanities program. For Hack the Collection, 10 objects from the Powerhouse Museum collection were 3D scanned and made available to the public to remix, reinterpret and create new works.

“Museums and galleries provide a great site for investigating methodologies around the digital humanities,” said Carding.

Interpretation, rights and reuse of data

Dr Ely Wallis, Manager, Research and Online Collections, Museum Victoria, says that GLAM collections can be interpreted in many different ways.

A scientific researcher may approach the Museum Victoria field diaries collection from a taxonomist’s point of view, interested in observations of species found in certain places. A humanities researcher, on the other hand, may be interested in the field diaries as a record of Australia’s social history, a window onto what life was like at that time. Different interpretations are important for furthering our understanding of the world around us.

“GLAM collections contain multiple narratives, and there are multiple ways, and multiple biases, in what we collect, and how we collect and whose voices are represented,” she said.

Wallis also raised the issue of unlocking data by developing new standards for rights and reuse, in order to make the wealth of heritage material available online so that as many people as possible can easily use it in many different ways.

The joint initiative between the Digital Public Library of America and Europeana was cited as an example of valuable work that has already been done to raise awareness about sorting out rights metadata in order to support heritage data reuse by researchers. These large-scale discovery services have opened the door to extensive research opportunities by making millions of items from galleries, libraries, archives and museums, and cultural heritage sites openly discoverable. The next step is to make as much of that heritage data as possible accessible and reusable.