ATAP - Australian Text Analytics Platform
23 February, 2022

AARNet partners with two universities to develop the Australian Text Analytics Platform

AARNet is proud to partner with the University of Queensland and the University of Sydney to develop the Australian Text Analytics Platform (ATAP).

As part of the Australian Research Data Commons-supported Research Platforms program, the ATAP project aims to provide accessible tools and training for researchers working with large volumes of unstructured text, supported by a community of practice.

Text analytics is the process of extracting data from large volumes of unstructured text to derive machine-readable information for research purposes. Unstructured text is text-based material that lacks descriptive data and cannot be readily organised or defined, for example documents, social media posts and audio transcripts. An analytics workflow provides tools to clean and organise this material by introducing structure, such as dates, and by identifying entities of interest, transforming it into information that computers can process. This enables rich insights through advanced queries and data visualisations, and prepares the data for machine learning.
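As a rough illustration of what "introducing structure" can mean in practice, the minimal Python sketch below cleans two short strings, pulls out dates and flags entities of interest. It is not ATAP code: the sample sentences, the `ENTITIES` list and the helper names `clean` and `structure` are all assumptions made for this example, and a real workflow would more likely rely on a trained entity-recognition model than on a fixed list.

```python
import re
from collections import Counter

# Hypothetical sample records standing in for unstructured text (not ATAP data).
DOCUMENTS = [
    "On 23 February 2022 AARNet announced   a partnership with the University of Queensland.",
    "The University of Sydney joined the project;\tworkshops begin 1 March 2022.",
]

# Matches dates written as "23 February 2022".
DATE_PATTERN = re.compile(
    r"\b(\d{1,2}) (January|February|March|April|May|June|July|August|"
    r"September|October|November|December) (\d{4})\b"
)

# Assumed entity list for illustration only.
ENTITIES = ["AARNet", "University of Queensland", "University of Sydney"]


def clean(text):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def structure(text):
    """Turn one raw string into a record holding its dates and entities."""
    cleaned = clean(text)
    dates = [" ".join(match.groups()) for match in DATE_PATTERN.finditer(cleaned)]
    entities = [name for name in ENTITIES if name in cleaned]
    return {"text": cleaned, "dates": dates, "entities": entities}


# Structured records can now be queried, counted or visualised.
records = [structure(doc) for doc in DOCUMENTS]
entity_counts = Counter(name for record in records for name in record["entities"])
```

Once each document is a record with `dates` and `entities` fields, questions such as "which organisations are mentioned, and when?" become simple lookups rather than manual reading.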

AARNet’s CloudStor SWAN (Service for Web-based Analysis) will form a crucial component of ATAP. Based on Jupyter Notebooks, SWAN is a cloud-based workbench for writing, running and sharing data analysis code. SWAN was developed by CERN and complements other projects supported by AARNet, including the Language Data Commons of Australia. AARNet will also support ATAP by delivering hands-on workshops and online training modules. Together, CloudStor and SWAN assist in the publication of datasets and the reproducibility of results, both of which are important to research outputs.

The outcomes of ATAP will benefit a broad range of research fields in which increasing volumes of text-based material are a valuable source of information. Techniques to process and analyse unstructured text are applicable to the humanities, engineering and the sciences. ATAP aims to transform and accelerate data-driven research across disciplines, and demonstrates AARNet’s continuing commitment to advancing national research infrastructure and building data skills.