Data Sovereignty
Professor Tahu Kukutai FRSNZ discussed how Indigenous models of ‘good data’ provide a way forward for creating ethical, high-value, high-trust data ecosystems. Professor James Maclaurin provided an overview of artificial intelligence (AI) and how rapid changes in this area will affect the principles that guide the use of data in Aotearoa New Zealand.
Māori data sovereignty: A model of ‘good data’ for the future
Professor Tahu Kukutai FRSNZ (Ngāti Tiipa, Ngāti Māhanga, Ngāti Kinohaku, Te Aupōuri); Te Ngira: Institute for Population Research, Te Whare Wānanga o Waikato, University of Waikato; Pou Matarua, Co-director, Ngā Pae o te Māramatanga, New Zealand’s Māori Centre of Research Excellence
As every aspect of our lives becomes digitised, more and more data about us is being tracked, stored, linked, and shared. Whether we participate consciously or not, we are all part of this data revolution. It is important for us to understand the challenges and opportunities today’s avalanche of data presents. Data sovereignty is a movement that raises ethical questions about how data is governed and regulated. It asks: Whose data? Whose control? Whose values? Whose benefit?
In June 2023, Te Kāhui Raraunga released the Māori data governance model, informed by Māori data sovereignty (MDS). MDS recognises that data are a cultural, strategic, and economic resource for Indigenous peoples, and that the existing data infrastructure does not meet Māori data needs.
The Māori data governance model provides how-to guidance for making legislative and policy change. The values underpinning the model include using data for good, valuing data as taonga, decolonising data systems, and putting Māori data in Māori hands. Professor Kukutai emphasised that good data governance is critical in enabling all peoples, including Māori, to flourish.
Data Sovereignty in the Age of ChatGPT
Professor James Maclaurin, Co-director of the Centre for AI and Public Policy, Te Whare Wānanga o Otago, University of Otago
Professor Maclaurin addressed the rise of generative AI: systems like ChatGPT that can generate the next page of a book, the next part of an image, the next frame of a video, and so on. This type of AI is fundamentally different from the predictive models already widely used in government and business. Crucially, generative AI is general purpose. The large language models (LLMs) on which it is based not only contain vast quantities of information; they also pass complex tests of common sense and social reasoning. Instructing LLMs via prompts requires skill, and “prompt engineering” for complex contexts has become a very highly paid job. While this AI can achieve extremely high levels of factual accuracy, it is also capable of “hallucinating” incorrect responses.
Recent studies have shown that white-collar workers completed complex work tasks 20% faster and with 40% higher accuracy after being introduced to ChatGPT. These workers had not been trained in its use and were not using the most powerful generative AI (GPT-4). Preliminary studies suggest that these advantages also appear in high-risk, high-reward settings. A recent study comparing the responses of physicians and ChatGPT to medical questions in an online forum found that healthcare professionals preferred ChatGPT’s responses 79% of the time, judging them to be both more accurate and more empathetic.
When building current AI systems, we address a wide variety of ethical, technical, and legal issues related to the particular purpose a system will serve. These include questions about the collection and curation of data, explainability, fairness, oversight, and accuracy and performance compared with human decision-makers. Generative AI sidesteps many of these questions because it is so general: we do not train a new LLM every time we address a new problem, and users have little knowledge about, or control over, the data used to train such models.
Moreover, the larger an LLM is, the greater its accuracy and reasoning ability, so this type of AI is often trained on as much information as can be gleaned from the internet. While this data is filtered, it is not curated in the way we usually require. All LLMs are ‘aligned’ to prevent them from giving harmful or inappropriate responses. Recent studies suggest that alignment tends to make such models less accurate, so there may be resistance to alignment in some contexts. As is often the case with ‘big tech’, it is difficult for small countries to influence the large companies currently driving the development of generative AI.
The talk finished with five questions for Aotearoa:
1. How can we influence the data used to train generative AI models?
2. Can we influence their alignment?
3. Will we be able to effectively restrict their deployment and use?
4. Might we build our own?
5. Are there other ways we could effectively adapt to generative AI?