Authorship and involuntary attribution: how and why should we contest AI manipulation?

By Kristin Bergtora Sandvik - 29 October 2024

Technology is radically changing the work and role of scholars and the function of academic publishing. Fake and fabricated content (data, facts, arguments, claims, conclusions) undermines the foundations of knowledge in a democratic society. Faking publications and citations undermine the way we as a community of knowledge producers have organized ourselves and make sense of our work. In the broader field of humanitarian studies, including forced migration, mass violence, and war, AI manipulation generates acute questions regarding knowing about life and death, including the trustworthiness and reliability of academics as experts, commentators, and analysts. This blog deals with involuntary attribution and reflects on what kind of academic dispute it gives rise to, how it might be mitigated, and why we should care.

Introduction

Like many of my colleagues, I have, as a mid-career researcher, begun to rack up a long list of weird experiences in the authorship and citation department. These range from complications around the titling of work to offers of payment for citations. More recently, however, the issues arising are increasingly not about fending off attempts to appropriate what's mine but about dealing with attempts to attribute to me what's not mine. This concerns false citations generated by generative AI.

Recently, I was browsing an edited volume online, interested in how the author had used my work. Amid a couple of recognizable references, I was puzzled when I saw something that felt uncannily familiar but which I could not specifically identify. Noting the year of the publication as well as the quotes in the text, I went to the bibliography. There, I found a fake journal article – with a title I could have used but had not – published in a well-known and respected journal in my field, complete with year, issue, volume number, and page range. When I googled this, the reference showed up as part of the book chapter's bibliography and was listed as 'cited by 1' on Google Scholar. There she/it was. False-Sandvik #1.

While wrongful attribution is an old challenge for academia, the potentially cascading effects of relatively simple transgressions such as using wrong or non-existent citations represent something new. On the one hand, the scale of the problem is intricately interlinked with the changing structure of academic publishing: whereas editors, publishers, and journals used to be gatekeepers against irregularities, this gatekeeping role is being challenged on all fronts. On the other hand, the pervasiveness of AI means that remedies are getting scarce: an erratum does not remove the existence of a free-floating digital reference. In light of these developments, there is significant ambiguity with respect to how researchers – no matter their career stage or positionality – may defend their academic record. In short: do we have any way of controlling what sticks to us?

In the following, I lay out some of the dilemmas arising with involuntary attribution, what kind of dispute this is, and how we might deal with it, before reflecting on how the potential impact of involuntary attribution corresponds to the situatedness of the affected researcher. I draw on dialogue with and input from colleagues, in part quoting their reflections.

This is not mine: making sense of forgery

As of 2023, most journals in the social sciences had developed AI policies and required authors to submit AI statements disclosing whether they had used AI-assisted technologies. A central theme is the responsibility of the human author for any submitted material generated by AI tools, including its correctness and any copyright infringement. By 2024, this requirement ought to be well-known and mainstream.

As teachers, thesis supervisors, and reviewers, many of us are deeply frustrated by the scope and breadth of falsification spreading through student submissions and academic work. While many of these attributions are to Dumbledore-style authors – aka not real – there is also a trail of falsehoods leading to real academics.

My first hunch when I encountered False-Sandvik was that this sounded so familiar that I needed to see if it was a forgotten but real paper. This is the trick: generative AI doesn't hallucinate or 'lie' but delivers approximations of what it predicts to be correct based on its training data. My entire publication record is part of this body of training data: GenAI being inherently predatory, all published academics are part of this experiment. Thinking that perhaps I had an unpublished conference paper somewhere that I had circulated, I undertook some frantic searching in all outboxes and folders, which yielded nothing. I also wrote to the journal and asked, very confused, whether they thought this article existed and whether I had possibly reviewed a paper with such a title, raising the possibility that the review could have leaked. The journal confirmed that it had published no article with such a title, nor any article by me. The page range does not exist. The quotes were real but came from a different paper written together with a colleague.

Upon my request, one of the editors of the book then contacted the chapter author, who claimed not to know where the reference came from. Previously, I would have been ready to accept that the chapter author had simply mixed something up during the writing process and that the mix-up had not been spotted during editing. Yet, today, this explanation accounts for neither the detailed, non-existent reference nor the multiple and presumably intentional in-text references to a non-existing article. Nobody seems to have felt responsible for preventing this from happening – or perhaps we as a scholarly community are just not prepared for it. This has led me to ponder what kind of academic dispute this is. How seriously should I take this? How seriously should we all take this? What are the possible courses of action and mitigation strategies?

What to do with fake citations

As should already be clear, I suggest we all take this quite seriously. We need to stand our ground. Yet, how? I don’t know what to actually do with the present and future False-Sandviks. As a way of trying to come up with something actionable, I want to bring up a set of dilemmas and questions that I see arising out of this.

First, the obvious. I identified one instance of forgery. What if it was 10 or 20 and they were not so innocent but rather intent on skewing the optics of my profile in one direction or another? While for me there is a difference between 1 and 20, is there a principled difference concerning the dilemmas that arise and my options for avoiding this sticking?

As noted above, authors are now frequently required to sign AI statements, and fake citations constitute misconduct. Yet, even getting the author to admit that GenAI was used doesn't solve the problem, which is the continued existence of this citation online.

I have the right not to get this attributed to me. If I don’t react publicly, I am concerned about later being accused of faking my own contributions. How can anyone – in particular, someone who is early career and doesn’t have a preexisting academic digital body – prove it wasn’t them trying to bump up their publication record or citation count?

Yet, any public disavowal of the particular instance will have repercussions for the author. What do I owe them? What do they owe me? In a sense, even if I filed a formal complaint (which I would have done if the reference sounded less innocent or if there were more than one), this would not fix things: the false article is now out there with my name on it. Any citation of the chapter containing False-Sandvik will amplify the reach of this forgery, likely also increasing the citation count for False-Sandvik (very appealing in its algorithmic simplicity, this fake paper title).

Then there is the journal, which clearly has the right to police claims about having published with it. How do journals generally engage with these issues? Do journals have any duties with respect to involuntary attributions? Should journals have a general disclaimer, reach out to offending authors, or perhaps put out an annual blacklist of non-existing articles? Would that type of work even be feasible?

There are also the rights of the co-author of the quotations that were lifted from an unacknowledged paper. Their work is now being used without appropriate acknowledgment – which counts in the citation economy – but also being linked to a fraudulent citation.

On the other side of the table are the editor, peer reviewers, and the publisher. Formally, the responsibility for my case rests with the editor(s). Yet, as noted by a colleague, an experienced editor and articulate public speaker on the subject, who has the means or tools to fulfill the obligation of checking every reference, especially when it becomes highly skilled detective work? At present, the pressure is on academic publishing to publish more and faster. This seems opposed to the increased level of scrutiny necessary to detect AI-related fraud. As I was writing this blog, I undertook a peer review where I included a vaguely ridiculous-sounding disclaimer, emphasizing that as a peer reviewer, I didn't have the capacity to go through the bibliography to make sure it was real (I tried and stopped after five minutes). Can we be held responsible for a task we cannot be expected to carry out? All of this inevitably points towards the obligations of publishers to do more to uphold academic integrity – their own and ours. Yet, doing more is also likely to rely on AI tools, and so the cycle of introducing new digital tools that give rise to new appropriation and forgery practices continues.

The impact on scholars and scholarship

A further rationale for taking involuntary attribution very seriously is that to continue our work, we at least need to try to understand what is going on and what the ramifications of individualized AI manipulation may be. In the current geopolitical climate, not only has research on specific issues become highly politicized and subject to controversies, threats, and conspiracies (think COVID) – but researchers themselves are experiencing new types of challenges concerning their reputation, the integrity of their research, and their participation in the academic community.

As I was soliciting feedback from colleagues on this blog, my editor colleague noted that the risk is greater in the first place for senior academics, because they have a name to steal: 'There is a dilemma missing in the list you ponder in this draft: cases where the "hallucinated" paper has actually been published.' This, she suggested, points to a future where the rise of paper mills continues, real academics publish fake papers, and papers are falsely attributed to genuine academics. Does this mean, she wondered, that 'just proving that your paper wasn't published in that journal may soon be the "lucky" scenario'?

For all of us, involuntary attributions can carry practical costs, such as denial of access to fieldwork – for example, because a false citation could suggest you are a security risk, unethical, or politically engaged in a way that provokes local authorities. It could also carry costs for an organization (such as a minority rights NGO or the ICRC) or an institution associated with you. Yet, the impact of false attributions depends on how you are situated. I was able to elicit an answer from the journal, the editor, and the author in a very short time. I can freely articulate my viewpoints and publish this blog. Even as False-Sandvik sticks around, at least in this presumptively harmless shape, I can call it by its name.

Yet, the costs will likely not be distributed equally. For academics with minority status, wherever they are, for researchers working on highly contested political topics (such as Russia–Ukraine or China) or coming from an area of conflict (I am here thinking particularly about students and academics with Middle Eastern/Palestinian backgrounds), or for those with a precarious employment status, the loss of control of one's own name and emergent academic portfolio may have unforeseeable and unmitigable consequences for opportunities to access grants, publications, or invitations. This calls for saying something.

I want to end with the comments from a colleague who is a fierce promoter of rigorous academic writing:  ‘You ask what kind of academic dispute this is: for me, it is falsification of data. It doesn’t matter whether it comes from AI or the person’s head… it is less a crime against you than it is a crime against scholarship. Even if you are OK with it, the rest of us should not be.’

This blog is produced as part of the strategic initiative 'Artificial Intelligence, Humanitarian Ideas and Discourse – KnowingAID', led by Maria Gabrielsen Jumbert (May 2024 – Dec 2024).
I am grateful to Lynn Nygaard, Maria Gabrielsen Jumbert, Per Jørgen Ystehede, Marit Moe-Pryce, Beata Paragi, Kjersti Lohne, Kristoffer Líden, Maja Janmyr, Mareile Kaufman, Rod Mena and Ozlem Gürakar-Skribeland for their input and encouragement. Responsibility for the content rests solely with me.
Kristin Bergtora Sandvik (S.J.D., Harvard Law School, 2008) is a professor of legal sociology at the Faculty of Law, University of Oslo, and a Research Professor in Humanitarian Studies at PRIO. Sandvik's research on the humanitarian sector focuses on refugee resettlement, legal mobilization, technology, innovation, and accountability. Her new book, Humanitarian Extractivism: The Digital Transformation of Aid, is published by Manchester University Press.

Photo by Pixabay
