You can’t spell environmental without AI: how scientists are using LLMs

By: Dr. Gabriel Smith, lead scientist at Crowther Lab, Zurich

While news of AI and its dangers frequents our headlines, Large Language Models – otherwise known as LLMs in the tech world – can also be useful in science. A recent paper published by our team, led by Dr. Gabriel Smith, highlights how LLMs can be helpful for scientists and offers a set of guidelines that can be followed to minimise the risk of harm, based on our current understanding of the situation.

Ecologists use LLMs in many different ways, reflecting the many different kinds of tasks they perform. A common use is assistance with computer coding, which is central to many of the analyses performed on large environmental datasets (a brief, hypothetical illustration of this use follows below). Another is refinement of writing, where scientific journal guidelines permit it. This can be especially helpful for ecologists who are less comfortable in English, which remains, for the time being, the primary language of international science. Learn more about how ecologists and environmental scientists are using LLMs in the following Q&A with Dr. Gabriel Smith.
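As a concrete illustration of the coding use case – not an example from the paper, and with an invented file name, column names, and year threshold – the sketch below shows the kind of routine data-wrangling script an ecologist might draft or debug with an LLM's help.

```python
# Hypothetical example of the kind of analysis code an ecologist might
# draft or refine with LLM assistance. The file name, column names, and
# year threshold are invented for illustration.
import pandas as pd

# Load a plot-level forest inventory table (hypothetical file).
plots = pd.read_csv("forest_plots.csv")  # columns: plot_id, biome, year, carbon_t_ha

# Keep recent surveys and drop rows with missing carbon estimates.
recent = plots[(plots["year"] >= 2015) & plots["carbon_t_ha"].notna()]

# Summarise the mean and spread of carbon density per biome.
summary = (
    recent.groupby("biome")["carbon_t_ha"]
    .agg(["count", "mean", "std"])
    .sort_values("mean", ascending=False)
)

print(summary)
```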

Q: Can LLMs perform creative tasks in competition with scientists?

A: That depends on what you mean by “creative”. Large language models are a kind of generative artificial intelligence. The word “generative” in this phrase refers to the generation of new content. So, in that sense, everything that an LLM does is creative – it is creating something. If an LLM is used for science, then what it’s creating would otherwise probably need to be created by the scientist who is using the model. You could think of that as performing a creative task in competition with scientists, though maybe a better framing in this particular example is “in collaboration” with a scientist.

But consider that when we describe a person as “creative”, what we’re usually describing is a certain quality of the things they produce, perhaps that those things are novel and unexpected. A creative scientist probably has interesting new ideas. Sometimes those ideas can be profound enough to really fundamentally change how people understand the world. They might open up whole new areas of questioning or even ways of questioning. I am sceptical that LLMs can ever match human beings in this area, but maybe I will be surprised in the future.

In short, LLMs can do a good job of creating certain kinds of things when they are asked to do so by scientists or anyone else. I don’t know whether they can effectively do the more fundamental creative task of figuring out what ought to be or what can even be created.

Q: What kind of data is used to train these models?

A: Large amounts of human-written text. Some LLM developers have not been completely transparent about what their training corpus contains. There are currently legal disputes centred on the potential inclusion of copyrighted content in particular.

Q: What steps should scientists take to ensure ethical use of these models?

A: My co-authors and I offer a set of guidelines in our paper that can be followed to minimise the risk of harm based on our current understanding of the situation. But because developments in this area are hard to predict, the most important thing is that the scientific community regularly revisit those (or any other) guidelines and revise them as things evolve. As in life more broadly, we must continually watch to see what the results of our actions are and, if those actions seem to lead to harm, resolve not to repeat them.

Q: Are there any exciting possibilities for practical applications of large language models?

A: I would imagine that LLMs have the potential to make technology much more broadly accessible since they allow computers to receive instructions and respond in ways that are conversational and thus more intuitive for people. Populations for whom sophisticated use of technology was formerly out of reach may grow increasingly able to interface comfortably and effectively with computers. I find that pretty exciting.
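To make that concrete, a conversational interface can be as simple as the minimal sketch below. This is an illustrative assumption on my part rather than anything described in the paper: it assumes the OpenAI Python client is installed, an API key is available in the environment, and a placeholder model name.

```python
# Minimal sketch of a conversational interface built on an LLM API.
# Assumes: `pip install openai`, an OPENAI_API_KEY environment variable,
# and a placeholder model name – all assumptions, not details from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful assistant for ecological data questions."}]

while True:
    user_text = input("You: ")
    if not user_text.strip():
        break
    history.append({"role": "user", "content": user_text})
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print("Assistant:", answer)
```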

Q: What are the current limitations of large language models?

A: One limitation is that they sometimes deliver false information in a very eloquent or convincing way. We give examples of this sort of “hallucination” in the paper. The problem is that the eloquence leads people to assume that the response is trustworthy, and if you’re not already a subject-matter expert, you might not easily pick out the incorrect statements.
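One lightweight safeguard against a common form of hallucination – invented references – is to check that any DOIs an LLM suggests actually resolve. The sketch below is my own illustration rather than a recommendation from the paper; a resolving DOI still does not guarantee that the paper says what the model claims.

```python
# Hypothetical safeguard: check whether DOIs suggested by an LLM resolve.
# A non-resolving DOI is a strong hint of a hallucinated reference; a
# resolving one still does not confirm the claimed content.
import requests

def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if doi.org knows this DOI (it answers with a redirect)."""
    try:
        resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=timeout)
        return resp.status_code in (301, 302, 303)
    except requests.RequestException:
        return False

for doi in ["10.1371/journal.pcbi.1011767", "10.9999/not.a.real.doi"]:
    print(doi, "->", "resolves" if doi_resolves(doi) else "does not resolve")
```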

Q: How do these models deal with complex or contradictory information in scientific fields?

A: On subjects where scientific debate has attracted attention beyond the academy, I think they are likely to give relatively good responses because in these cases the text in the training corpus would hopefully cover the matter more comprehensively (assuming the discussion outside of the scientific literature hasn’t been totally off the mark). However, there are many areas of science where there is complex or contradictory information, but the debate is not of general public interest and/or requires so much background knowledge as to render it basically incomprehensible to non-experts. I personally wouldn’t trust an LLM to be able to effectively explain issues like this, as that requires someone who both communicates really well and deeply understands the matter at hand. But problematically, it might try to do so anyway if someone asked.

Q: Why do you think specific guidelines are important in terms of using these models?

A: Specific guidelines are important because this kind of technology is both hugely effective and really different from what came before. That means not only that previous guidelines don’t do a good job of addressing these new tools but also that this shortcoming poses an increasingly large problem. As such, we as a scientific community now need to come to a consensus on which uses of this new technology are appropriate and which are not, ideally with some explicit justification for the decisions made. If we don’t, individual researchers will simply come to their own independent conclusions and act accordingly, which could lead to conflict down the line when areas of disagreement inevitably appear. Even if all scientists want to use LLMs ethically, we shouldn’t assume that everyone will draw the same inferences about how established, pre-AI paradigms of scientific ethics map onto our current reality. This space needs to be watched carefully.

Q: Is using these models ecologically sound in terms of resource consumption? Are there efforts to make them more environmentally friendly?

A: Like any computationally intensive application, LLMs use a lot of energy and resources. Computing currently accounts for around 5% of energy consumption in the United States. The environmental footprint of a model will depend on the energy sources used to power that computation. I believe some of the cloud computing platforms used to run LLMs are transitioning towards carbon-neutral power sources, but we don’t know the timescale or progress of those initiatives – more information will need to be analysed, and we’ll need to make sure the environmental impact is limited moving forward.

In conclusion, because developments in this area are hard to predict, the most important thing is for the scientific community to revisit its guidelines regularly and revise them as the technology evolves – watching the results of our actions and resolving not to repeat those that lead to harm.


Read the paper here: https://doi.org/10.1371/journal.pcbi.1011767