As artificial intelligence (AI) technologies transform research across fields, faculty members face new challenges and opportunities. While AI holds enormous potential to accelerate research and drive innovation, its use also raises ethical, academic integrity, and compliance concerns. These guidelines implement the Policy on the Use of AI Systems ("AI Policy") and are designed to make faculty aware of these issues so they can address them before using generative AI technologies in research at Northeastern.
Note: These Standards do not apply to the use of AI for administrative activities related to your job function, which is covered by the Standards for the Administrative Use of Artificial Intelligence at Northeastern.
1. Background
Generative Artificial Intelligence (AI) refers to systems, such as large language models (LLMs), that can generate high-quality text, images, and other content based on the data they were trained on (including user-submitted text) and are designed to predict the most likely sequence of words in response to a prompt.
In research, generative AI is often used for text summarization, source code generation, and content and idea generation, helping researchers automate tasks, analyze large datasets, and develop novel solutions across disciplines.
Note: AI tools can draw only on the information they were trained on or given, meaning their responses reflect the biases and limitations of that material. Responses produced by generative AI tools will therefore tend to reflect consensus beliefs, including any biases and inaccuracies that inform those beliefs.
2. Use of AI in Research
Many researchers are already exploring uses of AI Systems to facilitate their research activities, including peer review, proposal generation, and reporting on research activities.
The University expects all members of the Northeastern community conducting research to follow the requirements set forth in the university AI Policy and to:
- Complete the AI Review Committee review process if your research involves using an AI System to process Confidential Information, Restricted Research Data, or Personal Information (as defined by the AI Policy).
- Understand the limitations of generative AI and the dangers of relying on it as a source of information (see the section on Unlawful Bias and Discrimination Prevention below).
- Follow guidelines set by the funding agency or publisher for the allowable and unallowable uses of generative AI throughout the peer review process (see the section on Using Generative AI in the Peer Review Process below).
- Communicate with fellow lab and project team members about the permitted uses of generative AI on all projects based on the research activity and sponsor guidelines.
In addition to following the best practices and expectations set forth above, researchers must also follow the specific requirements identified below.
Authorship, Use and Citation of AI Tools
If a generative AI tool (e.g., ChatGPT) is used, your research should acknowledge how it was used, even if no AI-generated content was incorporated into the work. This acknowledgement should:
- Identify which AI tool was used
- Describe how the AI tool was used
- Indicate the date the AI tool was accessed
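For example, a hypothetical acknowledgement (the tool, use, and date below are illustrative only, not a prescribed form) might read: "Portions of the literature summary were drafted with the assistance of ChatGPT (OpenAI), accessed March 1, 2024; all AI-assisted text was reviewed, verified, and edited by the authors."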
Authors who use AI tools in the writing of a paper (or any part thereof), in the production of images or graphical elements for the paper, or in the collection and analysis of data must appropriately attribute that use in the Materials and Methods section of the paper; AI-generated material may not be submitted as if it were the author's own work. Failure to appropriately attribute the use of generative AI will be regarded as research misconduct and falls under the University's Policy on Research Misconduct.
Per the University’s Policy on Research Misconduct: Research Misconduct has the same definition as under federal regulations: “fabrication, falsification, plagiarism in proposing, performing, or reviewing research, or in reporting research results.” 42 C.F.R. § 93.103. It does not include honest error or honest differences in interpretations or judgments of data.
Generative AI and Privacy
Data shared with public generative AI tools leaves the control of the researcher and the University.
As a result, if you are using proprietary or human subjects data or any other personal information, you may NOT utilize a generative AI System unless it has gone through the AI Review Committee review process and your specific use-case has been approved. In the absence of AIRC review, such uses could violate University contract terms (such as a confidentiality agreement or DUA), the consent form under which the information was collected, or the privacy laws of the jurisdiction in which the individual resides (including FERPA and the GDPR).
As used in the prior paragraph, "personal information" means any information relating to an individual that identifies or can reasonably be used to identify that individual, directly or indirectly (including in combination with other data), by reference to an identifier such as a name, an identification number, location data, an online identifier, or one or more factors specific to the individual's identity.
Generative AI and Confidential Information, Restricted Research Data & Personal Information
Because of the potential loss of control over data submitted into AI Systems, it is important not to enter any University Confidential Information, Restricted Research Data or Personal Information (as defined by the AI Policy) into an AI System without first completing the AI Review Committee review process for your specific use-case.
This includes export-controlled information, US government Controlled Unclassified Information (CUI, including categories such as CDI and SSI), and any other information covered by federal regulations. Information on export-controlled information and CUI can be found here. This also includes using AI tools to review grant proposals or peer-reviewed journal submissions/papers, which is covered in more detail below.
Please refer to the University Policy on Confidentiality of University Records and Information for additional guidance.
Using Generative AI to Write Grants
Generative AI may be used only if the PI understands the risks involved and adheres to the AI Policy and these Standards. PIs are responsible for signing off on the proposal and committing to do the work stated if funded. PIs should keep track of how and when they use generative AI in a proposal. The PI is responsible for every part of the proposal content and should utilize generative AI in a way that is appropriate for their research and discipline.
Currently, the NIH does not specifically prohibit the use of generative AI to write grant applications, but it states that the PI understands and assumes the risks of using an AI tool to help write an application.
Using AI in the Peer Review Process
Reviewers are trusted and required to maintain confidentiality throughout the application process. Therefore, you may not use AI to assist in peer review. In a recent guide notice, NIH confirmed its stance prohibiting scientific peer reviewers from using natural language processors, LLMs, or other generative AI technologies to analyze and formulate peer review critiques for grant applications and R&D contract proposals. NIH further states that sharing content or original concepts from an NIH grant application, contract proposal, or critique with online generative AI tools violates NIH peer review confidentiality and integrity requirements.
3. Machine Learning in Research
Machine learning is a subset of artificial intelligence that empowers computers to learn and make predictions from data without explicit programming or human intervention. In research, it is predominantly used for data analysis, pattern recognition, and prediction in various fields, including healthcare, finance, and natural language processing, enabling insights and automation. One process commonly used to extract patterns, trends, and insights from large datasets is data mining.
When using machine learning techniques, keep the following best practices in mind:
- Ensure the data used for training and testing is high quality, clean, and representative of the research problem.
- Rigorously assess the training data for biases and actively mitigate them to avoid unfair or discriminatory outcomes. Mitigation may involve data re-sampling, re-weighting, or fairness-aware machine learning techniques, along with careful feature selection that considers the potential impact on fairness (see the sketch after this list).
- Employ model-agnostic interpretability techniques and visualization tools to gain insights into how the machine learning model makes predictions.
- Apply regularization techniques to prevent overfitting and use cross-validation to assess model generalization. Regularly validate the model’s performance on unseen data.
- Document the data sources, preprocessing steps, and algorithms used in the data mining process to enhance transparency and reproducibility.
- Ensure compliance with University policies and with state and federal regulations on data management and privacy.
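As a brief illustration of several of these practices, the following minimal sketch (assuming Python with scikit-learn; the dataset, model, and parameter choices are illustrative only and not prescribed by these Standards) combines re-weighting for an imbalanced sample, cross-validated regularization, validation on held-out data, and a model-agnostic interpretability check:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.inspection import permutation_importance

# Synthetic, imbalanced dataset standing in for real research data.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Re-weighting: class_weight="balanced" counteracts the skewed labels.
# C controls L2 regularization strength (smaller C = stronger penalty);
# cross-validation selects it to guard against overfitting.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
search = GridSearchCV(model, {"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Validate performance on data the model has never seen.
print("held-out accuracy:", search.score(X_test, y_test))

# Model-agnostic interpretability: permutation importance measures how
# much shuffling each feature degrades held-out performance.
imp = permutation_importance(search.best_estimator_, X_test, y_test,
                             n_repeats=10, random_state=0)
print("feature importances:", np.round(imp.importances_mean, 3))
```

Researchers should adapt the specific mitigation, validation, and interpretability techniques to the methods standard in their own discipline.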
4. Unlawful Bias and Discrimination Prevention
As stated in the AI Policy, PIs are responsible for ensuring that no unlawful bias or discrimination results from any AI System that processes Personal Information or takes actions that may affect the legal rights or safety of an individual. NIST has identified three major categories of AI bias that must be considered and managed, and that can exist in datasets even in the absence of prejudice, partiality, or discriminatory intent:
- Systemic bias can be present in AI datasets, in the organizational norms, practices, and processes across the AI lifecycle, and in the broader society that uses AI systems.
- Computational and statistical bias can be present in AI datasets and algorithmic processes; it often stems from systematic errors due to non-representative samples (illustrated in the sketch below).
- Human-cognitive bias relates to how an individual or group perceives AI system information in order to make a decision or fill in missing information, and to how humans think about the purposes and functions of an AI system.
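To make the second category concrete, the following minimal sketch (assuming Python with scikit-learn; the data are synthetic and purely illustrative) shows how a non-representative training sample can produce systematically worse performance for an underrepresented group, even with no discriminatory intent:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Two-feature synthetic data; each group's classes are separated
    # along a slightly different direction.
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + shift * X[:, 1] > 0).astype(int)
    return X, y

# Group A dominates the training sample (95%); group B is underrepresented.
Xa, ya = make_group(1900, shift=0.0)
Xb, yb = make_group(100, shift=1.5)
model = LogisticRegression().fit(np.vstack([Xa, Xb]),
                                 np.concatenate([ya, yb]))

# Evaluating each group separately on fresh samples shows accuracy on
# the underrepresented group is systematically lower.
for name, shift in [("group A", 0.0), ("group B", 1.5)]:
    X_test, y_test = make_group(1000, shift)
    print(name, "accuracy:", round(model.score(X_test, y_test), 3))
```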
Bias exists in many different forms, and there is no escaping the fact that biases influence decisions in general day-to-day actions. As a result, biases can make their way into AI systems and cause harm to individuals, groups, communities, and society. Remember, AI systems learn from what is input: if discriminatory thoughts, words, or phrases are input into an AI system, it will be influenced by that bias. As stated in the university's AI Policy, it is your responsibility to confirm that no such unlawful bias or discrimination results from your use of an AI System for research purposes.
5. Resources on Generative AI
- Department of Education, “Artificial Intelligence and the Future of Teaching and Learning”
- National Science and Technology Council, "National Artificial Intelligence Research and Development Strategic Plan 2023 Update"
- National Institute of Standards and Technology, “Artificial Intelligence Risk Management Framework”
- National Science Foundation, “NSF to establish 7 new AI research institutes”
- White House, OSTP, “Blueprint for an AI Bill of Rights”
- Executive Order 14110, “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence”
- Cansu Canca, The Institute for Experiential AI, "The Role of AI in Research Administration"
- “Disentangling the Components of Ethical Research in Machine Learning” – C. Ashurst, R. Campbell, S. Barocas, and I. Raji
- National Institutes of Health, “Artificial Intelligence”