I see no more important area of research for humanity today than AI safety. The power and dangers of AI have become clear to most people through ChatGPT, DALL-E, Twitter bots, polarizing recommendation systems, and the potential to sway elections, as well as through more existential considerations. The four areas below are pivotal to AI safety research.

Alignment and Superalignment

Ensuring that an AI's objectives remain aligned with the greater good of humanity can be subtly difficult. I plan to study weak-to-strong generalization, a research direction introduced by OpenAI. It asks how a weak supervisor that provides incomplete or inaccurate labels can be combined with a strong model so that the strong model outperforms its supervisor. The weak supervisor stands in for humans, and the strong model stands in for future systems more capable than the people overseeing them.
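The weak-to-strong setup compares three measurements: the weak supervisor on its own, the strong model trained on the weak supervisor's labels, and the strong model trained on ground-truth labels (its "ceiling"). OpenAI's paper summarizes these as the performance gap recovered (PGR), the fraction of the weak-to-ceiling gap that weak-to-strong training closes. Below is a minimal Python sketch, assuming the three accuracies have already been measured. The function name and the example numbers are hypothetical and chosen only to illustrate the arithmetic.

```python
def performance_gap_recovered(weak: float, weak_to_strong: float, strong_ceiling: float) -> float:
    """Fraction of the gap between weak and ceiling performance recovered by weak-to-strong training.

    weak: performance of the weak supervisor itself
    weak_to_strong: performance of the strong model trained on the weak supervisor's labels
    strong_ceiling: performance of the strong model trained on ground-truth labels
    """
    gap = strong_ceiling - weak
    if gap <= 0:
        # PGR is only meaningful when the strong ceiling beats the weak supervisor.
        raise ValueError("strong_ceiling must exceed weak performance")
    return (weak_to_strong - weak) / gap


if __name__ == "__main__":
    # Hypothetical accuracies, not results from any paper.
    print(performance_gap_recovered(weak=0.5, weak_to_strong=0.75, strong_ceiling=1.0))
```

A PGR of 0 means the strong model did no better than its weak supervisor. A PGR of 1 means it matched the ceiling obtained with ground-truth labels.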

Governance and Legislation

This is an important time for lawmakers and the general public to understand both the potential benefits of AI and the non-partisan risks it poses. The issue is not simple, however, because ownership and responsibility are hard to determine, especially once models are adapted for unintended use cases. Regardless, independent auditors can enforce proper conformity and safety checks on the party that provides the initial model. Safeguards against model leaking and transferability should also evolve as quickly as these systems do. See this paper from Lav for more on these considerations. The recent executive order is the first major action taken by the U.S. government to govern AI.

Robustness and Reliability

Robust systems continue to function properly in the presence of invalid inputs or unexpected conditions. Reliable systems perform their functions under stated conditions with few or no errors. Systems that are both robust and reliable give users assurance. The required levels of robustness and reliability depend on how critical the task is. A medical diagnosis system, for example, demands far more than a movie recommender.

Explainability and Interpretability

As these models become more complex and powerful, it is important to be able to explain the decisions a model makes (explainability). Better still is understanding the model's inner workings (interpretability).