Robots and other autonomous agents are increasingly deployed in domains such as space operations, cyber defense, disaster response, and medical care, and they are envisioned to collaborate directly with human partners in ways that resemble human-human teams. All human communities, groups, and teams have norms that influence and regulate behavior, so autonomous agents that join these communities must be responsive to norms as well: they must know and follow the norms of their community. But even if we succeed in giving autonomous agents such norm competence, a significant challenge remains: norms can conflict with each other. Whenever an agent (human or machine) resolves a norm conflict, it must commit a norm violation, and people respond to such violations with moral disapproval and loss of trust. In this project we investigate one powerful tool that humans use, and that autonomous agents should use, to mitigate such moral disapproval and repair lost trust: justifications. When an agent must violate a norm in order to resolve a norm conflict, a justification explains why the agent acted this way and why anyone who shares the community's norms should act this way. In a series of experiments, we will demonstrate that, after resolving a norm conflict and committing a norm violation, an autonomous agent that justifies its actions, much like a human who does so, will reduce the moral disapproval and repair the loss of trust that normally results from norm violations.
This work is sponsored by the U.S. Air Force Office of Scientific Research.