Red-teaming

Process of intentionally testing an AI model with malicious or tricky prompts to uncover vulnerabilities and improve robustness.

Intermediate seguridad pruebas ataques etica

Full definition

Process of intentionally testing an AI model with malicious or tricky prompts to uncover vulnerabilities and improve robustness.

Internal team trying to «hack» the corporate chatbot to make it reveal sensitive data or generate forbidden content.