The post OpenAI Releases Double-Checking Tool For AI Safeguards That Handily Allows Customizations appeared on BitcoinEthereumNews.com. AI developers need to double-check their proposed AI safeguards and a new tool is helping to accomplish that vital goal. getty In today’s column, I examine a recently released online tool by OpenAI that enables the double-checking of potential AI safeguards and can be used for ChatGPT purposes and likewise for other generative AI and large language models (LLMs). This is a handy capability and worthy of due consideration. The idea underlying the tool is straightforward. We want LLMs and chatbots to make use of AI safeguards such as detecting when a user conversation is going afield of safety criteria. For example, a person might be asking the AI how to make a toxic chemical that could be used to harm people. If a proper AI safeguard has been instituted, the AI will refuse the unsafe request. OpenAI’s new tool allows AI makers to specify their AI safeguard policies and then test the policies to ascertain that the results will be on target to catch safety violations. Let’s talk about it. This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here). The Importance Of AI Safeguards One of the most disconcerting aspects about modern-day AI is that there is a solid chance that AI will say things that society would prefer not to be said. Let’s broadly agree that generative AI can emit safe messages and also produce unsafe messages. Safe messages are good to go. Unsafe messages ought to be prevented so that the AI doesn’t emit them. AI makers are under a great deal of pressure to implement AI safeguards that will allow safe messaging and mitigate or hopefully prevent unsafe messaging by their LLMs. There is a… The post OpenAI Releases Double-Checking Tool For AI Safeguards That Handily Allows Customizations appeared on BitcoinEthereumNews.com. AI developers need to double-check their proposed AI safeguards and a new tool is helping to accomplish that vital goal. getty In today’s column, I examine a recently released online tool by OpenAI that enables the double-checking of potential AI safeguards and can be used for ChatGPT purposes and likewise for other generative AI and large language models (LLMs). This is a handy capability and worthy of due consideration. The idea underlying the tool is straightforward. We want LLMs and chatbots to make use of AI safeguards such as detecting when a user conversation is going afield of safety criteria. For example, a person might be asking the AI how to make a toxic chemical that could be used to harm people. If a proper AI safeguard has been instituted, the AI will refuse the unsafe request. OpenAI’s new tool allows AI makers to specify their AI safeguard policies and then test the policies to ascertain that the results will be on target to catch safety violations. Let’s talk about it. This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here). The Importance Of AI Safeguards One of the most disconcerting aspects about modern-day AI is that there is a solid chance that AI will say things that society would prefer not to be said. Let’s broadly agree that generative AI can emit safe messages and also produce unsafe messages. Safe messages are good to go. Unsafe messages ought to be prevented so that the AI doesn’t emit them. AI makers are under a great deal of pressure to implement AI safeguards that will allow safe messaging and mitigate or hopefully prevent unsafe messaging by their LLMs. There is a…

OpenAI Releases Double-Checking Tool For AI Safeguards That Handily Allows Customizations

AI developers need to double-check their proposed AI safeguards and a new tool is helping to accomplish that vital goal.

getty

In today’s column, I examine a recently released online tool by OpenAI that enables the double-checking of potential AI safeguards and can be used for ChatGPT purposes and likewise for other generative AI and large language models (LLMs). This is a handy capability and worthy of due consideration.

The idea underlying the tool is straightforward. We want LLMs and chatbots to make use of AI safeguards such as detecting when a user conversation is going afield of safety criteria. For example, a person might be asking the AI how to make a toxic chemical that could be used to harm people. If a proper AI safeguard has been instituted, the AI will refuse the unsafe request.

OpenAI’s new tool allows AI makers to specify their AI safeguard policies and then test the policies to ascertain that the results will be on target to catch safety violations.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

The Importance Of AI Safeguards

One of the most disconcerting aspects about modern-day AI is that there is a solid chance that AI will say things that society would prefer not to be said. Let’s broadly agree that generative AI can emit safe messages and also produce unsafe messages. Safe messages are good to go. Unsafe messages ought to be prevented so that the AI doesn’t emit them.

AI makers are under a great deal of pressure to implement AI safeguards that will allow safe messaging and mitigate or hopefully prevent unsafe messaging by their LLMs.

There is a wide range of ways that unsafe messages can arise. Generative AI can produce so-called AI hallucinations or confabulations that tell a user to do something untoward, but the person assumes that the AI is being honest and apt in what has been generated. That’s unsafe. Another way that AI can be unsafe is if an evildoer asks the AI to explain how to make a bomb or produce a toxic chemical. Society doesn’t want that type of easy-peasy means of figuring out dastardly tasks.

Another unsafe angle is for AI to aid people in concocting delusions and delusional thinking, see my coverage at the link here. The AI will either prod a person into conceiving of a delusion or might detect that a delusion is already on their mind and aid in embellishing the delusion. The preference is that AI provides upside mental health advice over downside mental health guidance.

Devising And Testing AI Safeguards

I’m sure you’ve heard the famous line that you ought to try it before you buy it, meaning that sometimes being able to try out an item is highly valuable before making a full commitment to the item. The same wisdom applies to AI safeguards.

Rather than simply tossing AI safeguards into an LLM that is actively being used by perhaps millions upon millions of people (sidenote: ChatGPT is being used by 800 million weekly active users), we’d be smarter to try out the AI safeguards and see if they do what they are supposed to do.

An AI safeguard should catch or prevent whatever unsafe messages we believe need to be stopped. There is a tradeoff involved since an AI safeguard can become an overreach. Imagine that we decide to adopt an AI safeguard that prevents anyone from ever making use of the word “chemicals” because we hope to avoid allowing a user to find out about toxic chemicals.

Well, denying the use of the word “chemicals” is an exceedingly bad way to devise an AI safeguard. Imagine all the useful and fair uses of the word “chemicals” that can arise. Here’s an example of an innocent request. People might be worried that their household products might contain adverse chemicals, so they ask the AI about this. An AI safeguard that blindly stopped any mention of chemicals would summarily turn down that legitimate request.

The crux is that AI safeguards can be very tricky when it comes to writing them and ensuring that they do the right things (see my discussion on this, at the link here). The preference is that an AI safeguard stops the things we want to stop, but doesn’t go overboard and stop things that we are fine with having proceed. A poorly devised AI safeguard will indubitably produce a vast number of false positives, meaning that it will stop an otherwise upside and allowable action.

If possible, we should try out any proposed AI safeguards before putting them into active action.

Using Classifiers To Help Out

There are online tools that can be used by AI developers to assist in classifying whether a given snippet of text is considered safe versus unsafe. Usually, these classifiers have been pretrained on what constitutes safety and what constitutes being unsafe. The beauty of these classifiers is that an AI developer can simply feed various textual content into the tool and see which, if any, of the AI safeguards embedded into the tool will react.

One difficulty is that those kinds of online tools don’t necessarily allow you to plug in your own proposed AI safeguards. Instead, the AI safeguards are essentially baked into the tool. You can then decide whether those are the same AI safeguards you’d like to implement in your LLM.

A more accommodating approach would be to allow an AI developer to feed in their proposed AI safeguards. We shall refer to those AI safeguards as policies. An AI developer would work with other stakeholders and come up with a slate of policies about what AI safeguards are desired. Those policies then could be entered into a tool that would readily try out those policies on behalf of the AI developer and their stakeholders.

To test the proposed policies, an AI developer would need to craft text to be used during the testing or perhaps grab relevant text from here or there. The aim is to have a sufficient variety and volume of text that the desired AI safeguards all ultimately get a chance to shine in the spotlight. If we have an AI safeguard that is proposed to catch references to toxic chemicals, the text that is being used for testing ought to contain some semblance of references to toxic chemicals; otherwise, the testing process won’t be suitably engaged and revealing about the AI safeguards.

OpenAI’s New Tool For AI Safeguard Testing

In a blog posting by OpenAI on October 29, 2025, entitled “Introducing gpt-oss-safeguard”, the well-known AI maker announced the availability of an AI safeguard testing tool:

  • “Safety classifiers, which distinguish safe from unsafe content in a particular risk area, have long been a primary layer of defense for our own and other large language models.”
  • “Today, we’re releasing a research preview of gpt-oss-safeguard, our open-weight reasoning models for safety classification tasks, available in two sizes: gpt-oss-safeguard-120b and gpt-oss-safeguard-20b.”
  • “The gpt-oss-safeguard models use reasoning to directly interpret a developer-provided policy at inference time — classifying user messages, completions, and full chats according to the developer’s needs.”
  • “The model uses chain-of-thought, which the developer can review to understand how the model is reaching its decisions. Additionally, the policy is provided during inference, rather than being trained into the model, so it is easy for developers to iteratively revise policies to increase performance.”

As per the cited indications, you can use the new tool to try out your proposed AI safeguards. You provide a set of policies that represent the proposed AI safeguards, and also provide whatever text is to be used during the testing. The tool attempts to apply the proposed AI safeguards to the given text. An AI developer then receives a report analyzing how the policies performed with respect to the provided text.

Iteratively Using Such A Tool

An AI developer would likely use such a tool on an iterative basis.

Here’s how that goes. You draft policies of interest. You devise or collect suitable text for testing purposes. Those policies and text get fed into the tool. You inspect the reports that provide an analysis of what transpired. The odds are that some of the text that should have triggered an AI safeguard did not do so. Also, there is a chance that some AI safeguards were triggered even though the text per se should not have set them off.

Why can that happen?

In the case of this particular tool, a chain-of-thought (CoT) explanation is being provided to help ferret out the culprit. The AI developer could review the CoT to discern what went wrong, namely, whether the policy was insufficiently worded or the text wasn’t sufficient to trigger the AI safeguard. For more about the usefulness of chain-of-thought in contemporary AI, see my discussion at the link here.

A series of iterations would undoubtedly take place. Change the policies or AI safeguards and make another round of runs. Adjust the text or add more text, and make another round of runs. Keep doing this until there is a reasonable belief that enough testing has taken place.

Rinse and repeat is the mantra at hand.

Hard Questions Need To Be Asked

There is a slew of tough questions that need to be addressed during this testing and review process.

First, how many tests or how many iterations are enough to believe that the AI safeguards are good to go? If you try too small a number, you are likely deluding yourself into believing that the AI safeguards have been “proven” as ready for use. It is important to perform somewhat extensive and exhaustive testing. One means of approaching this is by using rigorous validation techniques, as I’ve explained at the link here.

Second, make sure to include trickery in the text that is being used for the testing process.

Here’s why. People who use AI are often devious in trying to circumvent AI safeguards. Some people do so for evil purposes. Others like to fool AI just to see if they can do so. Another perspective is that a person tricking AI is doing so on behalf of society, hoping to reveal otherwise hidden gotchas and loopholes. In any case, the text that you feed into the tool ought to be as tricky as you can make it. Put yourself into the shoes of the tricksters.

Third, keep in mind that the policies and AI safeguards are based on human-devised natural language. I point this out because a natural language such as English is difficult to pin down due to inherent semantic ambiguities. Think of the number of laws and regulations that have loopholes due to a word here or there that is interpreted in a multitude of ways. The testing of AI safeguards is slippery because you are testing on the merits of human language interpretability.

Fourth, even if you do a bang-up job of testing your AI safeguards, they might need to be revised or enhanced. Do not assume that just because you tested them a week ago, a month ago, or a year ago, they are still going to stand up today. The odds are that you will need to continue to undergo a cat-and-mouse gambit, whereby AI users are finding insidious ways to circumvent the AI safeguards that you thought had been tested sufficiently.

Keep your nose to the grind.

Thinking Thoughtfully

An AI developer could use a tool like this as a standalone mechanism. They proceed to test their proposed AI safeguards and then subsequently apply the AI safeguards to their targeted LLM.

An additional approach would be to incorporate this capability into the AI stack that you are developing. You could place this tool as an embedded component within a mixture of LLM and other AI elements. A key aspect will be the proficiency in running, since you are now putting the tool into the stream of what is presumably going to be a production system. Make sure that you appropriately gauge the performance of the tool.

Going even further outside the box, you might have other valuable uses for a classifier that allows you to provide policies and text to be tested against. In other words, this isn’t solely about AI safeguards. Any other task that entails doing a natural language head-to-head between stated policies and whether the text activates or triggers those policies can be equally undertaken with this kind of tool.

I want to emphasize that this isn’t the only such tool in the AI community. There are others. Make sure to closely examine whichever one you might find relevant and useful to you. In the case of this particular tool, since it is brought to the market by OpenAI, you can bet it will garner a great deal of attention. More fellow AI developers will likely know about it than would a similar tool provided by a lesser-known firm.

AI Safeguards Need To Do Their Job

I noted at the start of this discussion that we need to figure out what kinds of AI safeguards will keep society relatively safe when it comes to the widespread use of AI. This is a monumental task. It requires technological savviness and societal acumen since it has to deal with both AI and human behaviors.

OpenAI has opined that their new tool provides a “bring your own policies and definitions of harm” design, which is a welcome recognition that we need to keep pushing forward on wrangling with AI safeguards. Up until recently, AI safeguards generally seemed to be a low priority overall and given scant attention by AI makers and society at large. The realization now is that for the good and safety of all of us, we must stridently pursue AI safeguards, else we endanger ourselves on a massive scale.

As the famed Brigadier General Thomas Francis Meagher once remarked: “Great interests demand great safeguards.”

Source: https://www.forbes.com/sites/lanceeliot/2025/11/04/openai-releases-double-checking-tool-for-ai-safeguards-that-handily-allows-customizations/

Market Opportunity
Sleepless AI Logo
Sleepless AI Price(AI)
$0.03857
$0.03857$0.03857
+3.51%
USD
Sleepless AI (AI) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact [email protected] for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

Top political stories of 2025: The Villar family’s business and political setbacks

Top political stories of 2025: The Villar family’s business and political setbacks

Rappler's Dwight de Leon recaps the challenges faced in 2025 by one of the Philippines' wealthiest families
Share
Rappler2025/12/25 09:00
Nvidia Absorbs Another Rival for $20B, Boosting Decentralized AI

Nvidia Absorbs Another Rival for $20B, Boosting Decentralized AI

The post Nvidia Absorbs Another Rival for $20B, Boosting Decentralized AI appeared on BitcoinEthereumNews.com. NVIDIA has agreed to pay approximately $20 billion
Share
BitcoinEthereumNews2025/12/25 09:16
Pibble AI platform: Revolutionary AION Completes POSCO International POC with Stunning Success

Pibble AI platform: Revolutionary AION Completes POSCO International POC with Stunning Success

BitcoinWorld Pibble AI platform: Revolutionary AION Completes POSCO International POC with Stunning Success The world of trade is constantly evolving, with businesses seeking innovative solutions to enhance efficiency and accuracy. In this dynamic landscape, the Pibble AI platform AION has emerged as a groundbreaking force, recently completing a significant Proof-of-Concept (POC) with global trading giant POSCO International. This achievement signals a major leap forward in how artificial intelligence and blockchain technology can revolutionize B2B operations. What is the Pibble AI Platform AION and Its Recent Breakthrough? AION is an advanced AI trade solution developed by Caramel Bay, the innovative operator behind the Pibble (PIB) blockchain project. Its core mission is to streamline complex trade processes, which traditionally involve extensive manual labor and time-consuming documentation. The recent POC with POSCO International was a pivotal moment for the Pibble AI platform. It served as a real-world test, demonstrating AION’s capabilities in a demanding corporate environment. This collaboration showcased how cutting-edge technology can address practical business challenges, particularly in international trade. The results were truly impressive. The platform proved its ability to drastically cut down the time required for specific tasks. What once took hours of meticulous work can now be completed in mere minutes. Moreover, AION achieved an astonishing document accuracy rate of over 95%, setting a new benchmark for efficiency and reliability in trade operations. This high level of precision is crucial for reducing errors and associated costs in large-scale international transactions. Revolutionizing Trade: How the Pibble AI Platform Delivers Speed and Accuracy Imagine reducing hours of work to just minutes while simultaneously boosting accuracy. This isn’t a futuristic fantasy; it’s the tangible reality delivered by the Pibble AI platform AION. The successful POC with POSCO International vividly illustrates the transformative power of this technology. Key benefits highlighted during the POC include: Unprecedented Speed: Tasks that typically consumed significant human resources and time were executed with remarkable swiftness. This acceleration translates directly into faster transaction cycles and improved operational flow for businesses. Superior Accuracy: Achieving over 95% document accuracy is a monumental feat in an industry where even minor errors can lead to substantial financial losses and logistical nightmares. AION’s precision minimizes risks and enhances trust in digital documentation. Operational Efficiency: By automating and optimizing critical trade processes, the Pibble AI platform frees up human capital. Employees can then focus on more strategic tasks that require human intuition and decision-making, rather than repetitive data entry or verification. This efficiency isn’t just about saving time; it’s about creating a more robust, less error-prone system that can handle the complexities of global trade with ease. The implications for businesses involved in import/export, logistics, and supply chain management are profound. Beyond the POC: Pibble’s Vision for AI and Blockchain Integration The successful POC with POSCO International is just one step in Pibble’s ambitious journey. The company is dedicated to building validated platforms that leverage both blockchain and AI technologies, catering to a broad spectrum of needs. Pibble’s strategic focus encompasses: B2C Social Platforms: Developing consumer-facing applications that integrate blockchain for enhanced data security, content ownership, and user engagement. B2B Business Solutions: Expanding on successes like AION to offer robust, scalable solutions for various industries, addressing critical business challenges with AI-driven insights and blockchain transparency. The synergy between AI and blockchain is powerful. AI provides the intelligence for automation and optimization, while blockchain offers immutable records, transparency, and enhanced security. Together, they create a formidable foundation for future digital ecosystems. As the digital transformation accelerates, platforms like the Pibble AI platform are poised to play a crucial role in shaping how businesses operate and interact globally. Their commitment to innovation and practical application demonstrates a clear path forward for enterprise-grade blockchain and AI solutions. In conclusion, the successful POC of Pibble’s AION with POSCO International marks a significant milestone in the adoption of AI and blockchain in enterprise solutions. By dramatically reducing task times and achieving exceptional accuracy, the Pibble AI platform has demonstrated its potential to redefine efficiency in global trade. This achievement not only validates Caramel Bay’s vision but also paves the way for a future where intelligent, secure, and highly efficient digital platforms drive business success. It’s an exciting glimpse into the future of B2B innovation. Frequently Asked Questions (FAQs) Q1: What is the Pibble AI platform AION? AION is an advanced AI trade solution developed by Caramel Bay, the company behind the Pibble blockchain project. It’s designed to automate and optimize complex trade processes, reducing manual effort and improving accuracy. Q2: What was the significance of the POC with POSCO International? The Proof-of-Concept (POC) with POSCO International demonstrated AION’s real-world effectiveness. It showed that the Pibble AI platform could reduce tasks from hours to minutes and achieve over 95% document accuracy in a demanding corporate environment, validating its capabilities. Q3: How does AION achieve such high accuracy and speed? AION leverages sophisticated artificial intelligence algorithms to process and verify trade documentation. This AI-driven approach allows for rapid analysis and identification of discrepancies, leading to significant time savings and a dramatic reduction in human error. Q4: What is Pibble’s broader vision beyond B2B solutions? Pibble is committed to integrating blockchain and AI across various platforms. While AION focuses on B2B solutions, Pibble also develops B2C social platforms, aiming to enhance user experience, data security, and content ownership through these advanced technologies. Q5: Why is the combination of AI and blockchain important for trade? AI provides the intelligence for automation and optimization, making processes faster and more accurate. Blockchain, on the other hand, offers immutable records, transparency, and enhanced security, ensuring that trade data is reliable and tamper-proof. Together, they create a powerful, trustworthy, and efficient trade ecosystem. If you found this insight into Pibble’s groundbreaking achievements inspiring, consider sharing this article with your network! Help us spread the word about how AI and blockchain are transforming global trade. Your shares on social media platforms like X (Twitter), LinkedIn, and Facebook can help more people discover the future of business solutions. To learn more about the latest crypto market trends, explore our article on key developments shaping AI in crypto institutional adoption. This post Pibble AI platform: Revolutionary AION Completes POSCO International POC with Stunning Success first appeared on BitcoinWorld.
Share
Coinstats2025/09/18 19:45