Anthropic demonstrates “jailbreak” defence to stop AI models producing harmful content

Anthropic has demonstrated a new technique to prevent users from eliciting harmful content from its models, as leading technology groups including Microsoft and Meta race to find ways to protect against the dangers posed by cutting-edge AI.

In a paper published on Monday, the San Francisco-based startup outlined a new system called “constitutional classifiers”. It is a model that acts as a protective layer on top of large language models, such as the one that powers Anthropic’s Claude chatbot, and can monitor both inputs and outputs for harmful content.

The development by Anthropic, which is in talks to raise $2bn at a $60bn valuation, comes as the industry pays growing attention to “jailbreaking”: attempts to manipulate AI models into generating illegal or dangerous information, such as instructions for producing chemical weapons.

Other companies are also racing to deploy measures to protect against the practice, moves that could help them avoid regulatory scrutiny while persuading businesses to adopt AI models safely. Microsoft introduced “prompt shields” in March last year, while Meta launched a prompt guard model in July last year; researchers swiftly found ways to bypass it, but those have since been fixed.

“The main motivation behind the work was for severe chemical [weapon] stuff [but] the real advantage of the method is its ability to respond quickly and adapt,” said Mrinank Sharma, a member of technical staff at Anthropic.

Anthropic said it would not immediately use the system on its current Claude models, but would consider implementing it if riskier models were released in the future. Sharma added: “The big takeaway from this work is that we think this is a tractable problem.”

The startup’s proposed solution is based on a so-called “constitution” of rules that define what is permitted and what is restricted, and which can be adapted to capture different types of material.
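
Anthropic’s paper, not this article, is the authority on how the classifiers are actually built. Purely as a rough illustration of the general shape described above, the sketch below shows a guard layer that screens both the user’s prompt and the model’s reply against a plain-language rule list, where swapping the rules is what would let the approach adapt to new kinds of material. Every name in it (ConstitutionalGuard, screened_reply, the example rules and keyword check) is hypothetical and not taken from Anthropic’s system.

```python
# Illustrative sketch only -- not Anthropic's implementation or API.
# A hypothetical guard screens both the prompt and the model's reply
# against a plain-language "constitution" of rules before anything is returned.
from typing import Callable, List

class ConstitutionalGuard:
    def __init__(self, constitution: List[str],
                 classify: Callable[[str, List[str]], float]):
        self.constitution = constitution  # rules defining permitted / restricted content
        self.classify = classify          # scores text against the rules, 0.0-1.0

    def screened_reply(self, prompt: str, generate: Callable[[str], str],
                       threshold: float = 0.5) -> str:
        # Screen the input before it ever reaches the underlying model.
        if self.classify(prompt, self.constitution) > threshold:
            return "Request declined by input classifier."
        reply = generate(prompt)
        # Screen the output before it reaches the user.
        if self.classify(reply, self.constitution) > threshold:
            return "Response withheld by output classifier."
        return reply

# Toy usage with stand-in components (a keyword check in place of a real classifier).
guard = ConstitutionalGuard(
    constitution=[
        "Refuse step-by-step instructions for producing chemical weapons.",
        "Allow benign, general-knowledge chemistry questions.",
    ],
    classify=lambda text, rules: 1.0 if "nerve agent" in text.lower() else 0.0,
)
print(guard.screened_reply("What is the boiling point of water?",
                           generate=lambda p: "100C at sea level."))
```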

Some jailbreak attempts are well known, such as using unusual capitalisation in a prompt, or asking the model to adopt the persona of a grandmother telling a bedside story about a nefarious topic.

To validate the system’s effectiveness, Anthropic offered “bug bounties” of up to $15,000 to individuals who tried to bypass the security measures. These testers, known as red teamers, spent more than 3,000 hours trying to break through the defences.

With the classifiers in place, Anthropic’s Claude 3.5 Sonnet model rejected more than 95 per cent of the attempts, compared with 14 per cent without the safeguards.

Leading technology companies are trying to reduce the misuse of their models while keeping them helpful. Often, when moderation measures are put in place, models can become overly cautious and reject benign requests, as happened with early versions of Google’s Gemini image generator and Meta’s Llama 2. Anthropic said its classifiers caused “only a 0.38 per cent absolute increase in refusal rates”.

However, adding these protections comes at extra cost for companies already paying huge sums for the computing power required to train and run models. Anthropic said the classifiers would add nearly 24 per cent in “inference overhead”, the cost of running the models.

Test results on its latest model show the effectiveness of Anthropic’s classifiers

Security experts argue that the accessibility of such generative chatbots has enabled ordinary people with no prior knowledge to attempt to extract dangerous information.

“In 2016, the threat actor we had in mind was a really powerful nation-state adversary,” said Ram Shankar Siva Kumar, who leads the AI red team at Microsoft. “Now one of my threat actors is a teenager with a potty mouth.”
