On the Frontier Math benchmark by EpochAI, o3 solved 25.2 percent of problems, whereas no other model has exceeded 2 percent, suggesting a leap in mathematical reasoning capabilities over the previous model.
Benchmarks vs. real-world value
Ideally, potential applications for a true PhD-level AI model would include analyzing medical research data, supporting climate modeling, and handling routine aspects of research work.
The high price points reported by The Information, if accurate, suggest that OpenAI believes these systems could provide substantial value to businesses. The publication notes that SoftBank, an OpenAI investor, has committed to spending $3 billion on OpenAI’s agent products this year alone, indicating significant business interest despite the costs.
Meanwhile, OpenAI faces financial pressures that may influence its premium pricing strategy. The company reportedly lost approximately $5 billion last year covering operational costs and other expenses related to running its services.
News of OpenAI’s stratospheric pricing plans comes after years of relatively affordable AI services that have conditioned users to expect powerful capabilities at relatively low costs. ChatGPT Plus remains $20 per month and Claude Pro costs $30 monthly, both tiny fractions of these proposed enterprise tiers. Even ChatGPT Pro’s $200/month subscription is relatively small compared to the new proposed fees. Whether the performance difference between these tiers will match their thousandfold price difference is an open question.
Despite their benchmark performances, these simulated reasoning models still struggle with confabulations, instances where they generate plausible-sounding but factually incorrect information. This remains a critical concern for research applications where accuracy and reliability are paramount. A $20,000 monthly investment raises questions about whether organizations can trust these systems not to introduce subtle errors into high-stakes research.
In response to the news, several people quipped on social media that companies could hire an actual PhD student for much less. “In case you have forgotten,” wrote xAI developer Hieu Pham in a viral tweet, “most PhD students, including the brightest stars who can do way better work than any current LLMs, are not paid $20K / month.”
While these systems show strong capabilities on specific benchmarks, the “PhD-level” label remains largely a marketing term. These models can process and synthesize information at impressive speeds, but questions remain about how effectively they can handle the creative thinking, intellectual skepticism, and original research that define actual doctoral-level work. On the other hand, they will never get tired or need health insurance, and they will likely continue to improve in capability and drop in cost over time.