Revisiting the Stanford Prison Experiment 50 years later
Ars chats with director Juliette Eisner and original study participants in new documentary series.
In 1971, Stanford University psychologist Philip Zimbardo conducted a notorious experiment in which he randomly divided college students into two groups, guards and prisoners, and set them loose in a simulated prison environment for six days, documenting the guards’ descent into brutality. His findings caused a media sensation, along with substantial subsequent criticism of the study’s ethics and methodology. Zimbardo died last month at 91, but his controversial legacy continues to resonate some 50 years later with The Stanford Prison Experiment: Unlocking the Truth, a new documentary from National Geographic.
Director Juliette Eisner started working on the documentary during the pandemic when, like most people, she had a lot of extra time on her hands. She started looking at old psychological studies exploring human nature and became fascinated by the Stanford Prison Experiment, especially in light of the summer 2020 protests against police brutality. She soon realized that the prevailing narrative was Zimbardo’s and that very few of the original subjects in the experiment had ever been interviewed about their experiences.
“I wanted to hear from those people,” Eisner told Ars. “They were very hard to find. Most of them were still only known by alias or by prisoner number.” Eisner persevered and tracked most of them down. “Every single time they picked up the phone, they were like, ‘Oh, I’m so glad you called. Nobody has called me in 50 years. And by the way, everything you think you know about this study is wrong,’ or ‘The story is not what it seems.’”
This elephant figured out how to use a hose to shower
A younger rival may have learned how to sabotage those showers by disrupting water flow.
An Asian elephant named Mary living at the Berlin Zoo surprised researchers by figuring out how to use a hose to take her morning showers, according to a new paper published in the journal Current Biology. “Elephants are amazing with hoses,” said co-author Michael Brecht of the Humboldt University of Berlin. “As it is often the case with elephants, hose tool use behaviors come out very differently from animal to animal; elephant Mary is the queen of showering.”
Tool use was once thought to be one of the defining features of humans, but examples of it were eventually observed in primates and other mammals. Dolphins have been observed using sea sponges to protect their beaks while foraging for food, and sea otters will break open shellfish like abalone with rocks. Several species of fish also use tools to hunt and crack open shellfish, as well as to clear a spot for nesting. And the coconut octopus collects coconut shells, stacking them and transporting them before reassembling them as shelter.
Birds have also been observed using tools in the wild, although this behavior has mostly been limited to corvids (crows, ravens, and jays); woodpecker finches have also been known to insert twigs into trees to impale passing larvae for food. Parrots, by contrast, have mostly been noted for their linguistic skills, and there has only been limited evidence that they use anything resembling a tool in the wild. Primarily, they seem to use external objects to position nuts while feeding.
New secret math benchmark stumps AI models and PhDs alike
FrontierMath’s difficult questions remain unpublished so that AI companies can’t train against it.
On Friday, research organization Epoch AI released FrontierMath, a new mathematics benchmark that has been turning heads in the AI world because it contains hundreds of expert-level problems that leading AI models solve less than 2 percent of the time, according to Epoch AI. The benchmark tests AI language models (such as GPT-4o, which powers ChatGPT) against original mathematics problems that typically require hours or days for specialist mathematicians to complete.
FrontierMath’s performance results, revealed in a preprint research paper, paint a stark picture of current AI model limitations. Even with access to Python environments for testing and verification, top models like Claude 3.5 Sonnet, GPT-4o, o1-preview, and Gemini 1.5 Pro scored extremely poorly. This contrasts with their high performance on simpler math benchmarks—many models now score above 90 percent on tests like GSM8K and MATH.
The design of FrontierMath differs from many existing AI benchmarks because its problem set remains private and unpublished to prevent data contamination. Many existing AI models have been trained on data that includes other benchmarks’ test problems, allowing them to solve those problems easily and appear more generally capable than they actually are. Many experts cite this as evidence that current large language models (LLMs) are poor generalist learners.
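To illustrate why keeping the problems private matters, here is a minimal sketch of how a held-out benchmark harness might work. This is not Epoch AI’s actual code; the file name, the grade_model function, and the query_model callback are hypothetical, and it assumes scoring by exact answer match:

    # Hypothetical sketch of a held-out benchmark harness (not Epoch AI's actual code).
    # The problems live in a private local file that is never published or crawled,
    # so models cannot have memorized the answers during training.
    import json

    def grade_model(query_model, problems_path="private_problems.json"):
        """Ask the model each held-out problem; count exact-match final answers."""
        with open(problems_path) as f:
            problems = json.load(f)  # e.g., [{"question": "...", "answer": "3628800"}, ...]

        solved = 0
        for problem in problems:
            response = query_model(problem["question"])  # caller supplies the model API call
            if response.strip() == problem["answer"].strip():
                solved += 1
        return solved / len(problems)  # fraction solved; under 0.02 for top models on FrontierMath

Because the answer key never leaves the private file, a high score has to come from actually solving the problems rather than from having seen them during training.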
For the second time this year, NASA’s JPL center cuts its workforce
“If we hold strong together, we will come through this.”
Barely nine months after the last cut, NASA’s Jet Propulsion Laboratory will again reduce its workforce. On Wednesday, the lab will lay off 325 employees, representing about 5 percent of the workforce at the California-based laboratory that leads the development of robotic space probes for NASA.
“This is a message I had hoped not to have to write,” JPL Director Laurie Leshin said in a memo to staff members on Tuesday morning, local time. “Despite this being incredibly difficult for our community, this number is lower than projected a few months ago thanks in part to the hard work of so many people across JPL.”
The cuts this week follow a reduction of 530 employees in February of this year due to various factors, including a pause in funding for the Mars Sample Return mission. The NASA laboratory has now cut about one-eighth of its workforce this year.
What if AI doesn’t just keep getting better forever?
New reports highlight fears of diminishing returns for traditional LLM training.
For years now, many AI industry watchers have looked at the quickly growing capabilities of new AI models and mused about exponential performance increases continuing well into the future. Recently, though, some of that AI “scaling law” optimism has been replaced by fears that we may already be hitting a plateau in the capabilities of LLMs trained with standard methods.
A weekend report from The Information effectively summarized how these fears are manifesting among a number of insiders at OpenAI. Unnamed OpenAI researchers told The Information that Orion, the company’s codename for its next full-fledged model release, is showing a smaller performance jump than the one seen between GPT-3 and GPT-4. On certain tasks, in fact, the upcoming model “isn’t reliably better than its predecessor,” according to unnamed OpenAI researchers cited in the piece.
On Monday, OpenAI co-founder Ilya Sutskever, who left the company earlier this year, added to the concerns that LLMs were hitting a plateau in what can be gained from traditional pre-training. Sutskever told Reuters that “the 2010s were the age of scaling,” where throwing additional computing resources and training data at the same basic training methods could lead to impressive improvements in subsequent models.
Record labels unhappy with court win, say ISP should pay more for user piracy
Music companies appeal, demanding payment for each song instead of each album.
The big three record labels notched another court victory against a broadband provider last month, but the music companies aren’t happy that an appeals court only awarded per-album damages instead of damages for each song.
Universal, Warner, and Sony are seeking an en banc rehearing of the copyright infringement case, claiming that Internet service provider Grande Communications should have to pay per-song damages over its failure to terminate the accounts of Internet users accused of piracy. The decision to make Grande pay for each album instead of each song “threatens copyright owners’ ability to obtain fair damages,” said the record labels’ petition filed last week.
The case is in the conservative-leaning US Court of Appeals for the 5th Circuit. A three-judge panel unanimously ruled last month that Grande, a subsidiary of Astound Broadband, violated the law by failing to terminate subscribers accused of being repeat infringers. Subscribers were flagged for infringement based on their IP addresses being connected to torrent downloads monitored by Rightscorp, a copyright-enforcement company used by the music labels.
Bitcoin hits record high as Trump vows to end crypto crackdown
Trump plans to shake up the SEC by installing pro-crypto leaders.
Bitcoin hit a new record high late Monday, its value peaking at $89,623 as investors quickly moved to cash in on expectations that Donald Trump will end a White House crackdown on crypto that intensified last year.
While the trading rally has now paused, analysts predict that bitcoin’s value will only continue rising following Trump’s win—perhaps even reaching $100,000 by the end of 2024, CNBC reported.
Bitcoin wasn’t the only winner emerging from the post-election crypto trading. Crypto exchanges like Coinbase also experienced surges in the market, and one of the biggest winners, CNBC reported, was dogecoin, a cryptocurrency linked to Elon Musk, who campaigned for Trump and may join his administration. Dogecoin’s value is up 135 percent since Trump’s win.
Calling all Ars readers! Your feedback is needed.
We want to hear from you.
Many of you know that most of our staff is spread out all over these United States, but what you might not know is that it has been more than five years since many of us saw each other in meatspace. Travel budgets and the pandemic conspired to keep us apart, but we are finally gathering Team Ars in New York City later this week. We’d love for you to be there, too, in spirit.
As we gear up for our big fall meeting, we want to hear from you! We’ve set up a special email address, Tellus@arstechnica.com, just for reader feedback. We won’t harvest your email for spam or some nonsense—we just want to hear from you.
What would we like to hear about? We’re eager to know your thoughts on what we’re doing right, where we could improve, and what you’d like to see more (or less) of. What topics do you think we should be covering that we aren’t? Are we hitting the right balance in our reporting? Is there too much doom and gloom, or not enough? Feel free to be as specific and loquacious as you wish.
Amazon ready to use its own AI chips, reduce its dependence on Nvidia
Annapurna Labs, acquired by Amazon in 2015, will release the Trainium 2 in December.
Amazon is poised to roll out its newest artificial intelligence chips as the Big Tech group seeks returns on its multibillion-dollar semiconductor investments and to reduce its reliance on market leader Nvidia.
Executives at Amazon’s cloud computing division are spending big on custom chips in the hopes of boosting the efficiency inside its dozens of data centers, ultimately bringing down its own costs as well as those of Amazon Web Services’ customers.
The effort is spearheaded by Annapurna Labs, an Austin-based chip start-up that Amazon acquired in early 2015 for $350 million. Annapurna’s latest work is expected to be showcased next month when Amazon announces widespread availability of “Trainium 2,” part of a line of AI chips aimed at training the largest models.
Ars Live: Our first encounter with manipulative AI
On Nov. 19, join Benj Edwards and Simon Willison’s live YouTube chat about the “Great Bing Chat Fiasco of 2023.”
In the short term, the most dangerous thing about AI language models may be their ability to emotionally manipulate humans if not carefully conditioned. The world saw its first taste of that danger in February 2023 with the launch of Bing Chat, now called Microsoft Copilot.
During its early testing period, the temperamental chatbot, internally codenamed “Sydney,” gave the world a preview of an “unhinged” version of OpenAI’s GPT-4 prior to its official release. Sydney’s sometimes uncensored and “emotional” nature (including use of emojis) arguably gave the world its first large-scale encounter with a truly manipulative AI system. The launch set off alarm bells in the AI alignment community and served as fuel for prominent warning letters about AI dangers.
On November 19 at 4 pm Eastern (1 pm Pacific), Ars Technica Senior AI Reporter Benj Edwards will host a livestream conversation on YouTube with independent AI researcher Simon Willison that will explore the impact and fallout of the 2023 fiasco. We’re calling it “Bing Chat: Our First Encounter with Manipulative AI.”