r/MachineLearning May 04 '24

Discussion [D] The "it" in AI models is really just the dataset?

Post image
1.3k Upvotes

r/MachineLearning Apr 23 '24

Discussion Meta does everything OpenAI should be [D]

987 Upvotes

I'm surprised (or maybe not) to say this, but Meta (or Facebook) democratises AI/ML much more than OpenAI, which was originally founded and primarily funded for this purpose. OpenAI has largely become a commercial project for profit only. Although as far as Llama models go, they don't yet reach GPT4 capabilities for me, but I believe it's only a matter of time. What do you guys think about this?


r/MachineLearning Apr 04 '24

Discussion [D] LLMs are harming AI research

879 Upvotes

This is a bold claim, but I feel like LLM hype dying down is long overdue. Not only there has been relatively little progress done to LLM performance and design improvements after GPT4: the primary way to make it better is still just to make it bigger and all alternative architectures to transformer proved to be subpar and inferior, they drive attention (and investment) away from other, potentially more impactful technologies. This is in combination with influx of people without any kind of knowledge of how even basic machine learning works, claiming to be "AI Researcher" because they used GPT for everyone to locally host a model, trying to convince you that "language models totally can reason. We just need another RAG solution!" whose sole goal of being in this community is not to develop new tech but to use existing in their desperate attempts to throw together a profitable service. Even the papers themselves are beginning to be largely written by LLMs. I can't help but think that the entire field might plateau simply because the ever growing community is content with mediocre fixes that at best make the model score slightly better on that arbitrary "score" they made up, ignoring the glaring issues like hallucinations, context length, inability of basic logic and sheer price of running models this size. I commend people who despite the market hype are working on agents capable of true logical process and hope there will be more attention brought to this soon.


r/MachineLearning Apr 22 '24

Discussion [D] Llama-3 may have just killed proprietary AI models

697 Upvotes

Full Blog Post

Meta released Llama-3 only three days ago, and it already feels like the inflection point when open source models finally closed the gap with proprietary models. The initial benchmarks show that Llama-3 70B comes pretty close to GPT-4 in many tasks:

The even more powerful Llama-3 400B+ model is still in training and is likely to surpass GPT-4 and Opus once released.

Meta vs OpenAI

Some speculate that Meta's goal from the start was to target OpenAI with a "scorched earth" approach by releasing powerful open models to disrupt the competitive landscape and avoid being left behind in the AI race.

Meta can likely outspend OpenAI on compute and talent:

  • OpenAI makes an estimated revenue of $2B and is likely unprofitable. Meta generated a revenue of $134B and profits of $39B in 2023.
  • Meta's compute resources likely outrank OpenAI by now.
  • Open source likely attracts better talent and researchers.

One possible outcome could be the acquisition of OpenAI by Microsoft to catch up with Meta. Google is also making moves into the open model space and has similar capabilities to Meta. It will be interesting to see where they fit in.

The Winners: Developers and AI Product Startups

I recently wrote about the excitement of building an AI startup right now, as your product automatically improves with each major model advancement. With the release of Llama-3, the opportunities for developers are even greater:

  • No more vendor lock-in.
  • Instead of just wrapping proprietary API endpoints, developers can now integrate AI deeply into their products in a very cost-effective and performant way. There are already over 800 llama-3 models variations on Hugging Face, and it looks like everyone will be able to fine-tune for their us-cases, languages, or industry.
  • Faster, cheaper hardware: Groq can now generate 800 llama-3 tokens per second at a small fraction of the GPT costs. Near-instant LLM responses at low prices are on the horizon.

Open source multimodal models for vision and video still have to catch up, but I expect this to happen very soon.

The release of Llama-3 marks a significant milestone in the democratization of AI, but it's probably too early to declare the death of proprietary models. Who knows, maybe GPT-5 will surprise us all and surpass our imaginations of what transformer models can do.

These are definitely super exciting times to build in the AI space!


r/MachineLearning Apr 13 '24

Discussion [D] Folks here have no idea how competitive top PhD program admissions are these days, wow...

638 Upvotes

I'm a CS PhD student, and I see the profiles of everyone admitted to our school (and similar top schools) these days since I'm right in the center of everything (and have been for years).

I'm reading the comments on the other thread and honestly shocked. So many ppl believe the post is fake and I see comments saying things like "you don't even need top conference papers to get into top PhD programs" (this is incorrect). I feel like many folks here are not up-to-date with just how competitive admissions are to top PhD programs these days...

In fact I'm not surprised. The top programs look at much more than simply publications. Incredibly strong LOR from famous/respected professors and personal connections to the faculty you want to work with are MUCH more important. Based on what they said (how they worked on the papers by themselves and don't have good recs), they have neither of these two most important things...

FYI most of the PhD admits in my year had 7+ top conference papers (some with best paper awards), hundreds of citations, tons of research exp, masters at top schools like CMU or UW or industry/AI residency experience at top companies like Google or OpenAI, rec letters from famous researchers in the world, personal connections, research awards, talks for top companies or at big events/conferences, etc... These top programs are choosing the top students to admit from the entire world.

The folks in the comments have no idea how competitive NLP is (which I assume is the original OP's area since they mentioned EMNLP). Keep in mind this was before the ChatGPT boom too, so things now are probably even more competitive...

Also pasting a comment I wrote on a similar thread months back:

"PhD admissions are incredibly competitive, especially at top schools. Most admits to top ML PhD programs these days have multiple publications, numerous citations, incredibly strong LoR from respected researchers/faculty, personal connections to the faculty they want to work with, other research-related activities and achievements/awards, on top of a good GPA and typically coming from a top school already for undergrad/masters.

Don't want to scare/discourage you but just being completely honest and transparent. It gets worse each year too (competition rises exponentially), and I'm usually encouraging folks who are just getting into ML research (with hopes/goals of pursuing a PhD) with no existing experience and publications to maybe think twice about it or consider other options tbh.

It does vary by subfield though. For example, areas like NLP and vision are incredibly competitive, but machine learning theory is relatively less so."

Edit1: FYI I don't agree with this either. It's insanely unhealthy and overly competitive. However there's no choice when the entire world is working so hard in this field and there's so many ppl in it... These top programs admit the best people due to limited spots, and they can't just reject better people for others.

Edit2: some folks saying u don't need so many papers/accomplishments to get in. That's true if you have personal connections or incredibly strong letters from folks that know the target faculty well. In most cases this is not the case, so you need more pubs to boost your profile. Honestly these days, you usually need both (connections/strong letters plus papers/accomplishments).

Edit3: for folks asking about quality over quantity, I'd say quantity helps you get through the earlier admission stages (as there are way too many applicants so they have to use "easy/quantifiable metrics" to filter like number of papers - unless you have things like connections or strong letters from well-known researchers), but later on it's mainly quality and research fit, as individual faculty will review profiles of students (and even read some of their papers in-depth) and conduct 1-on-1 interviews. So quantity is one thing that helps get you to the later stages, but quality (not just of your papers, but things like rec letters and your actual experience/potential) matters much more for the final admission decision.

Edit4: like I said, this is field/area dependent. CS as a whole is competitive, but ML/AI is another level. Then within ML/AI, areas like NLP and Vision are ridiculous. It also depends what schools and labs/profs you are targeting, research fit, connections, etc. Not a one size fits all. But my overall message is that things are just crazy competitive these days as a whole, although there will be exceptions.

Edit5: not meant to be discouraging as much as honest and transparent so folks know what to expect and won't be as devastated with results, and also apply smarter (e.g. to more schools/labs including lower-ranked ones and to industry positions). Better to keep more options open in such a competitive field during these times...

Edit6: IMO most important things for top ML PhD admissions: connections and research fit with the prof >= rec letters (preferably from top researchers or folks the target faculty know well) > publications (quality) > publications (quantity) >= your overall research experiences and accomplishments > SOP (as long as overall research fit, rec letters, and profile are strong, this is less important imo as long as it's not written poorly) >>> GPA (as long as it's decent and can make the normally generous cutoff you'll be fine) >> GRE/whatever test scores (normally also cutoff based and I think most PhD programs don't require them anymore since Covid)


r/MachineLearning Mar 31 '24

News WSJ: The AI industry spent 17x more on Nvidia chips than it brought in in revenue [N]

633 Upvotes

... In a presentation earlier this month, the venture-capital firm Sequoia estimated that the AI industry spent $50 billion on the Nvidia chips used to train advanced AI models last year, but brought in only $3 billion in revenue.

Source: WSJ (paywalled)


r/MachineLearning Mar 25 '24

Discussion [D] Your salary is determined mainly by geography, not your skill level (conclusions from the salary model built with 24k samples and 300 questions)

588 Upvotes

I have built a model that predicts the salary of Data Scientists / Machine Learning Engineers based on 23,997 responses and 294 questions from a 2022 Kaggle Machine Learning & Data Science Survey (Source: https://jobs-in-data.com/salary/data-scientist-salary)

I have studied the feature importances from the LGBM model.

TL;DR: Country of residence is an order of magnitude more important than anything else (including your experience, job title or the industry you work in). So - if you want to follow the famous "work smart not hard" - the key question seems to be how to optimize the geography aspect of your career above all else.

The model was built for data professions, but IMO it applies also to other professions as well.


r/MachineLearning Apr 02 '24

Discussion [D] LLMs causing more harm than good for the field?

461 Upvotes

This post might be a bit ranty, but i feel more and more share this sentiment with me as of late. If you bother to read this whole post feel free to share how you feel about this.

When OpenAI put the knowledge of AI in the everyday household, I was at first optimistic about it. In smaller countries outside the US, companies were very hesitant before about AI, they thought it felt far away and something only big FANG companies were able to do. Now? Its much better. Everyone is interested in it and wants to know how they can use AI in their business. Which is great!

Pre-ChatGPT-times, when people asked me what i worked with and i responded "Machine Learning/AI" they had no clue and pretty much no further interest (Unless they were a tech-person)

Post-ChatGPT-times, when I get asked the same questions I get "Oh, you do that thing with the chatbots?"

Its a step in the right direction, I guess. I don't really have that much interest in LLMs and have the privilege to work exclusively on vision related tasks unlike some other people who have had to pivot to working full time with LLMs.

However, right now I think its almost doing more harm to the field than good. Let me share some of my observations, but before that I want to highlight I'm in no way trying to gatekeep the field of AI in any way.

I've gotten job offers to be "ChatGPT expert", What does that even mean? I strongly believe that jobs like these don't really fill a real function and is more of a "hypetrain"-job than a job that fills any function at all.

Over the past years I've been going to some conferences around Europe, one being last week, which has usually been great with good technological depth and a place for Data-scientists/ML Engineers to network, share ideas and collaborate. However, now the talks, the depth, the networking has all changed drastically. No longer is it new and exiting ways companies are using AI to do cool things and push the envelope, its all GANs and LLMs with surface level knowledge. The few "old-school" type talks being sent off to a 2nd track in a small room
The panel discussions are filled with philosophists with no fundamental knowledge of AI talking about if LLMs will become sentient or not. The spaces for data-scientists/ML engineers are quickly dissapearing outside the academic conferences, being pushed out by the current hypetrain.
The hypetrain evangelists also promise miracles and gold with LLMs and GANs, miracles that they will never live up to. When the investors realize that the LLMs cant live up to these miracles they will instantly get more hesitant with funding for future projects within AI, sending us back into an AI-winter once again.

EDIT: P.S. I've also seen more people on this reddit appearing claiming to be "Generative AI experts". But when delving deeper it turns out they are just "good prompters" and have no real knowledge, expertice or interest in the actual field of AI or Generative AI.


r/MachineLearning Apr 16 '24

Stanford releases their rather comprehensive (500 page) "2004 AI Index Report summarizing the state of AI today.

Thumbnail aiindex.stanford.edu
457 Upvotes

r/MachineLearning May 03 '24

News [N] AI engineers report burnout and rushed rollouts as ‘rat race’ to stay competitive hits tech industry

439 Upvotes

AI engineers report burnout and rushed rollouts as ‘rat race’ to stay competitive hits tech industry

Summary from article:

  • Artificial intelligence engineers at top tech companies told CNBC that the pressure to roll out AI tools at breakneck speed has come to define their jobs.

  • They say that much of their work is assigned to appease investors rather than to solve problems for end users, and that they are often chasing OpenAI.

  • Burnout is an increasingly common theme as AI workers say their employers are pursuing projects without regard for the technology’s effect on climate change, surveillance and other potential real-world harms.

An especially poignant quote from the article:

An AI engineer who works at a retail surveillance startup told CNBC that he’s the only AI engineer at a company of 40 people and that he handles any responsibility related to AI, which is an overwhelming task. He said the company’s investors have inaccurate views on the capabilities of AI, often asking him to build certain things that are “impossible for me to deliver.”


r/MachineLearning Apr 18 '24

News [N] Meta releases Llama 3

401 Upvotes

r/MachineLearning May 19 '24

Discussion [D] How did OpenAI go from doing exciting research to a big-tech-like company?

399 Upvotes

I was recently revisiting OpenAI’s paper on DOTA2 Open Five, and it’s so impressive what they did there from both engineering and research standpoint. Creating a distributed system of 50k CPUs for the rollout, 1k GPUs for training while taking between 8k and 80k actions from 16k observations per 0.25s—how crazy is that?? They also were doing “surgeries” on the RL model to recover weights as their reward function, observation space, and even architecture has changed over the couple months of training. Last but not least, they beat the OG team (world champions at the time) and deployed the agent to play live with other players online.

Fast forward a couple of years, they are predicting the next token in a sequence. Don’t get me wrong, the capabilities of gpt4 and its omni version are truly amazing feat of engineering and research (probably much more useful), but they don’t seem to be as interesting (from the research perspective) as some of their previous work.

So, now I am wondering how did the engineers and researchers transition throughout the years? Was it mostly due to their financial situation and need to become profitable or is there a deeper reason for their transition?


r/MachineLearning Mar 30 '24

[N] How Stability AI’s Founder Tanked His Billion-Dollar Startup

394 Upvotes

forbes article: https://www.forbes.com/sites/kenrickcai/2024/03/29/how-stability-ais-founder-tanked-his-billion-dollar-startup/

archive no paywall: https://archive.is/snbeV

How Stability AI’s Founder Tanked His Billion-Dollar Startup

Mar 29, 2024

Stability AI founder Emad Mostaque took the stage last week at the Terranea Resort in Palos Verdes, California to roaring applause and an introduction from an AI-generated Aristotle who announced him as “a modern Prometheus” with “the astuteness of Athena and the vision of Daedalus.”

“Under his stewardship, AI becomes the Herculean force poised to vanquish the twin serpents of illness and ailment and extend the olive branch of longevity,” the faux Aristotle proclaimed.

“I think that’s the best intro I’ve ever had,” Mostaque said.

But behind Mostaque's hagiographic introduction lay a grim and fast metastasizing truth. Stability, once one of AI’s buzziest startups, was floundering. It had been running out of money for months and Mostaque had been unable to secure enough additional funding. It had defaulted on payments to Amazon whose cloud service undergirded Stability’s core offerings. The star research team behind its flagship text-to-image generator Stable Diffusion had tendered their resignations just three days before — as Forbes would first report — and other senior leaders had issued him an ultimatum: resign, or we walk too.

Still, onstage before a massive audience of peers and acolytes, Mostaque talked a big game. “AI is jet planes for the mind,” he opined. “AI is our collective intelligence. It's the human Colossus.” He claimed a new, faster version of the Stable Diffusion image generator released earlier this month could generate “200 cats with hats per second.” But later, when he was asked about Stability’s financial model, Mostaque fumbled. “I can’t say that publicly,” he replied. “But it’s going well. We’re ahead of forecast.”

Four days later, Mostaque stepped down as CEO of Stability, as Forbes first reported. In a post to X, the service formerly known as Twitter, he claimed he’d voluntarily abdicated his role to decentralize “the concentration of power in AI.” But sources told Forbes that was hardly the case. Behind the scenes, Mostaque had fought to maintain his position and control despite mounting pressure externally and internally to step down. Company documents and interviews with 32 current and former employees, investors, collaborators and industry observers suggest his abrupt exit was the result of poor business judgment and wild overspending that undermined confidence in his vision and leadership, and ultimately kneecapped the company.

Mostaque, through his attorneys, declined to comment on record on a detailed list of questions about the reporting in this story. But in an email to Forbes earlier this week he broadly disputed the allegations. “Nobody tells you how hard it is to be a CEO and there are better CEOs than me to scale a business,” he said in a statement. “I am not sure anyone else would have been able to build and grow the research team to build the best and most widely used models out there and I’m very proud of the team there. I look forward to moving onto the next problem to handle and hopefully move the needle.”

In an emailed statement, Christian Laforte and Shan Shan Wong, the interim co-CEOs who replaced Mostaque, said, "the company remains focused on commercializing its world leading technology” and providing it “to partners across the creative industries."

After starting Stability in 2019, Mostaque built the company into an early AI juggernaut by seizing upon a promising research project that would become Stable Diffusion and funding it into a business reality. The ease with which the software generated detailed images from the simplest text prompts immediately captivated the public: 10 million people used it on any given day, the company told Forbes in early 2023. For some true believers, Mostaque was a crucial advocate for open-source AI development in a space dominated by the closed systems of OpenAI, Google and Anthropic.

But his startup’s rise to one of the buzziest in generative AI was in part built on a series of exaggerations and misleading claims, as Forbes first reported last year (Mostaque disputed some points at the time). And they continued after he raised $100 million at a $1 billion valuation just days after launching Stable Diffusion in 2022. His failure to deliver on an array of grand promises, like building bespoke AI models for nation states, and his decision to pour tens of millions into research without a sustainable business plan, eroded Stability’s foundations and jeopardized its future.

"He was just giving shit away,” one former employee told Forbes. “That man legitimately wanted to transform the world. He actually wanted to train AI models for kids in Malawi. Was it practical? Absolutely not."

By October 2023, Stability would have less than $4 million left in the bank, according to an internal memo prepared for a board meeting and reviewed by Forbes. And mounting debt, including months of overdue Amazon Web Services payments, had already left it in the red. To avoid legal penalties for skipping Americans staff’s payroll, the document explained, the London-based startup was considering delaying tax payments to the U.K. government.

It was Stability’s armada of GPUs, the wildly powerful and equally expensive chips undergirding AI, that were so taxing the company’s finances. Hosted by AWS, they had long been one of Mostaque’s bragging points; he often touted them as one of the world’s 10 largest supercomputers. They were responsible for helping Stability’s researchers build and maintain one of the top AI image generators, as well as break important new ground on generative audio, video and 3D models. “Undeniably, Stability has continued to ship a lot of models,” said one former employee. “They may not have profited off of it, but the broader ecosystem benefitted in a huge, huge way.”

But the costs associated with so much compute were now threatening to sink the company. According to an internal October financial forecast seen by Forbes, Stability was on track to spend $99 million on compute in 2023. It noted as well that Stability was “underpaying AWS bills for July (by $1M)” and “not planning to pay AWS at the end of October for August usage ($7M).” Then there were the September and October bills, plus $1 million owed to Google Cloud and $600,000 to GPU cloud data center CoreWeave. (Amazon, Google and CoreWeave declined to comment.)

With an additional $54 million allocated to wages and operating expenses, Stability’s total projected costs for 2023 were $153 million. But according to its October financial report, its projected revenue for the calendar year was just $11 million. Stability was on track to lose more money per month than it made in an entire year.

The company’s dire financial position had thoroughly soured Stability’s current investors, including Coatue, which had invested tens of millions in the company during its $101 million funding round in 2022. In the middle of 2023, Mostaque agreed to an independent audit after Coatue raised a series of concerns, according to a source with direct knowledge of the matter. The outcome of the investigation is unclear. Coatue declined to comment.

Within a week of an early October board meeting where Mostaque shared that financial forecast, Lightspeed Venture Partners, another major investor, sent a letter to the board urging them to sell the company. The distressing numbers had “severely undermined” the firm’s confidence in Mostaque’s ability to lead the company.

“In particular, we are surprised and deeply concerned by a cash position just now disclosed to us that is inconsistent with prior discussions on this topic,” Lightspeed’s general counsel Brett Nissenberg wrote in the letter, a copy of which was viewed by Forbes. “Lightspeed believes that the company is not likely financeable on terms that would assure the company’s long term sound financial position.” (Lightspeed declined a request for comment.)

The calls for a sale led Stability to quietly begin looking for a buyer. Bloomberg reported in November that Stability approached AI startups Cohere and Jasper to gauge their interest. Stability denied this, and Jasper CEO Timothy Young did the same when reached for comment by Forbes. A Cohere representative declined to comment.

But one prominent AI company confirmed that Mostaque’s representatives had reached out to them to test the waters. Those talks did not advance because “the numbers didn’t add up,” this person, who declined to be named due to the confidential nature of the talks, told Forbes. Stability also tried to court Samsung as a buyer, going so far as to redecorate its office in advance of a planned meeting with the Korean electronics giant. (Samsung said that it invested in Stability in 2023 and that it does not comment on M&A discussions.)

Coatue had been calling for Mostaque’s resignation for months, according to a source with direct knowledge. But it and other investors were unable to oust him because he was the company’s majority shareholder. When they tried a different tact by rallying other investors to offer him a juicy equity package to resign, Mostaque refused, said two sources. By October, Coatue and Lightspeed had had enough. Coatue left the board and Lightspeed resigned its observer seat.

“Emad infuriated our initial investors so much it’s just making it impossible for us to raise more money under acceptable terms,” one current Stability executive told Forbes.

The early months of 2024 saw Stability’s already precarious position eroding further still. Employees were quietly laid off. Three people in a position to know estimated that at least 10% of staff were cut. And cash reserves continued to dwindle. Mostaque mentioned a lifeline at the October board meeting: $95 million in tentative funding from new investors, pending due diligence. But in the end, only a fraction of it was wired, two sources say, much of it from Intel, which Forbes has learned invested $20 million, a fraction of what was reported. (Intel did not return a request for comment by publication time.)

Two hours after Forbes broke the news of Mostaque’s plans to step down as CEO, Stability issued a press release confirming his resignation. Chief operating officer Wong and chief technology officer Laforte have taken over in the interim. Mostaque, who said on X that he still owns a majority of the company, also stepped down from the board, which has now initiated a search for a permanent CEO. There is a lot of work to be done to turn things around, and very little time in which to do it. Said the current Stability executive, “There’s still a possibility of a turnaround story, but the odds drop by the day.”

In July of 2023, Mostaque still thought he could pull it off. Halfway through the month, he shared a fundraising plan with his lieutenants. It was wildly optimistic, detailing the raise of $500 million in cash and another $750 million in computing facilities from marquee investors like Nvidia, Google, Intel and the World Bank (Nvidia and Google declined comment. Intel did not respond. The World Bank said it did not invest in Stability). In a Slack message reviewed by Forbes, Mostaque said Google was “willing to move fast” and the round was “likely to be oversubscribed.”

It wasn’t. Three people with direct knowledge of these fundraising efforts told Forbes that while there was some interest in Stability, talks often stalled when it came time to disclose financials. Two of them noted that earlier in the year, Mostaque had simply stopped engaging with VCs who asked for numbers. Only one firm invested around that time: actor Ashton Kutcher’s Sound Ventures, which invested $35 million in the form of a convertible SAFE note during the second quarter, according to an internal document. (Sound Ventures did not respond to a request for comment.)

And though he’d managed to score a meeting with Nvidia and its CEO Jensen Huang, it ended in disaster, according to two sources. “Under Jensen's microscopic questions, Emad just fell apart,” a source in position to know told Forbes. Huang quickly concluded Stability wasn’t ready for an investment from Nvidia, the sources said. Mostaque told Forbes in an email that he had not met with Huang since 2022, except to say “hello and what’s up a few times after.” His July 2023 message references a plan to raise $150 million from Nvidia. (Nvidia declined to comment.)

After a June Forbes investigation citing more than 30 sources revealed Mostaque’s history of misleading claims, Mostaque struggled to raise funding, a Stability investor told Forbes. (Mostaque disputed the story at the time and called it "coordinated lies" in his email this week to Forbes). Increasingly, investors scrutinized his assertions and pressed for data. And Young, now the CEO of Jasper, turned down a verbal offer to be Stability’s president after reading the article, according to a source with direct knowledge of the matter. The collapse of the talks aggravated the board and other executives, who had hoped Young would compensate for the sales and business management skills that Mostaque lacked, according to four people in a position to know. (Young declined to comment.)

When Stability’s senior leadership convened in London for the CogX conference in September, the financing had still not closed. There, a group of executives confronted Mostaque asking questions about the company’s cash position and runway, according to three people with direct knowledge of the incident. They did not get the clarity they’d hoped for.

By October, Mostaque had reduced his fundraising target by more than 80%.

The months that followed saw a steady drumbeat of departures — general counsel Adam Avrunin, vice presidents Mike Melnicki, Ed Newton-Rex and Joe Penna, chief people officer Ozden Onder — culminating in the demoralizing March exit of Stable Diffusion’s primary developers Robin Rombach, Andreas Blattmann, Patrick Esser and Dominik Lorenz. Rombach, who led the team, had been angling to leave for months, two sources said, first threatening to resign last summer because of the fundraising failures. Others left over concerns about cash flow, as well as liabilities — including what four people described as Mostaque’s lax approach to ensuring that Stability products could not be used to produce child sexual abuse imagery.

“Stability AI is committed to preventing the misuse of AI and prohibits the use of our image models and services for unlawful activity, including attempts to edit or create CSAM,” Ella Irwin, senior vice president of integrity, said in a statement.

Newton-Rex told Forbes he resigned because he disagreed with Stability’s position that training AI on copyrighted work without consent is fair use. Melnicki and Penna declined to comment. Avrunin and Onder could not be reached for comment. None of the researchers responded to requests for comment.

The Stable Diffusion researchers’ departure as a cohort says a lot about the state of Stability AI. The company’s researchers were widely viewed as its crown jewels, their work subsidized with a firehose of pricey compute power that was even extended to people outside the company. Martino Russi, an artificial intelligence researcher, told Forbes that though he was never formally employed by Stability, the company provided him a “staggering” amount of compute between January and April 2023 to play around with developing an AI video generator that Stability might someday use. “It was Candy Land or Coney Island,” said Russi, who estimates that his experiment, which was ultimately shelved, cost the company $2.5 million.

Stable Diffusion was simultaneously Stability’s marquee product and its existential cash crisis. One current employee described it to Forbes as “a giant vacuum that absorbed everything: money, compute, people.” While the software was widely used, with Mostaque claiming downloads reaching into the hundreds of millions, Stability struggled to translate that wild success into revenue. Mostaque knew it could be done — peers at Databricks, Elastic and MongoDB had all turned a free product into a lucrative business — he just couldn’t figure out how.

His first attempt was Stability’s API, which allowed paying customers to integrate Stable Diffusion into their own products. In early 2023, a handful of small companies, like art generator app NightCafe and presentation software startup Tome, signed on, according to four people with knowledge of the deals. But Stability’s poor account management services soured many, and in a matter of months NightCafe and Tome canceled their contracts, three people said. NightCafe founder Angus Russell told Forbes that his company switched to a competitor which “offered much cheaper inference costs and a broader service.” Tome did not respond to a request for comment.

Meanwhile, Mostaque’s efforts to court larger companies like Samsung and Snapchat were failing, according to five people familiar with the effort. Canva, which was already one of the heaviest users of open-sourced Stable Diffusion, had multiple discussions with Stability, which was angling for a contract it hoped would generate several millions in annual revenue. But the deal never materialized, four sources said.

“These three companies wanted and needed us,” one former employee told Forbes. “They would have been the perfect customers.” (Samsung, Snap and Canva declined to comment.)

“It’s not that there was not an appetite to pay Stability — there were tons of companies that would have that wanted to,” the former employee said. “There was a huge opportunity and demand, but just a resistance to execution.”

Mostaque’s other big idea was to provide governments with bespoke national AI models that would invigorate their economies and citizenry. “Emad envisions a world where AI through 100 national models serves not as a tool of the few, but as a benefactor to all promising to confront great adversaries, cancer, autism, and the sands of time itself,” the AI avatar of Aristotle said in his intro at the conference.

Mostaque told several prospective customers that he could deliver such models within 60 days — an untenable timeline, according to two people in position to know. Stability attempted to develop a model for the Singaporean government over the protestation of employees who questioned its technical feasibility, three sources familiar with the effort told Forbes. But it couldn’t pull it off and Singapore never became a customer. (The government of Singapore confirmed it did not enter into a deal with Stability, but declined to answer additional questions.)

As Stability careened from one new business idea to another, resources were abruptly reallocated and researchers reassigned. The whiplash shifts in a largely siloed organization demoralized and infuriated employees. “There were ‘urgent’ things, ‘urgent urgent’ things and ‘most urgent,’” one former employee complained. “None of these things seem important if everything is important.”

Another former Stability executive was far more pointed in their assessment. “Emad is the most disorganized leader I have ever worked with in my career,” this person told Forbes. “He has no vision, and changes directions every week, often based on what he sees on Twitter.”

In a video interview posted shortly before this story was published, Mostaque explained his leadership style: “I'm particularly great at taking creatives, developers, researchers, others, and achieving their full potential in designing systems. But I should not be dealing with, you know, HR and operations and business development and other elements. There are far better people than me to do that.”

By December 2023, Stability had partially abandoned its open-source roots and announced that any commercial use of Stable Diffusion would cost customers at least $20 per month (non-commercial and research use of Stable Diffusion would remain free).

But privately, Stability was considering a potentially more lucrative source of revenue: reselling the compute it was leasing from providers like AWS, according to six people familiar with the effort. Though it was essentially GPU arbitrage, Stability framed the strategy to investors as a “managed services” offering. Its damning October financial report projected optimistically that such an offering would bring in $139 million in 2024 — 98% of its revenue. Multiple employees at the time told Forbes they feared reselling compute, even if the company called it “managed services,” would violate the terms of Stability’s contract with AWS. Amazon declined to comment. “The line internally was that we are not reselling compute,” one former employee said. “This was some of the dirtiest feeling stuff.”

Stability also discussed reselling a cluster of Nvidia A100 chips, leased via CoreWeave, to the venture capital firm Andreessen Horowitz, three sources said. “It was under the guise of managed services, but there wasn’t any management happening,” one of these people told Forbes. Andreessen Horowitz and CoreWeave declined to comment.

Stability did not respond to questions about if it plans to continue this strategy now that Mostaque is out of the picture. Regardless, interim co-CEOs Wong and Laforte are on a tight timeline to clean up his mess. Board chairman Jim O’Shaughnessy said in a statement that he was confident the pair “will adeptly steer the company forward in developing and commercializing industry-leading generative AI products.” But burn continues to far outpace revenue. The Financial Times reported Friday that the company made $5.4 million of revenue in February, against $8 million in costs. Several sources said there are ongoing concerns about making payroll for the roughly 150 remaining employees. Leadership roles have gone vacant for months amid the disarray, leaving the company increasingly directionless.

Meanwhile, a potentially catastrophic legal threat looms over the company: A trio of copyright infringement lawsuits brought by Getty Images and a group of artists in the U.S. and U.K., who claim Stability illegally used their art and photography to train the AI models powering Stable Diffusion. A London-based court has already rejected the company’s bid to throw out one of the lawsuits on the basis that none of its researchers were based in the U.K. And Stability’s claim that Getty’s Delaware lawsuit should be blocked because it's a U.K.-based company was rejected. (Stability did not respond to questions about the litigation.)

AI-related copyright litigation “could go on for years,” according to Eric Goldman, a law professor at Santa Clara University. He told Forbes that though plaintiffs suing AI firms face an uphill battle overcoming the existing legal precedent on copyright infringement, the quantity of arguments available to make are virtually inexhaustible. “Like in military theory, if there’s a gap in your lines, that’s where the enemy pours through — if any one of those arguments succeeds, it could completely change the generative AI environment,” he said. “In some sense, generative AI as an industry has to win everything.”

Stability, which had more than $100 million in the bank just a year and a half ago, is in a deep hole. Not only does it need more funding, it needs a viable business model — or a buyer with the vision and chops to make it successful in a fast-moving and highly competitive sector.

At an all hands meeting this past Monday, Stability’s new leaders detailed a path forward. One point of emphasis: a plan to better manage resources and expenses, according to one person in attendance. It’s a start, but Mostaque’s meddling has left them with little runway to execute. His resignation, though, has given some employees hope. “A few people are 100% going to reconsider leaving after today,” said one current employee. “And the weird gloomy aura of hearing Emad talking nonsense for an hour is gone.”

Shortly before Mostaque resigned, one current Stability executive told Forbes that they were optimistic his departure could make Stability appealing enough to receive a small investment or sale to a friendly party.

“There are companies that have raised hundreds of millions of dollars that have much less intrinsic value than Stability,” the person said. “A white knight may still appear.”


r/MachineLearning May 01 '24

Research [R] KAN: Kolmogorov-Arnold Networks

379 Upvotes

Paper: https://arxiv.org/abs/2404.19756

Code: https://github.com/KindXiaoming/pykan

Quick intro: https://kindxiaoming.github.io/pykan/intro.html

Documentation: https://kindxiaoming.github.io/pykan/

Abstract:

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all -- every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability. For accuracy, much smaller KANs can achieve comparable or better accuracy than much larger MLPs in data fitting and PDE solving. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful collaborators helping scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives for MLPs, opening opportunities for further improving today's deep learning models which rely heavily on MLPs.


r/MachineLearning Apr 28 '24

Discussion You need everything other than ML to win a ML hackathon [D]

368 Upvotes

Basically a rant on condition of offline hackathons hosted my big MNCs and institues.

Tired of participating in hackathons aimed to "develope cutting edge solution" and end up losing to a guy who have never studied machine learning but expert in "bussiness informatics" and really good while pitching the solution within given time limit.

How can a sane mind who worked on idea, a prototype and a model for 2-3 days non-stop only gets to talk about it just for 3-5 minutes? I've literally seen people cloning github repos somewhat related to the problem statement and sell it like a some kind of state of the art product. I agree that this skills is more important in industry but then why name those hackathons as "Machine Learning" or "AI" hackathons? Better name it "sell me some trash".

Only option for someone really into developing a good product, a working model within limited time constraints and someone who loves competing (like me) is to participate online or in "data" competition.


r/MachineLearning Mar 25 '24

Research [R] Up to 17% of Recent AI Conference Peer Reviews Written by ChatGPT

362 Upvotes

A new study has uncovered that a significant fraction of peer reviews for top AI conferences in 2023-2024 likely included substantial AI-generated content from models like ChatGPT.

Using a novel statistical technique, researchers estimated the percentage of text generated by AI in large collections of documents. Analyzing peer reviews, they found:

  • 10.6% of ICLR 2024 reviews had significant AI content
  • 9.1% for NeurIPS 2023
  • 6.5% for CoRL 2023
  • 16.9% for EMNLP 2023

In contrast, only 1-2% of pre-ChatGPT reviews from 2022 and earlier were flagged as having substantial AI contribution.

Some key findings:

  1. AI-heavy reviews tended to come in close to the deadline
  2. Fewer scholarly citations in AI-flavored reviews
  3. Reviewers with AI-tinged reviews engaged less in author discussion
  4. AI content made reviews more semantically homogeneous
  5. Lower reviewer confidence correlated with higher AI estimates

The study, I think, raises some questions for proactive policy development in academia around responsible AI use in research. AI may be eroding the quality and integrity of peer review through these "shadow" influences. Open questions include:

  • Should AI assistance in peer review be disclosed?
  • How should we incentivize good practices despite AI temptations?
  • Can we preserve intellectual diversity under AI homogenization?
  • Should we rethink credit for hybrid human/AI knowledge work?

Overall, an interesting empirical glimpse into AI's rapidly growing tendrils in the foundations of scientific quality control! I thought the approach of measuring the frequency of certain AI wording "ticks" made a lot of sense (some of the adjectives GPT4 uses, for example, are clear tells).

I'm curious to read the comments on this one! I have a much more detailed summary available here as well if you're interested, and the original paper is here.


r/MachineLearning May 22 '24

Discussion [D] AI Agents: too early, too expensive, too unreliable

339 Upvotes

Reference: Full blog post

There has been a lot of hype about the promise of autonomous agent-based LLM workflows. By now, all major LLMs are capable of interacting with external tools and functions, letting the LLM perform sequences of tasks automatically.

But reality is proving more challenging than anticipated.

The WebArena leaderboard, which benchmarks LLMs agents against real-world tasks, shows that even the best-performing models have a success rate of only 35.8%.

Challenges in Practice

After seeing many attempts to AI agents, I believe it's too early, too expensive, too slow, too unreliable.
It feels like many AI agent startups are waiting for a model breakthrough that will start the race to productize agents.

  • Reliability: As we all know, LLMs are prone to hallucinations and inconsistencies. Chaining multiple AI steps compounds these issues, especially for tasks requiring exact outputs.
  • Performance and costs: GPT-4o, Gemini-1.5, and Claude Opus are working quite well with tool usage/function calling, but they are still slow and expensive, particularly if you need to do loops and automatic retries.
  • Legal concerns: Companies may be held liable for the mistakes of their agents. A recent example is Air Canada being ordered to pay a customer who was misled by the airline's chatbot.
  • User trust: The "black box" nature of AI agents and stories like the above makes it hard for users to understand and trust their outputs. Gaining user trust for sensitive tasks involving payments or personal information will be hard (paying bills, shopping, etc.).

Real-World Attempts

Several startups are tackling the AI agent space, but most are still experimental or invite-only:

  • adept.ai - $350M funding, but access is still very limited
  • MultiOn - funding unknown, their API-first approach seems promising
  • HypeWrite - $2.8M funding, started with an AI writing assistant and expanded into the agent space
  • minion.ai - created some initial buzz but has gone quiet now, waitlist only

Only MultiOn seems to be pursuing the "give it instructions and watch it go" approach, which is more in line with the promise of AI agents.
All others are going down the record-and-replay RPA route, which may be necessary for reliability at this stage.

Large players are also bringing AI capabilities to desktops and browsers, and it looks like we'll get native AI integrations on a system level:

Screenshot Screenshot

These tech demos are impressive, but we'll see how well these agent capabilities will work when released publicly and tested against real-world scenarios instead of hand-picked demo cases.

The Path Forward

AI agents overhyped and it's too early.
However, the underlying models continue to advance quickly, and we can expect to see more successful real-world applications.
Instead of trying to have one large general purpose agent that is hard to control and test, we can use many smaller agents that basically just pick the right strategy for a specific sub-task in our workflows. These "agents" can be thought of as medium-sized LLM prompts with a) context and b) a set of functions available to call.

The most promising path forward likely looks like this:

  1. Narrowly scoped, well testable automations that use AI as an augmentation tool rather than pursuing full autonomy
  2. Human-in-the-loop approaches that keep humans involved for oversight and handling edge cases
  3. Setting realistic expectations about current capabilities and limitations

By combining tightly constrained agents, good evaluation data, human-in-the-loop oversight, and traditional engineering methods, we can achieve reliably good results for automating medium-complex tasks.

Will AI agents automate tedious repetitive work, such as web scraping, form filling, and data entry? Yes, absolutely.

Will AI agents autonomously book your vacation without your intervention? Unlikely, at least in the near future.


r/MachineLearning May 06 '24

Discussion [D] Kolmogorov-Arnold Network is just an MLP

319 Upvotes

It turns out, that you can write Kolmogorov-Arnold Network as an MLP, with some repeats and shift before ReLU.

https://colab.research.google.com/drive/1v3AHz5J3gk-vu4biESubJdOsUheycJNz


r/MachineLearning Aug 01 '24

Discussion [D] LLMs aren't interesting, anyone else?

308 Upvotes

I'm not an ML researcher. When I think of cool ML research what comes to mind is stuff like OpenAI Five, or AlphaFold. Nowadays the buzz is around LLMs and scaling transformers, and while there's absolutely some research and optimization to be done in that area, it's just not as interesting to me as the other fields. For me, the interesting part of ML is training models end-to-end for your use case, but SOTA LLMs these days can be steered to handle a lot of use cases. Good data + lots of compute = decent model. That's it?

I'd probably be a lot more interested if I could train these models with a fraction of the compute, but doing this is unreasonable. Those without compute are limited to fine-tuning or prompt engineering, and the SWE in me just finds this boring. Is most of the field really putting their efforts into next-token predictors?

Obviously LLMs are disruptive, and have already changed a lot, but from a research perspective, they just aren't interesting to me. Anyone else feel this way? For those who were attracted to the field because of non-LLM related stuff, how do you feel about it? Do you wish that LLM hype would die down so focus could shift towards other research? Those who do research outside of the current trend: how do you deal with all of the noise?


r/MachineLearning Aug 19 '24

Project [P] Illustrated book to learn about Transformers & LLMs

308 Upvotes

I have seen several instances of folks on this subreddit being interested in long-form explanations of the inner workings of Transformers & LLMs.

This is a gap my twin brother and I have been aiming at filling for the past 3 1/2 years. Last week, we published “Super Study Guide: Transformers & Large Language Models”, a 250-page book with more than 600 illustrations aimed at visual learners who have a strong interest in getting into the field.

This book covers the following topics in depth:

  • Foundations: primer on neural networks and important deep learning concepts for training and evaluation.
  • Embeddings: tokenization algorithms, word embeddings (word2vec) and sentence embeddings (RNN, LSTM, GRU).
  • Transformers: motivation behind its self-attention mechanism, detailed overview on the encoder-decoder architecture and related variations such as BERT, GPT and T5, along with tips and tricks on how to speed up computations.
  • Large language models: main techniques to tune Transformer-based models, such as prompt engineering, (parameter efficient) finetuning and preference tuning.
  • Applications: most common problems including sentiment extraction, machine translation, retrieval-augmented generation and many more.

(In case you are wondering: this content follows the same vibe as the Stanford illustrated study guides we had shared on this subreddit 5-6 years ago about CS 229: Machine Learning, CS 230: Deep Learning and CS 221: Artificial Intelligence)

Happy learning!


r/MachineLearning Sep 08 '24

Project [P]: TensorHue – a tensor visualization library (info in comments)

Thumbnail
gallery
289 Upvotes

r/MachineLearning Mar 27 '24

News [N] Introducing DBRX: A New Standard for Open LLM

291 Upvotes

https://x.com/vitaliychiley/status/1772958872891752868?s=20

Shill disclaimer: I was the pretraining lead for the project

DBRX deets:

  • 16 Experts (12B params per single expert; top_k=4 routing)
  • 36B active params (132B total params)
  • trained for 12T tokens
  • 32k sequence length training

r/MachineLearning Mar 20 '24

Discussion [D] Deanonymized paper accepted at ICLR 2024

288 Upvotes

I want to start a discussion about a particular case at this year's ICLR that I have observed.

- A paper is submitted to ICLR double-blind review with the full name of the first author and full acknowledgements (with several high-profile references) in the main paper body, right above the bibliography. The authors explicitly confirm the false statement "No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review." at submission time.

- The four reviewers and the AC avoid mentioning this and review the paper with the bias of knowing its authors, as if it does not violate the basic submission rules listed in the call for papers.

- The decision released for the paper mid-January is "Desk Reject" (presumably due to the above violation, confirming that the PCs are aware of it). This is not publicly archived.

- This is changed early February to "Oral" with no further justification, for unclear reasons.

An anonymity rule violation is first ignored by the reviewers and then silently allowed via an exception by the program committee, presumably after a direct complaint. In my opinion, this should be unacceptable for any conference, but especially a top conference in the field like ICLR. Why do we have the double-blind process if we do not enforce it and allow some authors to opt for revealing their names and bias the reviewers? Would the same exception be made if the authors were not prominent, or if they do research in a less privileged place, or come from a marginalized community? What about other desk rejected papers at this year's ICLR that did not get the opportunity for an exception?

The issues around bias and integrity in academia are in my experience often raised in private conversations but almost never openly discussed, so I am interested in hearing what the community thinks of this case. The program chairs have declined to comment on the final decision.


r/MachineLearning Jul 21 '24

Project [P] ChessGPT, 100,000x smaller than GPT-4, plays chess at 1500 Elo. By finding a skill vector, we can increase its win rate by 2.6x in out-of-distribution games.

286 Upvotes

A previous project trained ChessGPT, a set of 25M and 50M parameter GPT models that can play chess at 1500 Elo. These models are ~100,000x smaller than GPT-4's 1.8T parameters.

At Stockfish level 0, the 50M parameter model has a win rate of 70%. However, if the game is initialized with 20 random moves, its win rate drops to 17%. Is this because it can't generalize out of distribution? When considering the task of next-token prediction, a good next token predictor would predict legal but low skill moves if the game begins with random moves.

This is what we find with ChessGPT. By adding a skill vector to the model's activations, we can increase its win rate to 43%, or by 2.6x. We don't fully recover the performance gap, but it is a significant fraction. The intervention is very simple, and it's possible that a more sophisticated intervention could further increase its win rate.

This model is only trained to predict the next character in PGN strings (1.e4 e5 2.Nf3 …) and is never explicitly given the state of the board or the rules of chess. Despite this, in order to better predict the next character, it learns to compute the state of the board at any point of the game, and learns a diverse set of rules, including check, checkmate, castling, en passant, promotion, pinned pieces, etc. In addition, to better predict the next character it also learns to estimate latent variables such as the Elo rating of the players in the game.

We can also use interpretability methods to intervene on the model's internal board state.

This work was recently accepted to the 2024 Conference on Language Modeling (COLM) under the title "Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models".

More information is available in this post:

https://adamkarvonen.github.io/machine_learning/2024/03/20/chess-gpt-interventions.html

And the code is here: https://github.com/adamkarvonen/chess_llm_interpretability


r/MachineLearning Mar 28 '24

Research The end of hallucination (for those who can afford it)? [R]

273 Upvotes

DeepMind just published a paper about fact-checking text:

The approach costs $0.19 per model response, using GPT-3.5-Turbo, which is cheaper than human annotators, while being more accurate than them:

They use this approach to create a factuality benchmark and compare some popular LLMs.

Paper and code: https://arxiv.org/abs/2403.18802

EDIT: Regarding the title of the post: Hallucination is defined (in Wikipedia) as "a response generated by AI which contains false or misleading information presented as fact.": Your code that does not compile is not, by itself, a hallucination. When you claim that the code is perfect, that's a hallucination.