Harmonix keeps innovating, with lasting impact

Every holiday season, a popular new video game causes a disproportionate amount of hype, anticipation, and last-minute shopping. But few of those games offer an entirely new way to play. Even fewer have ripple effects that reach far beyond the gaming universe.

When Guitar Hero was released in 2005, challenging players to hit notes to classic rock songs on guitar-like controllers, it grew from a holiday hit to a cultural phenomenon that taught a new generation to love rock ’n’ roll music. Along the way, it showed the video game industry the power of innovative, music-based games.

Guitar Hero and the related Rock Band franchise were developed by Harmonix Music Systems, which formed more than 25 years ago in MIT’s Media Lab when a pair of friends began using technology to help people interact with music. Since then, it has released more than a dozen games that have helped millions of people experience the thrill of making music.

“The thing that we’ve always tried to accomplish is to innovate in music gameplay,” says Eran Egozy ’93, SM ’95, a professor of the practice in music and theater arts at MIT who co-founded the company with Alex Rigopulos ’92, SM ’94. “That’s what the company is constantly trying to do — creating new kinds of compelling music experiences.”

To further that mission, Harmonix became a part of industry giant Epic Games last month. It’s a major milestone for a company that has watched its games go from small passion projects to ubiquitous sources of expression and fun.

Egozy has seen Harmonix games on famous bands’ tour buses, in the offices of tech giants like Google, at bars hosting “Rock Band nights,” and being portrayed in popular TV shows. Most importantly, he’s heard from music teachers who say the games inspired kids to play real instruments.

In fact, Egozy just heard from his son’s school principal that he plays the drums because of Rock Band.

“That’s probably the most gratifying part,” says Egozy, who plays the clarinet professionally. “Of course, we had great hopes and aspirations when we started the company, but we didn’t think we would actually make such a big impact. We’ve been totally surprised.”

Mission-driven beginnings

As an undergraduate at MIT, Egozy majored in electrical engineering and computer science and minored in music. But he never thought about combining computers and music until he participated in the Undergraduate Research Opportunities Program under then-graduate student Michael Hawley in the Media Lab.

The experience inspired Egozy to pursue his master’s degree at the Media Lab’s Opera of the Future group, led by Tod Machover, where he began building software that generated music based on intuitive controls. He also met Rigopulos at the Media Lab, who quickly became a friend and collaborator.

“Alex had this idea: Wouldn’t it be cool if we took a joystick that’s a more friendly interface and used it to drive the parameters of our generative music system?” Egozy recalls.

The joystick-based system immediately became one of the most popular demos at the Media Lab, leading the pair to participate in the MIT $10K Entrepreneurship Competition (the MIT $100K today).

“I think MIT imbued me with a sense that there’s no point in trying to do something that someone’s already done,” Egozy says. “If you’re going to work on something, try to do something inventive. That’s a pervasive attitude all around MIT, not just at the Media Lab.”

As graduation arrived, Egozy and Rigopulos knew they wanted to continue working on the system, but they doubted they could find a company that would pay them to do it. Out of that simple logic, Harmonix was born.

The founders spent the next four years working on the technology, which led to a product called Axe that Egozy describes as a “total flop.” They also built a system for Disney’s Epcot theme park and tried to integrate their software with karaoke machines in Japan.

“We sustained multiple failures trying to figure out what our business was, and it took us quite a while to discover the way to satisfy our mission, which is to let everyone in the world experience the joy of making music. As it turns out, that was through video games,” Egozy says.

The company’s first several video games were not huge hits, but by iterating on the core platform, Harmonix was able to steadily improve on the design and gameplay.

As a result, when it came time to make Guitar Hero around 2005, the founders had music, graphics, and design systems they knew could work with unique controllers.

Egozy describes Guitar Hero as a relatively low-budget project within Harmonix. The company had two games in development at the time, and the Guitar Hero team was the smaller one. It was also a quick turnaround: They finished Guitar Hero in about nine months.

Through its other releases, the Harmonix team had learned to expect most of its sales to come in the weeks leading up to the Christmas holiday and then for sales to essentially stop. With Guitar Hero, the game sold incredibly quickly — so quickly that retailers immediately wanted more, and the company making the guitar controllers had to multiply its orders with manufacturers.

But what really surprised the founders was that January’s sales surpassed December’s. Then February’s surpassed January’s. In fact, month after month, the sales graph looked like nothing Harmonix’s team of 45 people had ever seen before.

“It was mostly shock and disbelief within Harmonix,” Egozy says. “We just adored making Guitar Hero. It was the game we always wanted to make. Everyone at Harmonix was somehow involved in music. The company had a band room just so people could go and jam. And so the fact that it also sold really well was extremely gratifying — and very unexpected.”

Things moved quickly for Harmonix after that. Work on Guitar Hero 2 began immediately. The Guitar Hero franchise was taken over by Activision, and Harmonix was acquired by MTV Networks, where it remained for a number of years. Harmonix went on to develop the Rock Band franchise, which brought players together to perform the lead guitar, bass, keyboard, drums, and vocals of popular songs.

“That was really wonderful because it was about a group effort,” Egozy says. “Rock Band was social in the sense that everyone’s together in the same room playing music together, not competitively, but working toward a common goal.”

An ongoing legacy

Over the last decade, Harmonix has continued to explore new modes of music gameplay with releases such as SingSpace, which offers a social karaoke experience, and Fuser, a DJ-inspired game that lets users mix and match different tracks. The company also released Rock Band VR, which makes players feel like they’re on stage in front of a live audience.

These days Egozy, who’s been on the board since he became a full-time professor at MIT in 2014, teaches 21M.385/6.185 (Interactive Music Systems), a class that combines computer science, interaction design, and music. “It’s the class I wish I had as an undergrad here at MIT,” Egozy says.

And every semester, the class takes a tour of the Harmonix office. He’s often told it’s the students’ favorite part of the class.

“I’m really proud of what we were able to do, and I’m still surprised and humbled by the cultural impact we had,” Egozy says. “There is a generation of kids that grew up playing these games that learned about all this music from the ’70s and ’80s. I’m really happy we were able to expose kids to all that great music.”

Generating a realistic 3D world

While standing in a kitchen, you push some metal bowls across the counter into the sink with a clang, and drape a towel over the back of a chair. In another room, it sounds like some precariously stacked wooden blocks fell over, and there’s an epic toy car crash. These interactions with our environment are just some of what humans experience on a daily basis at home, but while this world may seem real, it isn’t.

A new study from researchers at MIT, the MIT-IBM Watson AI Lab, Harvard University, and Stanford University is enabling a rich virtual world, very much like stepping into “The Matrix.” Their platform, called ThreeDWorld (TDW), simulates high-fidelity audio and visual environments, both indoor and outdoor, and allows users, objects, and mobile agents to interact like they would in real life and according to the laws of physics. Object orientations, physical characteristics, and velocities are calculated and executed for fluids, soft bodies, and rigid objects as interactions occur, producing accurate collisions and impact sounds.

TDW is unique in that it is designed to be flexible and generalizable, generating synthetic photo-realistic scenes and audio rendering in real time, which can be compiled into audio-visual datasets, modified through interactions within the scene, and adapted for human and neural network learning and prediction tests. Different types of robotic agents and avatars can also be spawned within the controlled simulation to perform, say, task planning and execution. And using virtual reality (VR), human attention and play behavior within the space can provide real-world data, for example.

“We are trying to build a general-purpose simulation platform that mimics the interactive richness of the real world for a variety of AI applications,” says study lead author Chuang Gan, MIT-IBM Watson AI Lab research scientist.

Creating realistic virtual worlds with which to investigate human behaviors and train robots has been a dream of AI and cognitive science researchers. “Most of AI right now is based on supervised learning, which relies on huge datasets of human-annotated images or sounds,” says Josh McDermott, associate professor in the Department of Brain and Cognitive Sciences (BCS) and an MIT-IBM Watson AI Lab project lead. These descriptions are expensive to compile, creating a bottleneck for research. And for physical properties of objects, like mass, which isn’t always readily apparent to human observers, labels may not be available at all. A simulator like TDW skirts this problem by generating scenes where all the parameters and annotations are known. Many competing simulations were motivated by this concern but were designed for specific applications; through its flexibility, TDW is intended to enable many applications that are poorly suited to other platforms.

Another advantage of TDW, McDermott notes, is that it provides a controlled setting for understanding the learning process and facilitating the improvement of AI robots. Robotic systems, which rely on trial and error, can be taught in an environment where they cannot cause physical harm. In addition, “many of us are excited about the doors that these sorts of virtual worlds open for doing experiments on humans to understand human perception and cognition. There’s the possibility of creating these very rich sensory scenarios, where you still have total control and complete knowledge of what is happening in the environment.”

McDermott, Gan, and their colleagues are presenting this research at the Conference on Neural Information Processing Systems (NeurIPS) in December.

Behind the framework

The work began as a collaboration between a group of MIT professors along with Stanford and IBM researchers, tethered by individual research interests into hearing, vision, cognition, and perceptual intelligence. TDW brought these together in one platform. “We were all interested in the idea of building a virtual world for the purpose of training AI systems that we could actually use as models of the brain,” says McDermott, who studies human and machine hearing. “So, we thought that this sort of environment, where you can have objects that will interact with each other and then render realistic sensory data from them, would be a valuable way to start to study that.”

To achieve this, the researchers built TDW on a video game platform called Unity3D Engine and committed to incorporating both visual and auditory data rendering without any animation. The simulation consists of two components: the build, which renders images, synthesizes audio, and runs physics simulations; and the controller, which is a Python-based interface where the user sends commands to the build.

Researchers construct and populate a scene by pulling from an extensive 3D model library of objects, like furniture pieces, animals, and vehicles. These models respond accurately to lighting changes, and their material composition and orientation in the scene dictate their physical behaviors in the space. Dynamic lighting models accurately simulate scene illumination, causing shadows and dimming that correspond to the appropriate time of day and sun angle. The team has also created furnished virtual floor plans that researchers can fill with agents and avatars.

To synthesize true-to-life audio, TDW uses generative models of impact sounds that are triggered by collisions or other object interactions within the simulation. TDW also simulates noise attenuation and reverberation in accordance with the geometry of the space and the objects in it.
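In practice, a researcher drives the build entirely from the Python controller. The sketch below follows the spirit of TDW’s public API; the specific commands and the model name are illustrative and may differ across TDW versions:

```python
# Minimal TDW-style controller session: build a room, add an object,
# step the physics simulation, and shut down. Command and model names
# are illustrative; consult the TDW documentation for the current API.
from tdw.controller import Controller
from tdw.tdw_utils import TDWUtils

c = Controller()  # launches the build and connects to it

# Create an empty 12x12 room and drop a rigid object into it.
object_id = c.get_unique_id()
c.communicate([TDWUtils.create_empty_room(12, 12),
               c.get_add_object(model_name="iron_box",
                                object_id=object_id,
                                position={"x": 0, "y": 2, "z": 0})])

# Advance the simulation; the build runs physics and rendering per frame.
for _ in range(100):
    c.communicate({"$type": "step_physics", "frames": 1})

c.communicate({"$type": "terminate"})
```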

Two physics engines in TDW power deformations and reactions between interacting objects — one for rigid bodies, and another for soft objects and fluids. TDW performs instantaneous calculations regarding mass, volume, and density, as well as any friction or other forces acting upon the materials. This allows machine learning models to learn about how objects with different physical properties would behave together.

Users, agents, and avatars can bring the scenes to life in several ways. A researcher could directly apply a force to an object through controller commands, which could literally set a virtual ball in motion. Avatars can be empowered to act or behave in a certain way within the space — e.g., with articulated limbs capable of performing task experiments. Lastly, VR headsets and hand controllers allow users to interact with the virtual environment, potentially to generate human behavioral data that machine learning models could learn from.
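Continuing the sketch above, setting that virtual ball in motion amounts to one more command from the controller; the command name and force values here are again illustrative rather than authoritative:

```python
# Nudge an existing object (reusing object_id from the earlier sketch).
# Command name and force magnitude are illustrative.
c.communicate({"$type": "apply_force_to_object",
               "id": object_id,
               "force": {"x": 5.0, "y": 0.0, "z": 0.0}})
```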

Richer AI experiences

To trial and demonstrate TDW’s unique features, capabilities, and applications, the team ran a battery of tests comparing datasets generated by TDW and other virtual simulations. The team found that neural networks trained on scene image snapshots with randomly placed camera angles from TDW outperformed networks trained on other simulations’ snapshots in image classification tests, and approached the accuracy of systems trained on real-world images. The researchers also generated and trained a material classification model on audio clips of small objects dropping onto surfaces in TDW and asked it to identify the types of materials that were interacting. They found that TDW produced significant gains over its competitor. Additional object-drop testing with neural networks trained on TDW revealed that the combination of audio and vision together is the best way to identify the physical properties of objects, motivating further study of audio-visual integration.

TDW is proving particularly useful for designing and testing systems that understand how the physical events in a scene will evolve over time. This includes facilitating benchmarks of how well a model or algorithm makes physical predictions of, for instance, the stability of stacks of objects, or the motion of objects following a collision — humans learn many of these concepts as children, but many machines need to demonstrate this capacity to be useful in the real world. TDW has also enabled comparisons of human curiosity and prediction against those of machine agents designed to evaluate social interactions within different scenarios.

Gan points out that these applications are only the tip of the iceberg. By expanding the physical simulation capabilities of TDW to depict the real world more accurately, “we are trying to create new benchmarks to advance AI technologies, and to use these benchmarks to open up many new problems that until now have been difficult to study.”

The research team on the paper also includes MIT engineers Jeremy Schwartz and Seth Alter, who are instrumental to the operation of TDW; BCS professors James DiCarlo and Joshua Tenenbaum; graduate students Aidan Curtis and Martin Schrimpf; and former postdocs James Traer (now an assistant professor at the University of Iowa) and Jonas Kubilius PhD ’08. Their colleagues are David Cox, IBM director of the MIT-IBM Watson AI Lab; research software engineer Abhishek Bhandwalder; and research staff member Dan Gutfreund, all of IBM. Additional co-authors are Harvard University assistant professor Julian De Freitas and, from Stanford University, assistant professors Daniel L.K. Yamins (a TDW founder) and Nick Haber, postdoc Daniel M. Bear, and graduate students Megumi Sano, Kuno Kim, Elias Wang, Damian Mrowca, Kevin Feigelis, and Michael Lingelbach.

This research was supported by the MIT-IBM Watson AI Lab.

Jelena Vučković delivers 2021 Dresselhaus Lecture on inverse-designed photonics

As her topic for the 2021 Mildred S. Dresselhaus Lecture, Stanford University professor Jelena Vučković posed a question: Are computers better than humans in designing photonics?

Throughout her talk, presented on Nov. 15 in a hybrid format to more than 500 attendees, the Jensen Huang Professor in Global Leadership at Stanford’s School of Engineering offered multiple examples arguing that, yes, computer software can help identify better solutions than traditional methods, leading to smaller, more efficient devices, as well as entirely new functionalities.

Photonics, the science of guiding and manipulating light, is used in many applications such as optical interconnects, optical computing platforms for AI or quantum computing, augmented reality glasses, biosensors, medical imaging systems, and sensors in autonomous vehicles.

For all these applications, Vučković said, many optical components must be integrated on a chip that can fit into the footprint of your glasses or mobile device. Unfortunately, high-density photonic integration poses several problems: traditional photonic components are large, sensitive to fabrication errors and environmental factors such as variations in temperature, and designed by manually tuning only a few parameters. So, Vučković and her team asked, “How can we design better photonics?”

Her answer: photonics inverse design. In this process, scientists rely on sophisticated computational tools and modern computing platforms to discover optimal photonic solutions or device designs for a particular function. In this inverse process, the researcher first considers how he or she would like the photonic block to operate, then uses computer software to search the whole parameter space of possible solutions for the one that is optimal, within fabrication constraints.
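The contrast with manual tuning can be made concrete with a toy sketch. Below, a made-up figure of merit stands in for a full electromagnetic simulation, and an off-the-shelf optimizer searches hundreds of coupled parameters at once; this only illustrates the shape of the inverse-design loop, not the actual solvers Vučković’s group uses:

```python
# Toy inverse-design loop: search a large, coupled parameter space for a
# design that maximizes performance subject to fabrication constraints.
# The "simulation" is a stand-in; real inverse design wraps an
# electromagnetic solver (e.g., FDTD) instead.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
target = rng.uniform(0.2, 0.8, size=200)  # hypothetical ideal structure

def figure_of_merit(params):
    # Stand-in for "how well does this structure perform its function?"
    # (Higher is better, so we return the negative for minimization.)
    return -np.exp(-np.sum((params - target) ** 2))

def fabrication_penalty(params):
    # Penalize values a foundry could not fabricate (outside [0, 1]).
    low = np.clip(params, None, 0.0)
    high = np.clip(params - 1.0, 0.0, None)
    return np.sum(low ** 2) + np.sum(high ** 2)

def objective(params):
    return figure_of_merit(params) + 10.0 * fabrication_penalty(params)

x0 = rng.uniform(0, 1, size=200)  # random initial design, not hand-tuned
result = minimize(objective, x0, method="L-BFGS-B")
print("optimized figure of merit:", -figure_of_merit(result.x))
```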

From guiding light around corners to splitting colors of light in a compact footprint, Vučković presented several examples to prove this process works — using computer software to conduct physics-guided searches of numerous possibilities produces non-traditional solutions that increase efficiency and/or decrease the footprint of photonic devices.

Enabling new functionalities – high-energy physics

State-of-the-art particle accelerators, which use microwaves or radio frequency waves to propel charged particles, can be the size of a full city block; Stanford’s SLAC National Accelerator Lab, for example, is two miles long. Lower-energy accelerators, such as those used in medical radiation facilities, are not as large, but still take up an entire room, are expensive, and not very accessible. “If we could use a different spectrum of electromagnetic waves with shorter wavelengths to do the same function of accelerating particles,” said Vučković, “we should, in principle, be able to shrink the size of an accelerator.” The solution is not as simple as reducing the size of all the parts, as the electromagnetic building blocks won’t work for optical waves. Instead, Vučković and her team used the inverse design process to create new building blocks, and built a single-stage on-chip integrated laser-driven particle accelerator that is only 30 micrometers in length.

Applying inverse-designed photonics to practical environments

Autonomous vehicles have a large lidar system on the roof, housing mechanics that rotate a beam to scan the environment. Vučković considered how this could be improved. “Can you make this system inside the footprint of a single chip, which would be just like another sensor in your car, and can it be inexpensive?” Through inverse design, her research group found optimal photonic structures that enable beam steering with lasers far cheaper than those in state-of-the-art systems, while achieving 5 degrees of additional beam steering.

Next up: scaling superconducting quantum processors onto a single diamond or silicon carbide chip. In this example, Vučković harkened back to the 2020 Dresselhaus Lecture delivered by Harvard Professor Evelyn Hu on leveraging defects at the nanoscale. Impurities present at low concentrations in these materials act as naturally trapped atoms that can be very useful for quantum applications. Vučković’s group is working on material developments and fabrication techniques that allow them to put these trapped atoms in desired positions with minimal defects.

“For many applications, letting computer software search for an optimal solution leads to better solutions than what you would design, or guess, based on your intuitions. And this process is material-agnostic, fully compatible with commercial foundries, and enables new functionalities,” said Vučković. “Even if you try to make something a little bit better than traditional solutions — smaller in a footprint or higher in efficiency — we can come up with multiple solutions that are equally good or better than what we knew before. We are relearning photonics and electromagnetics in this process.”

Honoring Mildred S. Dresselhaus and Gene Dresselhaus

Vučković was the third speaker to deliver the Dresselhaus Lecture, established in 2019 to honor the late MIT physics and electrical engineering professor Mildred Dresselhaus. This year, the lecture was also dedicated to Gene Dresselhaus, renowned physicist and Millie’s husband, who passed away in late September 2021.

Jing Kong, professor of electrical engineering and computer science at MIT, opened the lecture by reflecting on the Dresselhauses’ scientific achievements. Kong highlighted the American Physical Society’s Oliver E. Buckley Condensed Matter Physics Prize — considered the most prestigious award granted within the field of condensed-matter physics — which was awarded to both Millie (2008) and Gene (2022). “Although they worked together on many important topics,” said Kong, “it’s remarkable that they received this award for separate research works. It is humbling for us to walk in their footsteps.”

The intersection of math, computers, and everything else

Shardul Chiplunkar, a senior in Course 18C (mathematics with computer science), entered MIT interested in computers, but soon he was trying everything from spinning fire to building firewalls. He dabbled in audio engineering and glass blowing, was a tenor for the MIT/Wellesley Toons a cappella group, and learned to sail.

“When I was entering MIT, I thought I was just going to be interested in math and computer science, academics and research,” he says. “Now what I appreciate the most is the diversity of people and ideas.”

Academically, his focus is on the interface between people and programming. But his extracurriculars have helped him figure out his secondary goal, to be a sort of translator between the technical world and the professional users of software.

“I want to create better conceptual frameworks for explaining and understanding complex software systems, and to develop better tools and methodologies for large-scale professional software development, through fundamental research in the theory of programming languages and human-computer interaction,” he says.

It’s a role he was practically born to play. Raised in Silicon Valley just as the dot-com bubble was at its peak, he was drawn to computers at an early age. He was 8 when his family moved to Pune, India, for his father’s job as a networking software engineer. In Pune, his mother also worked as a translator, editor, and radio newscaster. Chiplunkar eventually could speak English, Hindi, French, and his native Marathi.

At school, he was active in math and coding competitions, and a friend introduced him to linguistic puzzles, which he recalls “were kind of like math.” He went on to excel in the Linguistics Olympiad, where secondary school students solve problems based on the scientific study of languages — linguistics.

Chiplunkar came to MIT to study what he calls “the perfect major,” course 18C. But as the child of a tech dad and a translator mom, it was perhaps inevitable that Chiplunkar would figure out how to combine the two subjects into a unique career trajectory.

While he was a natural at human languages, it was an Undergraduate Research Opportunities Program project at the Computer Science and Artificial Intelligence Laboratory that cemented his interest in researching programming languages. Under Professor Adam Chlipala, he developed a specification language for internet firewalls, and a formally verified compiler to convert such specifications into executable code, using correct-by-construction software synthesis and proof techniques.

“Suppose you want to block a certain website,” explains Chiplunkar. “You open up your firewall and enter the address of the website, how long you want to block it, and so on. You have some parameters in a made-up language that tells the firewall what code to run. But how do you know the firewall will translate that language into code without any mistakes? That was the essence of the project. I was trying to create a language to mathematically specify the behavior of firewalls, and to convert it into code and prove that the code will do what you want it to do. The software would come with a mathematically proven guarantee.”
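The shape of that guarantee can be illustrated with a toy sketch (the actual project used formal synthesis and proof tools, not Python, and everything here, including the tiny spec format, is hypothetical):

```python
# A toy firewall "spec language": block one host for a number of seconds.
# In the real project the spec-to-code compiler is *proven* correct; here
# we merely check that spec and compiled code agree on a small input space.
spec = {"block_host": "example.com", "duration_s": 3600}

def spec_semantics(spec, host, t):
    """Ground-truth meaning of the spec: is `host` blocked at time `t`?"""
    return host == spec["block_host"] and t < spec["duration_s"]

def compile_spec(spec):
    """The 'compiler': emit an executable decision function from the spec."""
    blocked, duration = spec["block_host"], spec["duration_s"]
    return lambda host, t: host == blocked and t < duration

compiled = compile_spec(spec)
# Stand-in for a proof: exhaustively compare the two on a small domain.
for host in ["example.com", "mit.edu"]:
    for t in range(0, 7200, 60):
        assert compiled(host, t) == spec_semantics(spec, host, t)
print("compiled firewall agrees with the spec on all tested inputs")
```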

He has also explored adjacent interests in probabilistic programming languages and program inference through cognitive science research, working under Professor Tobias Gerstenberg at Stanford University and later under Joshua Rule in the Tenenbaum lab in MIT’s Department of Brain and Cognitive Sciences.

“In regular programming languages, the basic data you deal with, the atoms, are fixed numbers,” says Chiplunkar. “But in probabilistic programming languages, you deal with probability distributions. Instead of the constant five, you might have a random variable whose average value is five, but every time you run the program it’s somewhere between zero and 10. It turns out you can compute with these probabilities, too — and it’s a more powerful way to produce a computer model of some aspects of human cognition. The language lets you express concepts that you couldn’t express otherwise.”
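Chiplunkar’s example translates almost directly into code. Here is a minimal sketch in plain Python, standing in for a real probabilistic programming language:

```python
import random

def probabilistic_five():
    # Not the constant 5: a draw from Uniform(0, 10), whose mean is 5.
    return random.uniform(0, 10)

def program():
    # Downstream code computes with the random value like any other number.
    return probabilistic_five() + 2

# Each run returns something different; the program's meaning is the
# distribution over its outputs, not any single output.
samples = [program() for _ in range(100_000)]
print(min(samples), max(samples))   # roughly 2 and 12
print(sum(samples) / len(samples))  # close to 7
```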

“A lot of the reasons I like computational cognitive science are the same reasons I like programming and human language,” he explains. “Human cognition can often be expressed in a representation that is like a programming language. It’s more of an abstract representation. We have no idea what actually happens in the brain, but the hypothesis is that at some level of abstraction, it’s a good model of how cognition works.”

Chiplunkar also hopes to bring an improved understanding of modern software systems into the public sphere, to empower tech-curious communities such as lawyers, policymakers, doctors, and educators. To aid in this quest, he’s taken courses at MIT on internet policy and copyright law, and avidly follows the work of digital rights and liberties activists. He believes that programmers need fundamentally new language and concepts to talk about the architecture of computer systems for broader societal purposes.

“I want us to be able to explain why a surgeon should trust a robotic surgery assistant, or how a law about data storage needs to be updated for modern systems,” he says. “I think that creating better conceptual languages for complex software is just as important as creating better practical tools. Because complex software is now so important in the world, I want the computing industry — and myself — to be better able to engage with a wider audience.”

3 Questions: Can we fix our flawed software?

Sometimes, software is just like us. It can be bloated, slow, and messy. Humans might see a doctor if these symptoms persist (maybe not for messiness), but rarely do we push a flawed software program to go see its developer time and time again. 

The answer to why our software is flawed is ensnared in a web of reliance on flashy hardware, limits of a “code-and-fix” approach, and inadequate design. MIT Professor Daniel Jackson, who is the associate director of MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), looked at the existing limitations to create a new framework to improve how our programs function. His theory of software design takes a human-centered approach that views an app as a collection of interacting concepts. “The Essence of Software,” Jackson’s new book, draws on his many years of software research, including designing Alloy, an open source language and analyzer for software modeling.

Q: Bugs. Security flaws. Design flaws. Has software always been bad?  

A: Software is actually better than it ever was. It’s just that the power and functionality of software has grown so rapidly that we haven’t always been able to keep up. And there are some software products (Apple Keynote, for example) that are close to perfect — easy to use, flexible, with almost no bugs. My book offers an approach that will empower everyone to make software that good.

Q: In your new book, “The Essence of Software,” you introduce a theory of software design that demonstrates how a software system “can be viewed as a collection of interacting concepts.” How does this overturn conventional wisdom?

A: First, conventional wisdom sees user experience primarily in the user interface — its layout, colors, labels, etc. Concept design goes deeper, to address the fundamental mechanisms that the programmer builds and the user experiences.

Second, most apps have large areas of overlapping functionality, but existing approaches don’t recognize that, and developers build the same pieces of functionality again and again as if they were new, without taking advantage of the fact they’ve been built many times before. Just think about how many social media apps have implemented up-voting or comments or favorites, for example. Concepts let you identify these reuse opportunities and take advantage of accumulated design wisdom.

Q: The year 2021 was one of the worst years for data breaches. Boeing 787s have to be rebooted every 51 days to prevent “several potentially catastrophic failure scenarios.” Can your approach help with these kinds of security and safety issues?

A: A high proportion of security and safety issues come from a lack of clarity in the design. Concepts can help with that. More directly, concepts can ensure that users actually understand the effects of their actions, and we know that many disasters happen because users do the wrong thing. In the area of security, getting the user to do the wrong thing (such as granting access to someone who shouldn’t have access) is usually the easiest path to taking control of a system. So, if you can design an app to make it harder for users to do things they’ll regret, you can mitigate this problem.

In MIT visit, Dropbox CEO Drew Houston ’05 explores the accelerated shift to distributed work

When the cloud storage firm Dropbox decided to shut down its offices with the outbreak of the Covid-19 pandemic, co-founder and CEO Drew Houston ’05 had to send the company’s nearly 3,000 employees home and tell them they were not coming back to work anytime soon. “It felt like I was announcing a snow day or something,” Houston recalls.

In the early days of the pandemic, Houston says that Dropbox reacted as many others did to ensure that employees were safe and customers were taken care of. “It’s surreal, there’s no playbook for running a global company in a pandemic over Zoom. For a lot of it we were just taking it as we go.”

Houston talked about his experience leading Dropbox through a public health crisis and how Covid-19 has accelerated a shift to distributed work in a fireside chat on Oct. 14 with Dan Huttenlocher, dean of the MIT Stephen A. Schwarzman College of Computing.

During the discussion, Houston also spoke about his $10 million gift to MIT, which will endow the first shared professorship between the MIT Schwarzman College of Computing and the MIT Sloan School of Management, as well as provide a catalyst startup fund for the college.

“The goal is to find ways to unlock more of our brainpower through a multidisciplinary approach between computing and management,” says Houston. “It’s often at the intersection of these disciplines where you can bring people together from different perspectives, where you can have really big unlocks. I think academia has a huge role to play [here], and I think MIT is super well-positioned to lead. So, I want to do anything I can to help with that.”

Virtual first

While the abrupt swing to remote work was unexpected, Houston says it was pretty clear that the entire way of working as we knew it was going to change indefinitely for knowledge workers. “There’s a silver lining in every crisis,” says Houston, noting that people have been using Dropbox for years to work more flexibly, so it made sense for the company to lean in and become an early adopter of a distributed work paradigm in which employees work in different physical locations.

Dropbox proceeded to redesign the work experience throughout the company, unveiling a “virtual first” working model in October 2020 in which remote work is the primary experience for all employees. Individual work spaces went by the wayside and offices located in areas with a high concentration of employees were converted into convening and collaborative spaces called Dropbox Studios for in-person work with teammates.

“There’s a lot we could say about Covid, but for me, the most significant thing is that we’ll look back at 2020 as the year we shifted permanently from working out of offices to primarily working out of screens. It’s a transition that’s been underway for a while, but Covid completely finished the swing,” says Houston.

Designing for the future workplace

Houston says the pandemic also prompted Dropbox to reevaluate its product line and begin thinking of ways to make improvements. “We’ve had this whole new way of working sort of forced on us. No one designed it; it just happened. Even tools like Zoom, Slack, and Dropbox were designed in and for the old world.”

Undergoing that process helped Dropbox gain clarity on where they could add value and led to the realization that they needed to get back to their roots. “In a lot of ways, what people need today in principle is the same thing they needed in the beginning — one place for all their stuff,” says Houston.

Dropbox reoriented its product roadmap to refocus efforts from syncing files to organizing cloud content. The company is focused on building toward this new direction with the release of new automation features that users can easily implement to better organize their uploaded content and find it quickly. Dropbox also recently announced the acquisition of Command E, a universal search and productivity company, to help accelerate its efforts in this space.

Houston views Dropbox as still evolving and sees many opportunities ahead in this new era of distributed work. “We need to design better tools and smarter systems. It’s not just the individual parts, but how they’re woven together.” He’s surprised by how little intelligence is actually integrated into current systems and believes that rapid advances in AI and machine learning will soon lead to a new generation of smart tools that will ultimately reshape the nature of work — “in the same way that we had a new generation of cloud tools revolutionize how we work and had all these advantages that we couldn’t imagine not having now.”

Founding roots

Houston famously turned his frustration with carrying USB drives and emailing files to himself into a demo for what became Dropbox.

After graduating from MIT in 2005 with a bachelor’s degree in electrical engineering and computer science, he teamed up with classmate Arash Ferdowsi to found Dropbox in 2007 and led the company’s growth from a simple idea to a service used by 700 million people around the world today.

Houston credits MIT for preparing him well for his entrepreneurial journey, recalling that what surprised him most about his student experience was how much he learned outside the classroom. At the event, attended by a select group of computer science and management students and streamed live to a broader audience, he stressed the importance of developing both sides of the brain. “One thing you learn about starting a company is that the hardest problems are usually not technical problems; they’re people problems.” He says that he didn’t realize it at the time, but some of his first lessons in management were gained by taking on responsibilities in his fraternity and in various student organizations that evoked a sense of being “on the hook.”

As CEO, Houston has had a chance to look behind the curtain at how things happen and has come to appreciate that problems don’t solve themselves. While individual people can make a huge difference, he explains that many of the challenges the world faces right now are inherently multidisciplinary ones, which sparked his interest in the MIT Schwarzman College of Computing.

He says that the mindset embodied by the college to connect computing with other disciplines resonated and inspired him to initiate his biggest philanthropic effort to date sooner rather than later because “we don’t have that much time to address these problems.”

The reasons behind lithium-ion batteries’ rapid cost decline

Lithium-ion batteries, those marvels of lightweight power that have made possible today’s age of handheld electronics and electric vehicles, have plunged in cost since their introduction three decades ago at a rate similar to the drop in solar panel prices, as documented by a study published last March. But what brought about such an astonishing cost decline, of about 97 percent?

Some of the researchers behind that earlier study have now analyzed what accounted for the extraordinary savings. They found that by far the biggest factor was work on research and development, particularly in chemistry and materials science. This outweighed the gains achieved through economies of scale, though that turned out to be the second-largest category of reductions.

The new findings are being published today in the journal Energy & Environmental Science, in a paper by MIT postdoc Micah Ziegler, recent graduate student Juhyun Song PhD ’19, and Jessika Trancik, a professor in MIT’s Institute for Data, Systems, and Society.

The findings could be useful for policymakers and planners to help guide spending priorities in order to continue the pathway toward ever-lower costs for this and other crucial energy storage technologies, according to Trancik. Their work suggests that there is still considerable room for further improvement in electrochemical battery technologies, she says.

The analysis required digging through a variety of sources, since much of the relevant information consists of closely held proprietary business data. “The data collection effort was extensive,” Ziegler says. “We looked at academic articles, industry and government reports, press releases, and specification sheets. We even looked at some legal filings that came out. We had to piece together data from many different sources to get a sense of what was happening.” He says they collected “about 15,000 qualitative and quantitative data points, across 1,000 individual records from approximately 280 references.”

Data from the earliest times are hardest to access and can have the greatest uncertainties, Trancik says, but by comparing different data sources from the same period they have attempted to account for these uncertainties.

Overall, she says, “we estimate that the majority of the cost decline, more than 50 percent, came from research-and-development-related activities.” That included both private sector and government-funded research and development, and “the vast majority” of that cost decline within that R&D category came from chemistry and materials research.

That was an interesting finding, she says, because “there were so many variables that people were working on through very different kinds of efforts,” including the design of the battery cells themselves, their manufacturing systems, supply chains, and so on. “The cost improvement emerged from a diverse set of efforts and many people, and not from the work of only a few individuals.”

The findings about the importance of investment in R&D were especially significant, Ziegler says, because much of this investment happened after lithium-ion battery technology was commercialized, a stage at which some analysts thought the research contribution would become less significant. Over roughly a 20-year period starting five years after the batteries’ introduction in the early 1990s, he says, “most of the cost reduction still came from R&D. The R&D contribution didn’t end when commercialization began. In fact, it was still the biggest contributor to cost reduction.”

The study took advantage of an analytical approach that Trancik and her team initially developed to analyze the similarly precipitous drop in costs of silicon solar panels over the last few decades. They also applied the approach to understand the rising costs of nuclear energy. “This is really getting at the fundamental mechanisms of technological change,” she says. “And we can also develop these models looking forward in time, which allows us to uncover the levers that people could use to improve the technology in the future.”

One advantage of the methodology Trancik and her colleagues have developed, she says, is that it helps to sort out the relative importance of different factors when many variables are changing all at once, which typically happens as a technology improves. “It’s not simply adding up the cost effects of these variables,” she says, “because many of these variables affect many different cost components. There’s this kind of intricate web of dependencies.” But the team’s methodology makes it possible to “look at how that overall cost change can be attributed to those variables, by essentially mapping out that network of dependencies,” she says.
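The distinction can be made concrete with a deliberately oversimplified cost model, using made-up numbers (the study’s actual model tracks far more variables and cost components). Because one variable, energy per cell, sits beneath both cost components, one-at-a-time effects do not simply add up to the total change:

```python
# Toy cost-attribution sketch with made-up numbers; the study's model is
# far richer. One variable (kwh_per_cell) affects both cost components.
def cost_per_kwh(materials_per_cell, labor_per_cell, kwh_per_cell):
    return (materials_per_cell + labor_per_cell) / kwh_per_cell

old = dict(materials_per_cell=8.0, labor_per_cell=4.0, kwh_per_cell=0.01)
new = dict(materials_per_cell=4.0, labor_per_cell=1.0, kwh_per_cell=0.05)

total_change = cost_per_kwh(**new) - cost_per_kwh(**old)
for var in old:
    swapped = dict(old, **{var: new[var]})  # change one variable at a time
    effect = cost_per_kwh(**swapped) - cost_per_kwh(**old)
    print(f"{var}: one-at-a-time effect {effect:+.0f} $/kWh")
print(f"total change: {total_change:+.0f} $/kWh (not the sum of the effects)")
```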

This can help provide guidance on public spending, private investments, and other incentives. “What are all the things that different decision makers could do?” she asks. “What decisions do they have agency over so that they could improve the technology, which is important in the case of low-carbon technologies, where we’re looking for solutions to climate change and we have limited time and limited resources? The new approach allows us to potentially be a bit more intentional about where we make those investments of time and money.”

“This paper collects data available in a systematic way to determine changes in the cost components of lithium-ion batteries between 1990-1995 and 2010-2015,” says Laura Diaz Anadon, a professor of climate change policy at Cambridge University, who was not connected to this research. “This period was an important one in the history of the technology, and understanding the evolution of cost components lays the groundwork for future work on mechanisms and could help inform research efforts in other types of batteries.”

The research was supported by the Alfred P. Sloan Foundation, the Environmental Defense Fund, and the MIT Technology and Policy Program.

Giving robots social skills

Robots can deliver food on a college campus and hit a hole-in-one on the golf course, but even the most sophisticated robot can’t perform basic social interactions that are critical to everyday human life.

MIT researchers have now incorporated certain social interactions into a framework for robotics, enabling machines to understand what it means to help or hinder one another, and to learn to perform these social behaviors on their own. In a simulated environment, a robot watches its companion, guesses what task it wants to accomplish, and then helps or hinders this other robot based on its own goals.

The researchers also showed that their model creates realistic and predictable social interactions. When they played videos of these simulated robots interacting with one another for human viewers, the viewers mostly agreed with the model about what type of social behavior was occurring.

Enabling robots to exhibit social skills could lead to smoother and more positive human-robot interactions. For instance, a robot in an assisted living facility could use these capabilities to help create a more caring environment for elderly individuals. The new model may also enable scientists to measure social interactions quantitatively, which could help psychologists study autism or analyze the effects of antidepressants.

“Robots will live in our world soon enough, and they really need to learn how to communicate with us on human terms. They need to understand when it is time for them to help and when it is time for them to see what they can do to prevent something from happening. This is very early work and we are barely scratching the surface, but I feel like this is the first very serious attempt for understanding what it means for humans and machines to interact socially,” says Boris Katz, principal research scientist and head of the InfoLab Group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and a member of the Center for Brains, Minds, and Machines (CBMM).

Joining Katz on the paper are co-lead author Ravi Tejwani, a research assistant at CSAIL; co-lead author Yen-Ling Kuo, a CSAIL PhD student; Tianmin Shu, a postdoc in the Department of Brain and Cognitive Sciences; and senior author Andrei Barbu, a research scientist at CSAIL and CBMM. The research will be presented at the Conference on Robot Learning in November.

A social simulation

To study social interactions, the researchers created a simulated environment where robots pursue physical and social goals as they move around a two-dimensional grid.

A physical goal relates to the environment. For example, a robot’s physical goal might be to navigate to a tree at a certain point on the grid. A social goal involves guessing what another robot is trying to do and then acting based on that estimation, like helping another robot water the tree.

The researchers use their model to specify what a robot’s physical goals are, what its social goals are, and how much emphasis it should place on one over the other. The robot is rewarded for actions it takes that get it closer to accomplishing its goals. If a robot is trying to help its companion, it adjusts its reward to match that of the other robot; if it is trying to hinder, it adjusts its reward to be the opposite. The planner, an algorithm that decides which actions the robot should take, uses this continually updating reward to guide the robot to carry out a blend of physical and social goals.
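In code, the reward adjustment described above might look like the following sketch; the variable names and the weighting scheme are our own illustration, not the paper’s exact formulation:

```python
def combined_reward(physical_reward, other_reward_estimate,
                    social_mode, social_weight=0.5):
    """Blend a robot's own physical reward with a social term.

    social_mode: "help" matches the other robot's estimated reward,
    "hinder" negates it, and None ignores it. Illustrative only.
    """
    if social_mode == "help":
        social_reward = other_reward_estimate
    elif social_mode == "hinder":
        social_reward = -other_reward_estimate
    else:
        social_reward = 0.0
    return (1 - social_weight) * physical_reward + social_weight * social_reward

# The planner would score candidate actions with this continually
# updated reward when choosing what the robot does next.
print(combined_reward(1.0, 0.8, "help"))    # 0.9
print(combined_reward(1.0, 0.8, "hinder"))  # 0.1
```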

“We have opened a new mathematical framework for how you model social interaction between two agents. If you are a robot, and you want to go to location X, and I am another robot and I see that you are trying to go to location X, I can cooperate by helping you get to location X faster. That might mean moving X closer to you, finding another better X, or taking whatever action you had to take at X. Our formulation allows the planner to discover the ‘how’; we specify the ‘what’ in terms of what social interactions mean mathematically,” says Tejwani.

Blending a robot’s physical and social goals is important to create realistic interactions, since humans who help one another have limits to how far they will go. For instance, a rational person likely wouldn’t just hand a stranger their wallet, Barbu says.

The researchers used this mathematical framework to define three types of robots. A level 0 robot has only physical goals and cannot reason socially. A level 1 robot has physical and social goals but assumes all other robots only have physical goals. Level 1 robots can take actions based on the physical goals of other robots, like helping and hindering. A level 2 robot assumes other robots have social and physical goals; these robots can take more sophisticated actions like joining in to help together.
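The hierarchy is naturally recursive: each level models other agents as one level lower. A schematic sketch, our own rendering of the description above:

```python
from dataclasses import dataclass

@dataclass
class Robot:
    level: int  # 0, 1, or 2, per the hierarchy described above

    def models_others_as(self):
        """What this robot assumes about every other robot."""
        if self.level == 0:
            return None  # no social reasoning at all
        # A level-k robot treats others as level k-1: level 1 assumes others
        # are purely physical; level 2 assumes others also reason socially.
        return Robot(level=self.level - 1)

print(Robot(level=2).models_others_as())  # Robot(level=1)
```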

Evaluating the model

To see how their model compared with human judgments of social interactions, the researchers created 98 different scenarios with robots at levels 0, 1, and 2. Twelve humans watched 196 video clips of the robots interacting, and then were asked to estimate the physical and social goals of those robots.

In most instances, their model agreed with what the humans thought about the social interactions that were occurring in each frame.

“We have this long-term interest, both to build computational models for robots, but also to dig deeper into the human aspects of this. We want to find out what features from these videos humans are using to understand social interactions. Can we make an objective test for your ability to recognize social interactions? Maybe there is a way to teach people to recognize these social interactions and improve their abilities. We are a long way from this, but even just being able to measure social interactions effectively is a big step forward,” Barbu says.

Toward greater sophistication

The researchers are working on developing a system with 3D agents in an environment that allows many more types of interactions, such as the manipulation of household objects. They are also planning to modify their model to include environments where actions can fail.

The researchers also want to incorporate a neural network-based robot planner into the model, which learns from experience and performs faster. Finally, they hope to run an experiment to collect data about the features humans use to determine if two robots are engaging in a social interaction.

“Hopefully, we will have a benchmark that allows all researchers to work on these social interactions and inspire the kinds of science and engineering advances we’ve seen in other areas such as object and action recognition,” Barbu says.

“I think this is a lovely application of structured reasoning to a complex yet urgent challenge,” says Tomer Ullman, assistant professor in the Department of Psychology at Harvard University and head of the Computation, Cognition, and Development Lab, who was not involved with this research. “Even young infants seem to understand social interactions like helping and hindering, but we don’t yet have machines that can perform this reasoning at anything like human-level flexibility. I believe models like the ones proposed in this work, that have agents thinking about the rewards of others and socially planning how best to thwart or support them, are a good step in the right direction.”

This research was supported by the Center for Brains, Minds, and Machines; the National Science Foundation; the MIT CSAIL Systems that Learn Initiative; the MIT-IBM Watson AI Lab; the DARPA Artificial Social Intelligence for Successful Teams program; the U.S. Air Force Research Laboratory; the U.S. Air Force Artificial Intelligence Accelerator; and the Office of Naval Research.

Toward speech recognition for uncommon spoken languages

Automated speech-recognition technology has become more common with the popularity of virtual assistants like Siri, but many of these systems only perform well with the most widely spoken of the world’s roughly 7,000 languages.

Because these systems largely don’t exist for less common languages, the millions of people who speak them are cut off from many technologies that rely on speech, from smart home devices to assistive technologies and translation services.

Recent advances have enabled machine learning models that can learn the world’s uncommon languages, which lack the large amount of transcribed speech needed to train algorithms. However, these solutions are often too complex and expensive to be applied widely.

Researchers at MIT and elsewhere have now tackled this problem by developing a simple technique that reduces the complexity of an advanced speech-learning model, enabling it to run more efficiently and achieve higher performance.

Their technique involves removing unnecessary parts of a common, but complex, speech recognition model and then making minor adjustments so it can recognize a specific language. Because only small tweaks are needed once the larger model is cut down to size, it is much less expensive and time-consuming to teach this model an uncommon language.

This work could help level the playing field and bring automatic speech-recognition systems to many areas of the world where they have yet to be deployed. The systems are important in some academic environments, where they can assist students who are blind or have low vision, and are also being used to improve efficiency in health care settings through medical transcription and in the legal field through court reporting. Automatic speech recognition can also help users learn new languages and improve their pronunciation skills. This technology could even be used to transcribe and document rare languages that are in danger of vanishing.

“This is an important problem to solve because we have amazing technology in natural language processing and speech recognition, but taking the research in this direction will help us scale the technology to many more underexplored languages in the world,” says Cheng-I Jeff Lai, a PhD student in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and first author of the paper.

Lai wrote the paper with fellow MIT PhD students Alexander H. Liu, Yi-Lun Liao, Sameer Khurana, and Yung-Sung Chuang; his advisor and senior author James Glass, senior research scientist and head of the Spoken Language Systems Group in CSAIL; MIT-IBM Watson AI Lab research scientists Yang Zhang, Shiyu Chang, and Kaizhi Qian; and David Cox, the IBM director of the MIT-IBM Watson AI Lab. The research will be presented at the Conference on Neural Information Processing Systems in December.

Learning speech from audio

The researchers studied a powerful neural network that has been pretrained to learn basic speech from raw audio, called wav2vec 2.0.

A neural network is a series of algorithms that can learn to recognize patterns in data; modeled loosely on the human brain, neural networks are arranged into layers of interconnected nodes that process data inputs.

Wav2vec 2.0 is a self-supervised learning model, so it learns to recognize a spoken language after being fed a large amount of unlabeled speech; finetuning it for a particular language then requires only a few minutes of transcribed speech. This opens the door for speech recognition of uncommon languages that lack large amounts of transcribed speech, like Wolof, which is spoken by 5 million people in West Africa.

However, the neural network has about 300 million individual connections, so it requires a massive amount of computing power to train on a specific language.

The researchers set out to improve the efficiency of this network by pruning it. Just like a gardener cuts off superfluous branches, neural network pruning involves removing connections that aren’t necessary for a specific task, in this case, learning a language. Lai and his collaborators wanted to see how the pruning process would affect this model’s speech recognition performance.

After pruning the full neural network to create a smaller subnetwork, they trained the subnetwork with a small amount of labeled Spanish speech and then again with French speech, a process called finetuning.  

“We would expect these two models to be very different because they are finetuned for different languages. But the surprising part is that if we prune these models, they will end up with highly similar pruning patterns. For French and Spanish, they have 97 percent overlap,” Lai says.

They ran experiments using 10 languages, from Romance languages like Italian and Spanish to languages that have completely different alphabets, like Russian and Mandarin. The results were the same — the finetuned models all had a very large overlap.

A simple solution

Drawing on that finding, they developed a simple technique, called PARP (Prune, Adjust, and Re-Prune), that improves the efficiency of the neural network and boosts its performance.

In the first step, a pretrained speech recognition neural network like wav2vec 2.0 is pruned by removing unnecessary connections. Then in the second step, the resulting subnetwork is adjusted for a specific language, and then pruned again. During this second step, connections that had been removed are allowed to grow back if they are important for that particular language.

Because connections are allowed to grow back during the second step, the model only needs to be finetuned once, rather than over multiple iterations, which vastly reduces the amount of computing power required.
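Put together, the loop might look like the following simplified sketch. It is an assumption-laden illustration of the idea rather than the authors' implementation: run_finetune_step is a hypothetical helper, and the real method operates on a full wav2vec 2.0 model rather than a single weight tensor.

    # Simplified sketch of the PARP idea: prune once, then run a single
    # finetuning pass in which pruned weights still receive gradient updates,
    # re-pruning periodically so connections important for the target language
    # can grow back. `run_finetune_step` is a hypothetical helper standing in
    # for one supervised training update.
    import torch

    def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        """0/1 mask keeping the largest-magnitude (1 - sparsity) of weights."""
        k = max(1, int(weight.numel() * sparsity))
        threshold = weight.abs().flatten().kthvalue(k).values
        return (weight.abs() > threshold).float()

    def parp(weight, sparsity, num_steps, reprune_every, run_finetune_step):
        mask = magnitude_mask(weight, sparsity)          # 1. Prune
        for step in range(num_steps):                    # 2. Adjust: a single
            weight.data *= mask                          #    finetuning run in
            run_finetune_step(weight)                    #    which all weights,
            if (step + 1) % reprune_every == 0:          #    even zeroed ones, move
                mask = magnitude_mask(weight, sparsity)  # 3. Re-prune: useful
        return mask                                      #    connections return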

Testing the technique

The researchers put PARP to the test against other common pruning techniques and found that it outperformed them all for speech recognition. It was especially effective when there was only a very small amount of transcribed speech to train on.

They also showed that PARP can create one smaller subnetwork that can be finetuned for 10 languages at once, eliminating the need to prune separate subnetworks for each language, which could also reduce the expense and time required to train these models.

Moving forward, the researchers would like to apply PARP to text-to-speech models and also see how their technique could improve the efficiency of other deep learning networks.

“There are increasing needs to put large deep-learning models on edge devices. Having more efficient models allows these models to be squeezed onto more primitive systems, like cell phones. Speech technology is very important for cell phones, for instance, but having a smaller model does not necessarily mean it is computing faster. We need additional technology to bring about faster computation, so there is still a long way to go,” Zhang says.

Self-supervised learning (SSL) is changing the field of speech processing, so making SSL models smaller without degrading performance is a crucial research direction, says Hung-yi Lee, associate professor in the Department of Electrical Engineering and the Department of Computer Science and Information Engineering at National Taiwan University, who was not involved in this research.

“PARP trims the SSL models, and at the same time, surprisingly improves the recognition accuracy. Moreover, the paper shows there is a subnet in the SSL model, which is suitable for ASR tasks of many languages. This discovery will stimulate research on language/task agnostic network pruning. In other words, SSL models can be compressed while maintaining their performance on various tasks and languages,” he says.

This work is partially funded by the MIT-IBM Watson AI Lab and the 5k Language Learning Project.

3 Questions: Blending computing with other disciplines at MIT

The demand for computing-related training is at an all-time high. At MIT, there has been a remarkable surge of interest in computer science programs, with heavy enrollment from students studying everything from economics to life sciences who are eager to learn how computational techniques and methodologies can be applied within their primary fields.

Launched in 2020, the Common Ground for Computing Education was created through the MIT Stephen A. Schwarzman College of Computing to meet the growing need for enhanced curricula that connect computer science and artificial intelligence with other domains. To advance this mission, the Common Ground brings together experts from across MIT and facilitates collaborations among multiple departments to develop new classes and approaches that blend computing topics with other disciplines.

Dan Huttenlocher, dean of the MIT Schwarzman College of Computing, and the chairs of the Common Ground Standing Committee — Jeff Grossman, head of the Department of Materials Science and Engineering and the Morton and Claire Goulder and Family Professor of Environmental Systems; and Asu Ozdaglar, deputy dean of academics for the MIT Schwarzman College of Computing, head of the Department of Electrical Engineering and Computer Science, and the MathWorks Professor of Electrical Engineering and Computer Science — discuss here the objectives of the Common Ground, pilot subjects that are underway, and ways they’re engaging faculty to create new curricula for MIT’s class of “computing bilinguals.”

Q: What are the objectives of the Common Ground and how does it fit into the mission of the MIT Schwarzman College of Computing?

Huttenlocher: One of the core components of the college mission is to educate students who are fluent in both the “language” of computing and that of other disciplines. Machine learning classes, for example, attract a lot of students outside of electrical engineering and computer science (EECS) majors. These students are interested in machine learning for modeling within the context of their fields of interest, rather than the inner workings of machine learning itself as taught in Course 6. So, we need new approaches to developing computing curricula in order to provide students with a thorough grounding in computing that is relevant to their interests: not just enabling them to use computational tools, but helping them understand conceptually how those tools can be developed and applied in their primary field, whether it be science, engineering, humanities, business, or design.

The core goals of the Common Ground are to infuse computing education throughout MIT in a coordinated manner, as well as to serve as a platform for multi-departmental collaborations. All classes and curricula developed through the Common Ground are intended to be created and offered jointly by multiple academic departments to meet “common” needs. We’re bringing the forefront of the rapidly changing fields of computer science and artificial intelligence together with the problems and methods of other disciplines, so the process has to be collaborative. As much as computing is changing thinking in the disciplines, the disciplines are changing the way people develop new computing approaches. It can’t be a stand-alone effort — otherwise it won’t work.

Q: How is the Common Ground facilitating collaborations and engaging faculty across MIT to develop new curricula?

Grossman: The Common Ground Standing Committee was formed to oversee the activities of the Common Ground and is charged with evaluating how best to support and advance program objectives. There are 29 members on the committee — all are faculty experts in various computing areas, representing 18 academic departments across all five MIT schools and the college. The structure of the committee aligns closely with the mission of the Common Ground in that it draws from all parts of the Institute. Members are organized into subcommittees currently centered on three primary focus areas: fundamentals of computational science and engineering; fundamentals of programming/computational thinking; and machine learning, data science, and algorithms. The subcommittees, with extensive input from departments, framed prototypes for what Common Ground subjects would look like in each area, and a number of classes have already been piloted.

It has been wonderful working with colleagues from different departments. The level of commitment that everyone on the committee has put into this effort has truly been amazing to see, and I share their enthusiasm for pursuing opportunities in computing education.

Q: Can you tell us more about the subjects that are already underway?

Ozdaglar: So far, we have four offerings for students to choose from: in the fall, Linear Algebra and Optimization with the Department of Mathematics and EECS, and Programming Skills and Computational Thinking in-Context with the Experimental Study Group and EECS; in the spring, Modeling with Machine Learning: From Algorithms to Applications, with disciplinary modules developed by multiple engineering departments and MIT Supply Chain Management; and in both semesters, Introduction to Computational Science and Engineering, a collaboration between the Department of Aeronautics and Astronautics and the Department of Mathematics.

We have had students from a range of majors take these classes, including mechanical engineering, physics, chemical engineering, economics, and management, among others. The response has been very positive. It is very exciting to see MIT students having access to these unique offerings. Our goal is to enable them to frame disciplinary problems using a rich computational framework, which is one of the objectives of the Common Ground.

We are planning to expand Common Ground offerings in the years to come and welcome ideas for new subjects. Some ideas currently in the works include classes on causal inference, creative programming, and data visualization with communication. In addition, this fall we put out a call for proposals to develop new subjects, inviting instructors from across the campus to submit ideas for pilot computing classes that are useful across a range of areas and support the educational mission of individual departments. The selected proposals will receive seed funding from the Common Ground to assist in the design, development, and staffing of new, broadly applicable computing subjects and the revision of existing subjects in alignment with the Common Ground’s objectives. We are explicitly looking to facilitate opportunities in which multiple departments would benefit from coordinated teaching.
