IBM Partnership Yields ‘Big Data’ Class
Computer science department offers a new cutting‐edge “big data” analysis class.
Traditional database experts are great at answering straightforward questions. Are sales up this quarter? Is the firm profitable? But they’re ill‐equipped to handle more open‐ended questions. How can you spot a fraudulent insurance claim? Where is the best location for a windmill?
These questions can be answered with enough data, but as anyone who remembers floppy disks can attest, the ability to store lots of data used to be a major constraint of technology. However, in the age of monster hard drives and cloud computing, that constraint is all but gone. The data are now available. The challenge is how to use them.
That’s where Yinghui “Susan” Zeng comes in. A database manager for IBM, Zeng is an expert in “big data” analytics. Big data is the shorthand term for the trove of information constantly being produced by Internet users, companies, researchers and the rest of the wired world. Through a partnership between IBM and the College of Engineering’s Department of Computer Science, Zeng is teaching a new big data analysis class at MU this fall.
The field is expanding rapidly, outstripping the supply of college graduates qualified to work in it. That is why IBM supports Zeng, who is also an adjunct faculty member in the computer science department, in teaching the class.
“I’m happy to be back, attached to MU again,” says Zeng, MS ’94, PhD ’04.
IBM does big data consulting for companies. Zeng says it has helped a Canadian insurance company identify fraudulent insurance claims by analyzing reams of claim forms. Dong Xu, chair of MU’s computer science department, says President Barack Obama’s 2012 re‐election campaign used big data to micro‐target likely supporters, and hospitals collect infants’ vital signs and use big data techniques to predict diseases earlier in children.
But 80 percent of big data is unstructured, Zeng says, meaning it doesn’t fit well in the columns and rows of spreadsheets. It’s information like social media posts, genomic data and video footage — all very useful, but difficult to analyze.
Zeng has made her class hands‐on. Students learn the technical side of setting up distributed file systems and parallel processors, and they pick up a new programming language. She will also teach new data analysis techniques.
“This course is really helpful to our students because they will learn cutting‐edge technology, which can help them land high‐paid jobs and cutting‐edge jobs,” Xu says.