<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Projects | DataFirst</title><link>https://ckids-datafirst.github.io/website/projects/</link><atom:link href="https://ckids-datafirst.github.io/website/projects/index.xml" rel="self" type="application/rss+xml"/><description>Projects</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Sun, 01 Jan 2023 00:00:00 +0000</lastBuildDate><image><url>https://ckids-datafirst.github.io/website/media/icon_hu5486d42984c30aaff6be99d37062b147_3155_512x512_fill_lanczos_center_3.png</url><title>Projects</title><link>https://ckids-datafirst.github.io/website/projects/</link></image><item><title>AI Ethics for Smart Health through Smart Watches</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-smart-watches/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-smart-watches/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Lots of personal data can be obtained from wearable devices such as smart watches. This data can be used to improve health, for example to learn to detect health problems and to check whether people adhere to their doctor’s exercise recommendations. This project will conduct a thorough study of the ethical issues in using AI systems in this domain, with recommendations for how AI systems for smart health should be designed with ethical considerations in mind.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/clio-polanco-cercado">Clio Polanco Cercado&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jeremy-unger">Jeremy Unger&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kipauno-washington">Kipauno Washington&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/olivia-guo">Olivia Guo&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shruti-ramesh">Shruti Ramesh&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yabsera-benyam">Yabsera Benyam&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yuko-ohmori">Yuko Ohmori&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/yolanda-gil">Yolanda Gil&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>What kinds of health-related data can be captured through wearable devices, what kinds of analyses are possible, privacy and ethical aspects of personal applications for smart health.&lt;/p></description></item><item><title>AI/ML assisted fault detection in foundry processed devices</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-foundry-devices/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-foundry-devices/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Highly accurate fault detection in foundry-produced microelectronics is crucial to ensuring the quality of devices that leave the foundry. However, current defect detection flows are human-centric, which produces a bottleneck. The objective of this project is to leverage recent advances in AI/ML to develop automated techniques that can 1) identify manufacturing defects in microelectronics using imagery collected at the foundry, and 2) determine whether the identified defect will impact the performance of the manufactured component.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/bill-zhang">Bill Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jiankun-wei">Jiankun Wei&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/lee-chi-wang">Lee-Chi Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ryan-lee">Ryan Lee&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/younwoo-roh">Younwoo Roh&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/andrew-rittenbach">Andrew Rittenbach&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/john-paul-walters">John Paul Walters&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Students will learn about manufacturing defect detection algorithms, machine learning techniques, and microelectronics fabrication.&lt;/p></description></item><item><title>Analyzing Open Source Software Ecosystems</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-software-ecosystems/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-software-ecosystems/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Open source runs a lot of the world's critical software systems, but there is much that's unknown in how maintainers, developers and other parts of the software ecosystem function. Help us analyze a large corpus of open source data — both source code and patch conversations — to better understand them! We'll study things like rise to influence, authorship styles, malware analysis, topic modeling and social network analysis!&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/apoorv-dixit">Apoorv Dixit&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kai-zheng">Kai Zheng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zishen-wei">Zishen Wei&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jim-blythe">Jim Blythe&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/alexey-tregubov">Alexey Tregubov&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>We'll touch on using LLMs to parse text messages and analyze code, graph databases, program analysis, and social network analysis, among other skills.&lt;/p></description></item><item><title>Application of AI, ML and NLP in understanding and preventing a serious aviation safety problem in the US - Runway Safety</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-aviation-safety/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-aviation-safety/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project, which is co-advised by Dr. Yolanda Gil, will use AI/ML/NLP to understand the root causes of one of the serious aviation safety problems in the US: runway incursions. The Aviation Safety Reporting System, which is administered by NASA and is an untapped treasure trove of textual data, will be used for this project.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/abhinav-gupta">Abhinav Gupta&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/atharva-swami">Atharva Swami&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/chahita-verma">Chahita Verma&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/likhit-jha">Likhit Jha&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/mino-cha">Mino Cha&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ravneet-kaur">Ravneet Kaur&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shreyas-malewar">Shreyas Malewar&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/najmedin-meshkati">Najmedin Meshkati&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Using AI/ML/NLP and working on the data from a major global industry - aviation.&lt;/p></description></item><item><title>Assessing the California Public Sector Job Market</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-labor-market/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-labor-market/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Public sector institutions at local, state, and federal levels are facing an unprecedented hiring crisis in competition for new talent. Yet there is no systematic understanding of the needs and openings across these levels of government to inform stakeholders such as universities, community colleges, and high schools on the current and emerging hiring trends in what constitutes approximately 15-20% of the entire labor market. In this project, students will develop algorithms that continuously scrape relevant job sites used by these governments to assess both developed and emerging hiring trends by aptitudes, professions, entry-levels, mobility, location, and other important attributes. In so doing, the project will inform researchers in public policy, public administration, political science, and labor economics as well as practitioners in government and associated stakeholders.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/darren-cao">Darren Cao&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/josephina-zhenni-bian">Josephina Zhenni Bian&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nicole-dias">Nicole Dias&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ryan-silva">Ryan Silva&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yi-ming">Yi Ming&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/william-resh">William Resh&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Students will learn how to develop and organize labor market data to be used by practitioners and researchers through the construction of a portal that can ably transform data into usable aggregated statistics and graphs.&lt;/p></description></item><item><title>Auditing web content promoting eating disorders</title><link>https://ckids-datafirst.github.io/website/projects/2023-spring/808/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-spring/808/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The pandemic worsened eating disorders among adolescents, particularly girls. Eating disorders, psychiatric illnesses in which an adolescent tries to control their weight with severe food restriction (anorexia) or purging (bulimia), can be fatal. How much do web content and the algorithms that power web search contribute to eating disorders? Imagine a girl who is unhappy about her weight and looks for dieting tips online. It does not take long for search algorithms to lead her to extreme content that promotes anorexia and extreme weight loss. The goal of this project is to audit the web for potentially harmful content and communities that promote eating disorders.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Interdisciplinary Data Science Teamwork&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/aryan-karnati">Aryan Karnati&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sudesh-kumar">Sudesh Kumar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/siyi-chen">Siyi Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shuchan-zhou">Shuchan Zhou&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/kristina-lerman">Kristina Lerman&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Statistics&lt;/li>
&lt;li>R&lt;/li>
&lt;li>Python&lt;/li>
&lt;li>NLP&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://auditing-drivers-for-eating-disorders.netlify.app/index.html" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Automated question type coding of forensic interviews</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-forensic/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-forensic/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Question type coding is used in research on forensic interviewing to distinguish between best practice open-ended questions, and closed-ended and leading questions that interviewers are trained to avoid. Most research teams in the field rely on a time-consuming and labor-intensive method of question type coding whereby a researcher codes every question in the interview, and a second researcher codes a subset to demonstrate inter-rater reliability. We are currently working with a graduate of the Masters in Computer Science program at USC on a project exploring automated question type coding of forensic interviews with victims of child abuse. In collaboration with the student, we have trained a large language model (RoBERTa) to distinguish between question types based on a rudimentary classification system. In the next stage of the project, we are aiming to fine-tune the model and use zero-shot and few-shot prompting to make distinctions for which there is limited manually coded data.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/ashmika-gupte">Ashmika Gupte&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hrishikesh-thakur">Hrishikesh Thakur&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sneha-chawan">Sneha Chawan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/thomas-d-lyon">Thomas D. Lyon&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zsofia-szojka">Zsofia Szojka&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Students will learn to train and fine-tune large language models.&lt;/p></description></item><item><title>Bad Writing is "Fine": Tuning an LLM to Suggest Improvements</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-bad-writing/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-bad-writing/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Prototype an approach to fine-tune a large language model (LLM) to help diagnose areas for improvement in a specific writing product. For example, scientific papers require consistent language, whereas in creative writing variety matters. Proposed steps are:&lt;/p>
&lt;ol>
&lt;li>Writing Product: Coordinate with project mentors to choose a common and important writing product, such as a position paper or an academic conference paper. Identify/gather a rubric and a corpus.&lt;/li>
&lt;li>Inject Bad Writing: For each element of the rubric, develop prompts for generative AI to decrease the quality of writing based on the rubric (i.e., make it worse). This will form a training data set pairing each good example with a version that is worse on certain characteristics.&lt;/li>
&lt;li>Fine Tune: Students will be expected to attempt to fine-tune an LLM (e.g., LLAMA 2) based on this synthetically generated data.&lt;/li>
&lt;li>Evaluate: Investigate whether tuning leads the model to suggest better domain-specific areas to improve.&lt;/li>
&lt;/ol>
&lt;p>This project aligns with ongoing work with the USC Generative AI Center.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/anupam-patil">Anupam Patil&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/anuranjan-pandey">Anuranjan Pandey&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/divyajyoti-panda">Divyajyoti Panda&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jize-luo">Jize Luo&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nilamadha-mohanty">Nilamadha Mohanty&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/rahul-tangsali">Rahul Tangsali&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/benjamin-nye">Benjamin Nye&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Generative AI for large language models. Generating synthetic data for a rubric. Fine-tuning a large language model, likely using CARC (the on-campus computing cluster). Understanding intelligent tutoring system design fundamentals for modeling how experts diagnose issues from novices.&lt;/p></description></item><item><title>Build a multilingual decipherment system</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-decipherment/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-decipherment/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>We will build a working system that can decipher a letter substitution cipher into 14 languages and beyond, based on &lt;a href="https://aclanthology.org/2021.acl-long.561/" target="_blank" rel="noopener">https://aclanthology.org/2021.acl-long.561/&lt;/a>, and then apply it to languages it has never seen.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/aman-kumar">Aman Kumar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/bowen-leng">Bowen Leng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/emerson-jin">Emerson Jin&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zijie-lei">Zijie Lei&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jonathan-may">Jonathan May&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>How to read and understand an NLP paper, unusual applications of transformers, and how to carry out a reproduction study.&lt;/p></description></item><item><title>Building a Platform for NFL Data Insights</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-nfl/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-nfl/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Open source sports data such as the nflverse has led to a massive increase in public sports analytics. But it's still hard to process, subset, visualize and analyze this data. This project will build a general-purpose analysis platform and dashboard, similar to what many teams use internally. Using the nflfastr data, this platform will allow interested individuals to select the play parameters they're interested in, and will provide relevant analysis, visualization and insight. Ideally, we'll set up the dashboard on the internet, and open source the project, allowing others to expand the available datasets, analyses and visualizations.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/abhinav-arun">Abhinav Arun&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/anish-ari">Anish Ari&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/brad-powell">Brad Powell&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/tyler-pomposelli">Tyler Pomposelli&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yitong-qian">Yitong Qian&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>How to analyze and present insights from NFL play-by-play data.&lt;/p></description></item><item><title>Determinants of spatial variations in broadband quality and prices</title><link>https://ckids-datafirst.github.io/website/projects/2023-spring/805/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-spring/805/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>There is anecdotal evidence that broadband service offers vary in quality and price along income and racial lines. This project seeks to validate these claims by collecting data about service speeds and prices for all known serviceable locations in Los Angeles County, and merging it with sociodemographic variables from the Census Bureau and other sources. Other researchers have already developed prototype code for scraping data from ISPs' websites, but the study only covered a small percentage of addresses in LA County. The analysis will probe for evidence of a &amp;quot;poverty penalty&amp;quot; whereby residents of poorer areas in Los Angeles are offered higher cost/lower quality broadband services.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Presentation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/weiqian-zhang">Weiqian Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aurora-massari">Aurora Massari&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/angel-chavez-penate">Angel Chavez-Penate&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zirong-huang">Zirong Huang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/mahika-mushuni">Mahika Mushuni&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/lei-cao">Lei Cao&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/hernan-galperin">Hernan Galperin&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>PyTorch&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://sites.google.com/usc.edu/determinants-of-spatial-var/home" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Does Municipal Broadband Deliver as Promised? An examination of broadband pricing and household adoption in areas served by muni networks.</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-broadband/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-broadband/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Broadband networks owned and/or operated by local governments (&amp;quot;muni networks&amp;quot;) are increasingly seen as a key tool to close the digital divide in Internet availability and adoption. There is, however, only anecdotal evidence about whether muni networks deliver on the promise of more affordable broadband in communities of little interest to traditional ISPs - typically disadvantaged communities. Taking advantage of the greater level of resolution in the new FCC broadband availability maps, this project will examine broadband pricing and adoption at the address level in areas served by muni networks, using a matched sample of comparable areas as a reference point. The goal of the project is to empirically assess whether muni networks are delivering on the promise of more affordable services, and whether this results in more household adoption than expected. The project is a component of an ongoing collaboration with digital equity advocacy organizations.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/menghan-jiao">Menghan Jiao&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ming-shan-lee">Ming Shan Lee&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yinuo-chen">Yinuo Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/hernan-galperin">Hernan Galperin&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Students will have the opportunity to apply data scraping, organization, and analysis skills in the context of policy analysis.&lt;/p></description></item><item><title>Event Forecasting using Efficient and Expressive Temporal Knowledge Graph</title><link>https://ckids-datafirst.github.io/website/projects/2023-spring/806/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-spring/806/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Temporal Knowledge Graph (TKG) models incorporate temporal aspects of facts into their graph neural network (GNN) learning processes to predict temporally conditioned facts. These models capture the temporal dynamics of the facts well and are well-suited for temporally conditioned graph completion tasks. However, there remain many open issues that need to be addressed to make them more practical for real-world applications: (1) Real-world problems usually do not conform to graph completion tasks, which fill in the missing element of a fact; (2) Sparse graphs, caused by a lack of temporal triples for the target task, lead to poor performance; (3) Temporal graphs are inherently dynamic entities that grow and change over time, but most existing models require computationally expensive training from scratch to incorporate these changes. Thus, we propose to design an efficient event forecasting framework that addresses these challenges.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/parth-rohilla">Parth Rohilla&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kiran-narahari">Kiran Narahari&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hangji-he">Hangji He&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shubham-gujar">Shubham Gujar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/varun-venkatesh">Varun Venkatesh&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/kian-ahrabian">Kian Ahrabian&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/dong-ho-lee">Dong-Ho Lee&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Web Crawling&lt;/li>
&lt;/ul></description></item><item><title>Federated Learning for Neuroscience</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-neuroscience/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-neuroscience/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Federated learning is an approach to distributed deep learning without sharing data. Multiple sites train a neural network over private data. The parameters of the neural network are shared with a federation controller, but they are encrypted before sharing. Model aggregation is performed under fully homomorphic encryption. We propose to apply federated learning to several problems in neuroscience, such as predicting Alzheimer's, Parkinson's, epilepsy, and autism, possibly over multimodal data.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/aayushi-goenka">Aayushi Goenka&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/atharva-joshi">Atharva Joshi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/cooper-gamble">Cooper Gamble&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/dhruv-maheshwari">Dhruv Maheshwari&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/harsh-thakkar">Harsh Thakkar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hui-qi">Hui Qi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kartik-pandey">Kartik Pandey&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/prajwal-gupta">Prajwal Gupta&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/pratyush-bhatnagar">Pratyush Bhatnagar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/rajeev-singh">Rajeev Singh&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/regan-wang">Regan Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jose-luis-ambite">Jose-Luis Ambite&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Federated learning, machine learning for biomedical applications.&lt;/p></description></item><item><title>Human Bio-signals as a Function of Indoor Air Quality Control for Human Health in Buildings</title><link>https://ckids-datafirst.github.io/website/projects/2023-spring/807/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-spring/807/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>After the COVID-19 pandemic, remote work has become popular, and many commercial offices have tried to keep both in-person and remote work environments while attempting to reduce the size of their workplaces. Even though hot desking systems are increasingly common and even starting to feel like a trend these days, there isn't much data to show how such systems can support occupants' environmental comfort, work productivity, and psychological stability while they are at work. Therefore, this project focuses on investigating how satisfied the occupants are with their new workplace platform and how much a hot desking system affects their work productivity, environmental satisfaction, etc. The project's findings will help design this new desking system in a way that will increase occupants' satisfaction with their surroundings and productivity at work without compromising their quality of life in the workplace.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yanli-zhang">Yanli Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sowmiya-mathanagopalan">Sowmiya Mathanagopalan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/joon-ho-choi">Joon-Ho Choi&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>R&lt;/li>
&lt;li>Python&lt;/li>
&lt;li>Matlab&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;/ul></description></item><item><title>Identifying Causal Pathways from Online to Offline Systems</title><link>https://ckids-datafirst.github.io/website/projects/2023-spring/804/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-spring/804/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Students will learn to collect and analyze data with the goal of identifying causal pathways between online and offline systems. Video games present a rich set of opportunities for this analysis, and we will begin by studying them. We will start with the effect that video games have on culture, movement, and discussion offline. We will then transition to Reddit activity, where certain communities can spur offline action. Students will not only collect data but also apply state-of-the-art causal detection algorithms alongside PhD students studying the same phenomena.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yufei-wang">Yufei Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kaixin-guo">Kaixin Guo&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/emerson-jin">Emerson Jin&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/seoyoun-kim">Seoyoun Kim&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/haohan-wang">Haohan Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;/ul></description></item><item><title>Learning and forgetting in neural networks</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-nn-forgetting/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-nn-forgetting/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>In this project, you will examine the mechanisms responsible for forgetting previous tasks in artificial neural networks. You will study how those mechanisms shape the behavior of neural networks learning from heterogeneous data distributions. You will investigate how new information is stored in neural networks by plotting and interpreting the neuron activation patterns. You will also compare different learning schemas and examine how they influence the final loss function landscape.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/anjali-singh">Anjali Singh&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/bhargav-krishnamurthy">Bhargav Krishnamurthy&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/karthik-kancharla">Karthik Kancharla&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/mounika-mukkamalla">Mounika Mukkamalla&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/pavan-rakesh-reddy-chirla">Pavan Rakesh Reddy Chirla&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/priyanka-yadav">Priyanka Yadav&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/saurabh-yadgire">Saurabh Yadgire&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yutong-luo">Yutong Luo&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/marcin-abram">Marcin Abram&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>How the information is stored in neural networks. How neural networks can forget how to perform previously mastered tasks. How to interpret neural networks (by examining the neuron activation patterns). How to conduct scientific experiments (in the domain of machine learning). How to present and visualize scientific data.&lt;/p></description></item><item><title>Natural language processing of safety reports in nuclear power plants</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-nuclear-safety/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-nuclear-safety/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project, which will be co-advised by Dr. Yolanda Gil, will use Natural Language Processing (NLP) techniques to analyze voluminous Diablo Canyon Independent Safety Committee (DCISC) annual reports to identify the role and contribution of &amp;quot;Traits of a Healthy Nuclear Safety Culture&amp;quot;, as defined by the Nuclear Regulatory Commission and the Institute of Nuclear Power Operations, in incident causation.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/adrianne-nguyen">Adrianne Nguyen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/rahul-anil-nair">Rahul Anil Nair&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shourya-kothari">Shourya Kothari&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/tejaswi-chaudhari">Tejaswi Chaudhari&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/xiang-li">Xiang Li&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/najmedin-meshkati">Najmedin Meshkati&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Application of NLP to real-world problems, working on serious and important issues with global relevance; the approach can be generalized and applied to other safety-sensitive technologies.&lt;/p></description></item><item><title>Natural language processing of safety reports in nuclear plants and aviation</title><link>https://ckids-datafirst.github.io/website/projects/2023-spring/810/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-spring/810/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will focus on extracting concise structured information about incidents in nuclear plants and in aviation that are currently described in lengthy document reports. By structuring this information and mapping it to safety models and standards, we can help improve their operations.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Data Science Teamwork&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/anood-alkhatheeri">Anood Alkhatheeri&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/rushit-jain">Rushit Jain&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/samarth-saxena">Samarth Saxena&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abishek-phalak">Abishek Phalak&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yassamin-neshatvar">Yassamin Neshatvar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shruthakeerthy-srinivasan">Shruthakeerthy Srinivasan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vaidehi-vatsaraj">Vaidehi Vatsaraj&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/siddarth-rudraraju">Siddarth Rudraraju&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/iris-gordo">Iris Gordo&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shelby-wu">Shelby Wu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yolanda-gil">Yolanda Gil&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/najmedin-meshkati">Najmedin Meshkati&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Data Analysis&lt;/li>
&lt;li>SQL&lt;/li>
&lt;li>Clinical knowledge&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Python&lt;/li>
&lt;li>AWS Sagemaker&lt;/li>
&lt;/ul></description></item><item><title>Networked social influence in large-scale networks</title><link>https://ckids-datafirst.github.io/website/projects/2023-spring/809/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-spring/809/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>We recently published a novel algorithm for measuring influence among people. What's different about it is that it doesn't rely on social media. It takes behavior data, such as might exist within a company's databases, and is able to say &amp;quot;This person is causing this person to do X.&amp;quot; This technique is broad and powerful and has many social and business applications. We're also fortunate to have a lot of data from corporations to explore, including a dataset of over 100 million people on charitable giving, several datasets of video game play, and one on commercial travel. What we need are smart students who can learn the algorithm and help us run it, answering questions of both scholarly and commercial interest. We need horsepower, and we're happy to help train, advise, and add students to the many publications we see coming out of this effort.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/indrani-panchangam">Indrani Panchangam&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/can-jin">Can Jin&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yunning-chen">Yunning Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/datt-patel">Datt Patel&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kaiyi-sun">Kaiyi Sun&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/lei-cao">Lei Cao&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nishi-doshi">Nishi Doshi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yingning-fan">Yingning Fan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/aimei-yang">Aimei Yang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/dmitri-williams">Dmitri Williams&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Statistics&lt;/li>
&lt;li>R&lt;/li>
&lt;li>Python&lt;/li>
&lt;li>NLP&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;/ul></description></item><item><title>Predicting the possibility of escalation of care for specific cohorts admitted to ICU</title><link>https://ckids-datafirst.github.io/website/projects/2023-spring/803/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-spring/803/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The goal is to predict the possibility of escalation of care for specific cohorts admitted to the Keck ICU, where escalation is defined as any of the following:&lt;/p>
&lt;ul>
&lt;li>Increase in pressors&lt;/li>
&lt;li>Intubation&lt;/li>
&lt;li>Trip to operating room within 24 hours&lt;/li>
&lt;li>Starting dialysis&lt;/li>
&lt;/ul>
&lt;p>The objective is to predict, on a real-time or near-real-time basis, whether someone is likely to require escalated care as defined above. Predicting the likelihood of such an escalation is assumed to be an optional, secondary requirement at this stage.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/ho-ko">Ho Ko&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vatsal-khandor">Vatsal Khandor&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/neil-bahroos">Neil Bahroos&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Statistics&lt;/li>
&lt;li>R&lt;/li>
&lt;/ul></description></item><item><title>Pyleoclim: A Python Package for the Analysis of Paleoclimate Data</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-paleoclimate/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-paleoclimate/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Paleoclimate timeseries data are crucial to understanding how climate has changed in the past. A major aspect of this work falls under exploratory analysis, and in particular, visualization. Pyleoclim contains many functionalities for timeseries analysis of paleoclimate data and has already been used in teaching and research settings. In the coming months, we are expanding several functionalities of the package to address growing community needs: outlier detection, automated visualizations, and automated checks for the validity of datasets loaded into the package. In addition, these new functionalities will be integrated into tutorials distributed through a Jupyter Book.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/julien-emile-geay">Julien Emile-Geay&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Timeseries analysis, Python packaging, continuous integration, containerization, GitHub, Jupyter, Binder.&lt;/p></description></item><item><title>Regular Data: Quality health monitoring while you sit</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-health-monitoring/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-health-monitoring/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project aims to create a software-as-a-service data pipeline for collecting health biomarkers via an instrumented toilet seat. This clinical data management system (CDMS) will enable the capture of clinical-grade, high-quality, curated, and consistent data that meets NIH and FDA standards of clinical utility.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/alexander-billups">Alexander Billups&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/dhanashree-anant-patil">Dhanashree Anant Patil&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/meghana-ramachandra-bhat">Meghana Ramachandra Bhat&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/snigdha-chenjeri">Snigdha Chenjeri&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/natalie-fung">Natalie Fung&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/francisco-valero-cuevas">Francisco Valero-Cuevas&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Engineering and software architecture skills to create the data pipeline from instrument signals to health reports compatible with reimbursement, research and clinical data systems.&lt;/p></description></item><item><title>The value of player tracking technology in assessment of training volume in youth soccer players.</title><link>https://ckids-datafirst.github.io/website/projects/2023-spring/801/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-spring/801/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Wearable IMU and GPS sensors are used to estimate training volume in professional sports to ensure adequate preparation for competition. Recommendations to modify training are individualized to meet the health and performance needs of each player. The expense associated with these systems restricts their use in youth sports. More economical alternatives are available, but it is not clear whether their data are comparable. This project will compare data collected concurrently to determine whether the economical system supports similar conclusions. Players from the LA Galaxy (second and developmental teams) will participate. Continuous data from triaxial accelerometers, gyroscopes, and GPS will be collected during practice and competition. Raw data and commercially derived key performance indicators will be compared to understand the agreement between the systems. Acceptable agreement, with comparable interpretation of the data with respect to training recommendations, may allow greater access for young athletes.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/ajay-kc">Ajay Kc&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shardul-nazirkar">Shardul Nazirkar&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/susan-sigward">Susan Sigward&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Econometrics&lt;/li>
&lt;/ul></description></item><item><title>Transition of Care</title><link>https://ckids-datafirst.github.io/website/projects/2023-spring/802/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-spring/802/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project applies analytics to determine which tertiary and quaternary care patients would have the best outcomes at Keck Medical Center. The work involves analyzing data from incoming transfer patients to assess data quality, evaluate clinical records, and predict outcomes.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Website&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yan-zheng">Yan Zheng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yuan-luo">Yuan Luo&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/neil-bahroos">Neil Bahroos&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Web Scraping&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://sites.google.com/usc.edu/keck-medicine-toc/home" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Understanding the Relation Between Noise and Bias in Annotated Datasets</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-noise-bias/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-noise-bias/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>When it comes to classification tasks, much previous work has tried to design larger and more complex neural networks. Recently, the data-centric AI line of research has shifted the focus to the quality of the training data. This shift arises from the recognition that the annotations associated with dataset instances can exhibit both noise, stemming from vague instructions or human errors, and bias, arising from differing perspectives among annotators in response to given prompts.
In this project, our objective is to bridge the gap between the two lines of research: one dedicated to identifying noisy instances and the other striving to account for the diverse perspectives of annotators. Specifically, we will delve into the domain of offensive text detection datasets, a highly subjective task. Our investigation will center on whether perspectivist classification models have effectively harnessed valuable information from instances flagged as noisy by noise-detection techniques.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/abhishek-anand">Abhishek Anand&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/anweasha-saha">Anweasha Saha&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/mohib-mirza">Mohib Mirza&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/prathyusha-naresh-kumar">Prathyusha Naresh Kumar&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/negar-mokhberian">Negar Mokhberian&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>The student will learn the importance of individual instances and individual annotations in training the classification models. Each of these datapoints can introduce either useful signal or noise to the model and the student will learn to recognize the difference.&lt;/p></description></item><item><title>Urban Futures Data Core</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-urban-futures/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-urban-futures/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Cities are the focal point of economic, social, and environmental challenges and opportunities. To establish USC as a thought leader and partner of choice to tackle the challenges of the urban future, the USC Sol Price School of Public Policy and the USC Marshall School of Business propose establishing an Urban Futures Data Core to serve as a university-wide hub for data analysis and dissemination. Students working on this project will work with all faculty at Price and Marshall to catalogue the publicly available, restricted-use, and self-collected datasets that USC researchers have previously used. They will then create a secure website to track each data source and its data use agreements, dates of availability, and geographic level of granularity. After a data website is constructed, students will have the opportunity to assist with creating geographic visualizations of key indices related to urban futures.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/andrew-bae">Andrew Bae&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/austin-zhang">Austin Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/brian-tinsley">Brian Tinsley&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/chaitanya-priya-kakkehalli-jayaram">Chaitanya Priya Kakkehalli Jayaram&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ginny-barnes">Ginny Barnes&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/harman-pelia">Harman Pelia&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/pin-tzu-lee">Pin-Tzu Lee&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/rachita-jain">Rachita Jain&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sneh-shah">Sneh Shah&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/tejas-sheth">TEJAS SHETH&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yi-zheng">Yi Zheng&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/alice-chen">Alice Chen&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>The students will learn about all data sources used in public policy and business, data management, and web design.&lt;/p></description></item><item><title>Utilizing AI Generated Images for Object Detection and Classification</title><link>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-ai-images/</link><pubDate>Sun, 01 Jan 2023 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2023-fall/2023-fall-ai-images/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Developing image-based object detection and classification models requires significant time, resources, and effort. In particular, acquiring a good training dataset is essential. However, in some cases it is very hard to obtain quality data, such as for rare events (e.g., disasters) or for scenes that are expensive to capture (e.g., faraway places). With the development of generative AI, we may be able to produce synthetic images that enhance the quality of a dataset by filling in for missing images. Based on our prior work in object detection and classification for smart city applications, we would like to explore the potential of AI-generated images for enhanced object detection and classification.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/amy-jiang">Amy Jiang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/dongwook-kim">Dongwook Kim&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/harshita-dooja-poojary">Harshita Dooja Poojary&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/michael-kingsley">Michael Kingsley&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nealson-setiawan">Nealson Setiawan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/seon-ho-kim">Seon Ho Kim&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="what-students-will-learn">What students will learn&lt;/h2>
&lt;p>Image machine learning, object detection&lt;/p></description></item><item><title>Automatic Discovery of News Articles: Case of Police Misconduct</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/701/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/701/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The PMR (Police Misconduct Registry) is a database of officers who have been terminated or have resigned in lieu of being fired for misconduct. The objective of PMR is to increase the public trust and legitimacy of law enforcement officers serving the community while also helping departments hire the best possible candidates. The PMR is continually populated with all instances of police misconduct anywhere in the United States.&lt;/p>
&lt;p>Currently, data entries are manually identified, discovered, and registered using public, open-source information, mostly news articles on the web, which critically limits the data collection process. This project therefore aims to automate the discovery of such data with an efficient identification mechanism. Working with the Price School of Public Policy, we will implement an automatic identification mechanism to effectively search for police misconduct articles using web crawling/scraping and natural language processing.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Presentation&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/deborah-ho">Deborah Ho&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jooyoung-yoo">Jooyoung Yoo&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vanshay-gupta">Vanshay Gupta&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ikenna-joe-nweke">Ikenna Joe-Nweke&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/seon-ho-kim">Seon Ho Kim&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;/ul></description></item><item><title>Characterizing Online Attitudes, Expectations, and Concerns about Novel Medical Treatments</title><link>https://ckids-datafirst.github.io/website/projects/2022-spring/605/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-spring/605/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Novel or hypothesized medical treatments, such as COVID-19 vaccines and male contraception, are regularly discussed on social media. For example, the AskReddit subreddit features questions of the form “Would you take [x] if it existed?” Aside from willingness to use these novel treatments, the answers to these questions contain important clues to people’s latent concerns and barriers to adoption of novel medications. Understanding them can provide crucial information about how to introduce, communicate, and counsel about new medications when they come to market. In this project you will use pre-collected Reddit data spanning 10 years to answer questions including: What concerns do individuals have about a novel medication? How do these concerns vary by demographics, such as cultural background? How have these concerns evolved over time? What has caused users to become more or less accepting of the treatment over time?&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Presentation&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;/ul></description></item><item><title>Characterizing the counter-narratives of climate change (Spring - 2022)</title><link>https://ckids-datafirst.github.io/website/projects/2022-spring/603/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-spring/603/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Top climate scientists post their findings and views regularly on social media. These very scientists are met with tweets from those with opposing views, often containing vitriolic and false information. It is important that we can identify and characterize these tweets to understand the counter-narratives of climate change. We will address topics including false information, bot campaigns, and harassment.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Classification&lt;/li>
&lt;li>Data Collection&lt;/li>
&lt;/ul></description></item><item><title>Community Economic Tool</title><link>https://ckids-datafirst.github.io/website/projects/2022-spring/606/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-spring/606/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Determining what makes a region “most attractive” for new business will require exploratory research into which variable(s) are most indicative of potential economic growth opportunities. Additional research will be conducted to identify predictors of our indicator variable, such as the unemployment rate and educational level of tract-level residents and neighboring-tract residents, as well as broadband accessibility. That is to say, “What does this region of Miami do?” What industries are the largest employers in each region, and are they also the ones generating the most revenue? Once each region has been properly identified, one can identify the dominant predictors of economic growth within the region and compare it to other tracts with the same dominant industry structure. This will highlight the role geographic regions play in the economic development and growth of various industries.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/palak-agarwal">Palak Agarwal&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;/ul></description></item><item><title>Decoding How Humans Encode Memories</title><link>https://ckids-datafirst.github.io/website/projects/2022-spring/602/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-spring/602/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Advancements in closed-loop deep brain stimulation (DBS) have enabled more intelligent autonomy for therapeutic intervention across a wide range of neurologic and psychiatric disorders. The predominant approach relies on control-theoretic approximations of the brain’s complex functional relationships with the external environment, in particular a mapping between targeted stimulation and naturalistic responses of different regions of the brain. However, existing approaches fail to capture the environmental context of neuronal biomarkers. Thus, we leverage a set of IoT sensors to capture the human experience and environmental context, i.e., a subset of human sensory channels, in order to estimate the state of the human brain and provide the foundation for smarter, context-dependent DBS. We explore neural-symbolic approaches that integrate the powerful perception capabilities of deep learning with human logic to reason about the complex dependencies across a heterogeneous set of sensors.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/luis-garcia">Luis Garcia&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Deep Learning&lt;/li>
&lt;/ul></description></item><item><title>Decoding How Humans Encode Memories (Fall - 2022)</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/711/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/711/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>In this project, we will work on semantically aligning IoT sensor data and neural data with the human experience. We have developed a sensor platform to record the human experience as patients perform navigational tasks. The goal is to understand what context shifts in the human experience anchor our episodic memories. Each student will have the opportunity to work with different sensing modalities, and develop models for both sensory perception and reasoning.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/helin-yilmaz">Helin Yilmaz&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ao-xu">Ao Xu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ketki-kinkar">Ketki Kinkar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/saurabh-koshatwar">Saurabh Koshatwar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/christian-bryan">Christian Bryan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ashley-sue">Ashley Sue&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/luis-garcia">Luis Garcia&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>PyTorch&lt;/li>
&lt;li>TensorFlow&lt;/li>
&lt;/ul></description></item><item><title>Hot desking system guarantee your productivity? : Investigation of a first-come-first-served workplace system focusing on the occupants’ work productivity and wellness</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/708/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/708/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Since the COVID-19 pandemic, remote work has become popular, and many commercial offices have tried to keep both in-person and remote work environments while attempting to reduce the size of their workplaces. Even though hot desking systems are increasingly common, there is little data showing how such systems can support occupants' environmental comfort, work productivity, and psychological stability while they are at work. This project adopts a commercial office in downtown Los Angeles as a testbed and conducts questionnaire surveys and indoor environmental quality measurements. The project's findings will help design this new desking system in a way that increases occupants' satisfaction with their surroundings and their productivity at work without compromising their quality of life in the workplace.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/dhruv-goel">Dhruv Goel&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/indrani-panchangam">Indrani Panchangam&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/mingxuan-ma">Mingxuan Ma&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/siqi-qiao">Siqi Qiao&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/crystal-sheng">Crystal Sheng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/winnie-hou">Winnie Hou&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/joon-ho-choi">Joon-Ho Choi&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Machine Learning&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>R&lt;/li>
&lt;li>WEKA&lt;/li>
&lt;/ul></description></item><item><title>Identification and characterization of cross-platform misinformation diffusion</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/709/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/709/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Fringe communities are often the sources of conspiracy theories and extreme ideas. Niche online platforms hosting such communities are suitable incubators for reinforcing questionable stories and ultimately pushing them into the mainstream. Over the years, information pathways from fringe to mainstream media have significantly increased, enabling the proliferation of harmful content. This project aims to develop novel network- and AI-based models for identifying and characterizing information pathways that enable the proliferation of potentially harmful content on online media channels. Mainstream social media (Twitter, Facebook, and Instagram), video streaming platforms (YouTube and Bitchute), niche platforms (Gab, 4chan, and Parler), and messaging apps (Telegram) will be considered to investigate how harmful narratives flow across diverse platforms and to predict those that will gain traction on mainstream media.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Cyberphysical Data Science&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Best Data Science Open and Sharing Practices&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/william-lu">William Lu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zexun-yao">Zexun Yao&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/duyen-nguyen">Duyen Nguyen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jinyi-ye">Jinyi Ye&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aaron-tom">Aaron Tom&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/luca-luceri">Luca Luceri&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Data Visualization&lt;/li>
&lt;li>Social Network Analysis&lt;/li>
&lt;/ul></description></item><item><title>Identification of Sustainability-Related Research at USC through Machine Learning and Keyword Mapping</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/705/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/705/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Are you passionate about data science and sustainability? Then this interdisciplinary project is for you! Here, we will develop a machine learning program to classify USC research publications and grants as ‘sustainability-focused’, ‘sustainability-inclusive’ or ‘not-sustainability-related’ by using pre-categorized publication samples. In addition, we will use keyword lists that relate to the 17 UN Sustainable Development Goals (SDGs) to map all research groups at USC as they relate to these SDGs (&lt;a href="https://sdgs.un.org/goals" target="_blank" rel="noopener">https://sdgs.un.org/goals&lt;/a>). Lastly, we will create an interactive dashboard in R Shiny that will act as a public directory of all research at USC, with classification of the research by the SDGs and broader sustainability categorization. As an example, check our GitHub repository for the USC curriculum: &lt;a href="https://github.com/USC-Office-of-Sustainability/USC-SDGmap" target="_blank" rel="noopener">https://github.com/USC-Office-of-Sustainability/USC-SDGmap&lt;/a>. Your work on this project is critical in boosting sustainability-related research at USC and thereby achieving our Asgmt: Earth Research Goals.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Data Science Collaboration Practices&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Best Data Science Teamwork&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/ric-xian">Ric Xian&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aurora-massari">Aurora Massari&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/xinyi-zhang">Xinyi Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/bhavyakumar-ramani">Bhavyakumar Ramani&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/alison-chen">Alison Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/julie-v-hopper">Julie V. Hopper&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>R&lt;/li>
&lt;li>Python&lt;/li>
&lt;li>Web Scraping&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://usc-office-of-sustainability.github.io/SustainabilityResearchFinder/" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Identifying Catalysts for Online Depolarization</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/712/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/712/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>None&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/shengnan-ke">Shengnan Ke&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/xiao-zhou">Xiao Zhou&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/diana-pham">Diana Pham&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yitong-qian">Yitong Qian&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/kristina-lerman">Kristina Lerman&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Intelligent Analytics and Integration of Internet Memes</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/703/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/703/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Internet memes are a popular tool for creatively expressing ideas on the Web. Machine understanding of memes would benefit researchers interested in trends, virality of topics, and hate speech. However, understanding memes is difficult, as it requires combining text, vision, and extensive background knowledge. Recently, we created an Internet Meme Knowledge Graph that contains rich information about thousands of entities. In this project, we will perform extensive profiling of the Internet Meme Knowledge Graph, enrich it with other sources, and store its knowledge in a centralized resource like Wikidata.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/mansi-kulhari">Mansi Kulhari&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zongmian-huang">Zongmian Huang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ryan-gleason">Ryan Gleason&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/dhandeep-suglani">Dhandeep Suglani&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yunyi-fiona-zhang">Yunyi (Fiona) Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/filip-ilievski">Filip Ilievski&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/riccardo-tommasini">Riccardo Tommasini&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Knowledge Graphs&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Wikidata&lt;/li>
&lt;li>APIs&lt;/li>
&lt;/ul></description></item><item><title>Knowledge-powered understanding of diet’s water footprint</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/702/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/702/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Food production and its supply chains are the main cause of water pollution, especially in emerging and developed countries. Understanding the connection between our diet and water pollution is extremely challenging because it requires combining knowledge about food production, supply chains, and transportation. As more and more people seek to live sustainably, there is a need to inform consumers about the planetary impacts of their choices. We propose to construct a knowledge graph and an application that together provide a water footprint calculator for our dietary choices. The calculator computes the water footprint of each ingredient of a meal proposed by the user and suggests alternatives to reduce that footprint (e.g., replacing beef with pork in a cheesesteak meal will save the planet 1,600 gallons of water).&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Data Science Insight&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/shreya-padmanabhan">Shreya Padmanabhan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shreya-raj">Shreya Raj&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/maiah-pardo">Maiah Pardo&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/saurav-joshi">Saurav Joshi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yuhua-wu">Yuhua Wu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/filip-ilievski">Filip Ilievski&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jay-pujara">Jay Pujara&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Knowledge Graphs&lt;/li>
&lt;li>Python&lt;/li>
&lt;li>Information Extraction&lt;/li>
&lt;li>Software Engineering&lt;/li>
&lt;li>Databases&lt;/li>
&lt;/ul></description></item><item><title>Listen to your body: Human Bio-signals as a Function of Indoor Air Quality Control for Human Health in Buildings</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/707/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/707/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The COVID-19 pandemic has affected the lives of millions of people worldwide. Air quality needs to be improved immediately because COVID-19 can spread through airborne aerosols, raising concerns about how the virus will behave in enclosed spaces. To understand how to estimate the time lag of air quality transmission between indoor and outdoor environments, it is necessary to gather sufficient and reasonably accurate data before applying machine learning models and statistical tools to analyze it. This will also allow us to investigate the connection between indoor air quality and human physiological responses. Three types of data must be collected: participants' bio-signals, survey results, and the state of the indoor and outdoor environments. The bio-signal data comprise the occupants' heart rate, skin temperature, stress level, and electrodermal activity (EDA).&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yifan-wang">Yifan Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vu-truong-si">Vu Truong Si&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/anood-alkatheeri">Anood Alkatheeri&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/pallavi-vijayan">Pallavi Vijayan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jae-park">Jae Park&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/conner-kojima">Conner Kojima&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yihao-zheng">Yihao Zheng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nachiket-dunbray">Nachiket Dunbray&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/joon-ho-choi">Joon-Ho Choi&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>R&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;/ul></description></item><item><title>Machine Learning Enabled Fault Detection and Diagnosis of Quantum Circuits</title><link>https://ckids-datafirst.github.io/website/projects/2022-spring/601/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-spring/601/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This is an interdisciplinary data science project that draws on expertise from quantum information theory and machine learning. In this project, we plan to develop and implement a novel approach to substantially improving the performance of quantum computers using advancements in machine-learning-enabled fault detection and diagnosis. We will adapt and further develop existing machine learning protocols to efficiently and reliably detect and diagnose faulty quantum circuits. The protocols are expected to reach beyond the current state of the art in the error diagnosis of quantum circuits, and to provide detailed and transparent information about various sources of errors in the quantum circuits with significantly fewer queries to the quantum circuit and considerably fewer repeated experiments. This project will allow students to acquire expertise in topics that span quantum information theory, quantum computing, and machine learning.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/amir-kalev">Amir Kalev&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;/ul></description></item><item><title>OSINT Social Networks on GitHub</title><link>https://ckids-datafirst.github.io/website/projects/2022-spring/607/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-spring/607/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Open-source intelligence (“OSINT”) is a rapidly growing area of cybersecurity. This project seeks to explore OSINT information available on GitHub. We’ll use the GitHub API and related tools to build networks to try to answer a number of interesting questions, such as “can you tell what software a company uses based on its employees’ networks?”, “do white hat hackers have social networks that look different from those of black hat hackers?” and others. If you’re interested in cybersecurity, OSINT, social networks, databases, APIs, etc., then this is the project for you!&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Databases&lt;/li>
&lt;li>GraphQL&lt;/li>
&lt;li>Neo4j&lt;/li>
&lt;/ul></description></item><item><title>Quantum Natural Language Processing for Fake News Identification</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/706/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/706/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Advancements in artificial intelligence, especially neural networks, have enabled more intelligent models that can distinguish between fake and real information. However, these models suffer from over-fitting: a phenomenon where models memorize certain patterns in the dataset instead of understanding the actual underlying task. This prevents the models from generalizing well, especially across domains. Quantum Natural Language Processing (QNLP) is a nascent field in which quantum computers are used to solve NLP problems. QNLP models have been shown to solve many such tasks that are difficult for neural networks. This is attributed to the fact that QNLP models naturally incorporate rich linguistic meaning and structure. In this project, we will create neural-network-like models for QNLP. This will be done on fact verification datasets, with the goal of improving the quality of fake news identification.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Interdisciplinary Data Science Team&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/sofian-ghazali">Sofian Ghazali&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/pasha-biglarzadeh">Pasha Biglarzadeh&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vignesh-selvaraj">Vignesh Selvaraj&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yang-cheng">Yang Cheng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/chenghong-hu">Chenghong Hu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/anish-kushalapa">Anish Kushalapa&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jae-park">Jae Park&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/mitch-paul-mithun">Mitch Paul Mithun&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Deep Learning&lt;/li>
&lt;li>NLP&lt;/li>
&lt;/ul></description></item><item><title>Scientific Concept Discovery: Using Machine Learning to Advance Scientific Research</title><link>https://ckids-datafirst.github.io/website/projects/2022-spring/604/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-spring/604/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Our group focuses on the question of how to design a learning framework that promotes the generalizability of machine learning models. In this project, you will explore how neural networks acquire information from training examples and how they learn to solve various physical problems (e.g., emulation of simple quantum systems). The premise of this project is that by observing how a machine learning model learns to solve a specific task, we can learn about the underlying problem itself. As an example, by analyzing the weights of a trained neural network, you can discover non-trivial symmetries of the modeled physical system, determine the relative importance of features, or identify some non-trivial interplay between underlying physical mechanisms. Your task will be to learn various tools for interpreting deep neural networks. You will test them in practice and explore methods that promote model transparency and interpretability.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Project Achievement&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/marcin-abram">Marcin Abram&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>PyTorch&lt;/li>
&lt;li>TensorFlow&lt;/li>
&lt;li>Bash&lt;/li>
&lt;li>Quantum Mechanics&lt;/li>
&lt;/ul></description></item><item><title>Social media habits of misinformation spreaders</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/710/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/710/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Social media habits represent one of the most common – and controversial – forms of habitual behavior in contemporary society. This project will investigate whether and how social media habits are linked to the spread of misinformation. Specifically, this research aims at understanding whether there are habits that can be identified within social media data that are unique to misinformation spreaders. For example: do these users re-post, reply to, or post content in ways that seem habitual — as opposed to behaviors based on the rewards received from other users (e.g., likes, re-posting)? To perform this analysis, students will closely examine social media data across multiple platforms. The goal of this project will be to develop a model which can infer habit-based vs. non-habitual processes from existing user data, and identify how these processes play a role in the spread of misinformation. This project will be in collaboration with the USC Department of Psychology.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Interdisciplinary Data Science Team&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Best Website&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/zhenmin-hua">Zhenmin Hua&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/siqi-liu">Siqi Liu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/xiayu-li">Xiayu Li&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/daniel-pereira-da-costa">Daniel Pereira Da Costa&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/richa-sheth">Richa Sheth&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/ian-anderson">Ian Anderson&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/luca-luceri">Luca Luceri&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Statistics&lt;/li>
&lt;li>Data Analysis&lt;/li>
&lt;li>Data Visualization&lt;/li>
&lt;/ul></description></item><item><title>Studying Scientific Innovation with Temporal Knowledge Graph Representation Learning</title><link>https://ckids-datafirst.github.io/website/projects/2022-spring/608/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-spring/608/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>What’s the next big idea, and who’s going to discover it? Our project is trying to understand how researchers make new discoveries and develop new ideas. To do that, we will apply deep learning techniques for temporal knowledge graph learning (RE-NET, CyGNet, HINGE, StarE) to a huge citation network dataset. We have assembled a KG with 260M research papers, 270M authors, and 700K fields. To learn representations, our training tasks include citation prediction, author collaboration prediction, and field of study prediction.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/kian-ahrabian">Kian Ahrabian&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/dong-ho-lee">Dong-Ho Lee&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jay-pujara">Jay Pujara&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Knowledge Graphs&lt;/li>
&lt;li>PyTorch&lt;/li>
&lt;li>TensorFlow&lt;/li>
&lt;li>Wikidata&lt;/li>
&lt;/ul></description></item><item><title>Turning READMEs into Chatbots</title><link>https://ckids-datafirst.github.io/website/projects/2022-fall/704/</link><pubDate>Sat, 01 Jan 2022 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2022-fall/704/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Ever get frustrated reading a README? Wish you could just ask someone for help instead of reading pages of documentation, combing through StackOverflow posts, and consulting lecture slides? This ambitious project will build a team of students to convert documentation in README files, ReadTheDocs and other manual pages, and StackOverflow posts into short conversations. These conversations will be used to train a dialogue model like DialoGPT to help create an assistive chatbot that can answer questions about code.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Data Science Teamwork&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/shuijing-mejia">Shuijing Mejia&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ronak-shah">Ronak Shah&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/balaji-chidambaram">Balaji Chidambaram&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aaron-cheng">Aaron Cheng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aamir-miyajiwala">Aamir Miyajiwala&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hieu-nguyen">Hieu Nguyen&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jay-pujara">Jay Pujara&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>PyTorch&lt;/li>
&lt;li>Hugging Face&lt;/li>
&lt;/ul></description></item><item><title>Automatically segmenting and describing the human corpus callosum from brain MRIs</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/106/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/106/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The human corpus callosum is the largest pathway connecting the left and right hemispheres of the brain. The shape of the corpus callosum (CC) changes throughout the course of human development, and it can also be altered with respect to disease onset. We can explore the variation in CC shape along the middle of the brain, but we need to extract it reliably first. The lab currently has two methods for extracting the CC, one using only image processing techniques and another using deep learning (UNet), but these methods do not always extract the CC accurately. Accuracy often depends on the MRI scanner that was used or the abnormalities present in the scan. Can we improve the performance of our deep learning model with additional training data? Can we change some processing steps to improve the model? Once we have an accurate segmentation, which shape metrics of the CC, as a whole or in parts, are most telling of the underlying biology, such as age and risk for disease?&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Interdisciplinary Data Science Project&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Best Project Achievement&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Best Project Website&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/kathy-wang">Kathy Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shayan-javid">Shayan Javid&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abhinaav-ramesh">Abhinaav Ramesh&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vineet-agarwal">Vineet Agarwal&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jiahui-lu">Jiahui Lu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/neda-jahanshad">Neda Jahanshad&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Deep Learning&lt;/li>
&lt;li>Bash&lt;/li>
&lt;li>R&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://corpcalusc.github.io/" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Comparing Clinical Trials to Improve Cancer Treatments</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/108/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/108/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The goal of this project is to help clinicians find the best course of treatment for a cancer patient based on the latest and most appropriate clinical trials. Because new drugs are appearing increasingly fast, it is hard to keep track of the outcomes of all clinical trials and determine the best treatment. In collaboration with biomedical researchers, we have been developing algorithms to extract information about clinical trials from government websites, to structure the information, and to find the clinical trials that are most relevant for a given patient. We want to improve the algorithms that structure this information and to develop similarity metrics that will help us retrieve and rank clinical trials.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Interdisciplinary Data Science Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/lili-zhou">Lili Zhou&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nikita-goel">Nikita Goel&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/chanda">Chanda&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sanjeev-kadagathur-vadiraj">Sanjeev Kadagathur Vadiraj&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/audrey-lin">Audrey Lin&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/wenjia-dou">Wenjia Dou&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/yolanda-gil">Yolanda Gil&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;/ul></description></item><item><title>COVID-19 misinformation</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/107/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/107/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This new project attempts to understand the interaction between anti-vaxxers (anti-vaccination groups) and alt-right groups on platforms such as Facebook. The goal is to understand how these two types of fringe groups have interacted over the years, and how their interactions and discourse evolved during the COVID-19 pandemic. It would be interesting to explore the longitudinal patterns of network/discourse co-evolution and how such patterns may change in times of dramatic events. In terms of data, I have access to Facebook’s historical data archive and I have collected a dataset that contains Facebook posts from a list of anti-vaxxer (n=158) and alt-right (n=183) groups over 10 years (2010-2021). The dataset can be further expanded with additional help.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yilin-qi">Yilin Qi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/revanth-madamala">Revanth Madamala&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/luer-lyu">Luer Lyu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/chidambaram-veerappan">Chidambaram Veerappan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/aimei-yang">Aimei Yang&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Social Network Analysis&lt;/li>
&lt;li>NLP&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://drive.google.com/file/d/1by9TUZ-oXRFRnou8ahqtYkTu3ygpBk7U/view?usp=sharing" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Decoding How Humans Encode Memories</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/101/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/101/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Advancements in closed-loop deep brain stimulation (DBS) have enabled more intelligent autonomy for therapeutic intervention across a wide range of neurologic and psychiatric disorders. The predominant approach relies on control-theoretic approximations of the brain’s complex functional relationships with the external environment, in particular a mapping between targeted stimulation and naturalistic responses of different regions of the brain. However, existing approaches fail to capture the environmental context of neuronal biomarkers. Thus, we leverage a set of IoT sensors to capture the human experience and environmental context, i.e., a subset of human sensory channels, in order to estimate the state of the human brain and provide the foundation for smarter, context-dependent DBS. We explore neural-symbolic approaches that integrate the powerful perception capabilities of deep learning with human logic to reason about the complex dependencies across a heterogeneous set of sensors.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Cyberphysical Data Science Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yishan-li">Yishan Li&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/navyada-koshatwar">Navyada Koshatwar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/gayathri-shrikanth">Gayathri Shrikanth&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/rushi-shah">Rushi Shah&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/pranjali-tushar-tembhurnikar">Pranjali Tushar Tembhurnikar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/manuel-amaya">Manuel Amaya&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/luis-garcia">Luis Garcia&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Deep Learning&lt;/li>
&lt;/ul></description></item><item><title>Detecting Biases in College Football Recruiting</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/105/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/105/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>College football recruiting is big business. This project aims to build and analyze a comprehensive college football recruiting dataset, to help determine if there are biases in whom college football coaches recruit and how they recruit them. This dataset will combine college football recruiting data from the web with census and other socioeconomic data, to search for patterns in where and how college football coaches recruit players.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/hassaan-hasan">Hassaan Hasan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aditya-dave">Aditya Dave&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yin-he">Yin He&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kartik-balodi">Kartik Balodi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/akshat-jetli">Akshat Jetli&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/manav-jain">Manav Jain&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Web Scraping&lt;/li>
&lt;li>SQL&lt;/li>
&lt;li>NoSQL&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://drive.google.com/file/d/10oMWG2cyIlynJpOiyYkuaDTv_pCCleJI/view?usp=sharing" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Detecting Biases in College Football Recruiting (Spring - 2021)</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/12/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/12/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>College football recruiting is big business. This project aims to build and analyze a comprehensive college football recruiting dataset, to help determine if there are biases in whom college football coaches recruit and how they recruit them. This dataset will combine college football recruiting data from the web with census and other socioeconomic data, to search for patterns in where and how college football coaches recruit players.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/manasi-godse">Manasi Godse&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jackie-fan">Jackie Fan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yash-gupta">Yash Gupta&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/rehan-ahmed">Rehan Ahmed&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jiahang-song">Jiahang Song&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>APIs&lt;/li>
&lt;li>Databases&lt;/li>
&lt;li>Web Scraping&lt;/li>
&lt;li>SQL&lt;/li>
&lt;li>NoSQL&lt;/li>
&lt;/ul></description></item><item><title>Discovering and Measuring Biases in Commonsense Knowledge Bases</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/103/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/103/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Commonsense knowledge bases are used widely in research, spanning many areas in artificial intelligence, including natural language understanding, computer vision, and planning. However, these resources may contain human biases, which will ultimately be embedded in the resulting AI solution and potentially have negative societal impacts. The extent to which these biases exist is unclear. In this project, you will define several well-motivated biases (location, gender, ethnicity) and measure the extent to which they are represented in ConceptNet.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Data Science Insight&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/linglan-zhang">Linglan Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yu-zhang">Yu Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sara-melotte">Sara Melotte&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aditya-uday-malte">Aditya Uday Malte&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/namita-santosh-mutha">Namita Santosh Mutha&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/filip-ilievski">Filip Ilievski&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Clustering&lt;/li>
&lt;li>Language Models&lt;/li>
&lt;li>Data Analysis&lt;/li>
&lt;/ul></description></item><item><title>Drought prediction in Southern California using deep learning</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/5/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/5/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Seasonal drought predictions are important for the management of water resources for agriculture, urban consumption… Seasonal forecasts have traditionally been done using a physics-based model. In this project, we will use a deep learning approach for drought forecasting in CA.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/shubhashree-dash">Shubhashree Dash&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/katie-chak">Katie Chak&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>PyTorch&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Deep Learning&lt;/li>
&lt;/ul></description></item><item><title>Impacts of Smart Windows on Human’s Bio-Signals</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/112/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/112/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This research investigates the relationship between humans’ bio-signals and electrochromic windows, which could inform a mechanism for using bio-signals to control the windows. Using wearable and remote sensors, subjects’ bio-signals such as heart rate, skin temperature, and pupil size, as well as indoor environmental quality measures such as temperature and humidity, can be monitored. Finally, machine learning and data analysis will be used to analyze the impact of the windows on humans’ bio-signals.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/wenjia-dou">Wenjia Dou&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jingping-yu">Jingping Yu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/zihan-wang">Zihan Wang&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Machine Learning&lt;/li>
&lt;li>Data Analysis&lt;/li>
&lt;/ul></description></item><item><title>Investigate the healthy indoor air quality under Covid-19 in Los Angeles based on machine learning</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/111/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/111/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>So far we do not know how much ventilation is needed to effectively prevent infection with COVID-19, and no sensor can measure the coronavirus. However, PM2.5 and CO2 are good indicators for estimating the concentration of the COVID-19 virus. Moreover, bio-signals can be used to assess people’s state of health. In my experiment, I will recruit participants and collect indoor environmental data, outdoor environmental data, and human bio-signals. The data will then be analyzed with machine learning to find the correlation between indoor and outdoor environmental factors and the correlation between indoor environmental factors and human factors. I can also find the appropriate range of each indoor air quality factor when people are in a healthy state. Finally, I can control the window to keep indoor CO2 and PM2.5 within that range to keep people in a healthy state.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yixiao-li">Yixiao Li&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zhaohong-feng">Zhaohong Feng&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/minghuan-gong">Minghuan Gong&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;/ul></description></item><item><title>Investigating disparities in the COVID-19 epidemic in Los Angeles County through fine-grained epidemic modeling</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/114/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/114/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Fine-grained epidemiological modeling of the spread of COVID-19 can inform public health policy that accounts for disparities in the risk of exposure, infection, and death across different locations and different demographic groups. In Los Angeles County, disparities in COVID-19 infection rates by neighborhood have been tremendous. Throughout the current large outbreak wave, infection incidence rates in low-income, predominantly Hispanic neighborhoods of East LA have consistently been 10-15 times higher than in wealthier, predominantly white neighborhoods in West LA. Many well-informed hypotheses exist to explain the cause of these disparities in infection, including employment sectors that require leaving homes to work, household density, and behavioral differences across cultures and age groups. But for Los Angeles County, these hypotheses have not been evaluated quantitatively in the context of an epidemic modeling framework.&lt;/p>
&lt;p>To explain the disproportionate impact of the virus on disadvantaged demographic groups in Los Angeles County, we are developing a networked multiple-population epidemic model to investigate how epidemic dynamics and infection outcomes differ across fine-grained neighborhoods. Specifically, we will extend an already-developed stochastic SEIR+ disease model that includes healthcare, death, and vaccination compartments into the networked multiple-population framework, which will model movements, contacts, and infection pathways within and between neighborhoods. A key feature of this modeling framework will be the use of dynamic mobility data, derived from US cell phone data, to inform changes in the daily movements of people within and between neighborhoods. This data will provide the basis of a weighted infection-transmissible contact network between neighborhoods. The SEIR disease model is run on top of this contact network, determining infection dynamics across the neighborhoods. The model will allow obtaining estimates of key epidemic quantities including transmission rates (and the time-varying reproductive number, R(t)) and infection fatality rates for each neighborhood, and identifying the neighborhoods driving epidemic spread (through contacts within and across neighborhoods). Furthermore, hierarchical modeling techniques will be used to obtain estimates of infection and fatality rates for substrata representing combinations of ethnicity/race, age, and sex within each neighborhood.&lt;/p>
&lt;p>CKIDS PROJECT TASKS&lt;/p>
&lt;p>While the overarching goal of this project is to develop a multiple-population epidemic model for Los Angeles County (LAC) across a network of connected neighborhoods, it is also necessary to maintain a single-population model for LAC as a whole that estimates the epidemic parameters for this larger spatial level. Such a single-population model has been maintained since May 2020 by the USC Biostatistics COVID modeling team. This model serves two important purposes. First, since May 2020 it has supported the LAC Department of Public Health, which has requested updates on key epidemic predictions on a weekly basis. Second, the parameters estimated from the single-population model will serve as prior distributions in the Bayesian parameter estimation framework used in the networked-neighborhood model.&lt;/p>
&lt;p>The first task for the CKIDS student will be to re-implement the parameter estimation framework for the existing LAC-level model, such that parameters are estimated each week and fixed for future estimates forward in time. This can be done either through modification to the existing code and parameter estimation framework, written in R and using Approximate Bayesian Computation (ABC), or through a full reimplementation of the modeling code. The second task will be to maintain the model estimation and the website that displays it, through weekly updates using data that comes directly from the LAC Department of Public Health. A third possible task, depending on the interest of the CKIDS student, will be to apply the model to data for California as a whole and for other counties in California (so far it has only been applied to LAC data).&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/tao-huang">Tao Huang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jianing-julia-chen">Jianing (Julia) Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/abigail-horn">Abigail Horn&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Computational Simulation&lt;/li>
&lt;li>R&lt;/li>
&lt;/ul></description></item><item><title>Investigating disparities in the COVID-19 epidemic in Los Angeles County through fine-grained epidemic modeling (Spring - 2021)</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/2/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/2/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Fine-grained epidemiological modeling of the spread of COVID-19 can inform public health policy that accounts for disparities in the risk of exposure, infection, and death across different locations and different demographic groups. In Los Angeles County, disparities in COVID-19 infection rates by neighborhood have been tremendous. Throughout the current large outbreak wave, infection incidence rates in low-income, predominantly Hispanic neighborhoods of East LA have consistently been 10-15 times higher than in wealthier, predominantly white neighborhoods in West LA. Many well-informed hypotheses exist to explain the cause of these disparities in infection, including employment sectors that require leaving homes to work, household density, and behavioral differences across cultures and age groups. But for Los Angeles County, these hypotheses have not been evaluated quantitatively in the context of an epidemic modeling framework.&lt;/p>
&lt;p>To explain the disproportionate impact of the virus on disadvantaged demographic groups in Los Angeles County, we are developing a networked multiple-population epidemic model to investigate how epidemic dynamics and infection outcomes differ across fine-grained neighborhoods. Specifically, we will extend an already-developed stochastic SEIR+ disease model that includes healthcare, death, and vaccination compartments into the networked multiple-population framework, which will model movements, contacts, and infection pathways within and between neighborhoods. A key feature of this modeling framework will be the use of dynamic mobility data, derived from US cell phone data, to inform changes in the daily movements of people within and between neighborhoods. This data will provide the basis of a weighted infection-transmissible contact network between neighborhoods. The SEIR disease model is run on top of this contact network, determining infection dynamics across the neighborhoods. The model will allow obtaining estimates of key epidemic quantities including transmission rates (and the time-varying reproductive number, R(t)) and infection fatality rates for each neighborhood, and identifying the neighborhoods driving epidemic spread (through contacts within and across neighborhoods). Furthermore, hierarchical modeling techniques will be used to obtain estimates of infection and fatality rates for substrata representing combinations of ethnicity/race, age, and sex within each neighborhood.&lt;/p>
&lt;p>CKIDS PROJECT TASKS&lt;/p>
&lt;p>While the overarching goal of this project is to develop a multiple-population epidemic model for Los Angeles County (LAC) across a network of connected neighborhoods, it is also necessary to maintain a single-population model for LAC as a whole that estimates the epidemic parameters for this larger spatial level. Such a single-population model has been maintained since May 2020 by the USC Biostatistics COVID modeling team. This model serves two important purposes. First, since May 2020 it has supported the LAC Department of Public Health, which has requested updates on key epidemic predictions on a weekly basis. Second, the parameters estimated from the single-population model will serve as prior distributions in the Bayesian parameter estimation framework used in the networked-neighborhood model.&lt;/p>
&lt;p>The first task for the CKIDS student will be to re-implement the parameter estimation framework for the existing LAC-level model, such that parameters are estimated each week and fixed for future estimates forward in time. This can be done either through modification to the existing code and parameter estimation framework, written in R and using Approximate Bayesian Computation (ABC), or through a full reimplementation of the modeling code. The second task will be to maintain the model estimation and the website that displays it, through weekly updates using data that comes directly from the LAC Department of Public Health. A third possible task, depending on the interest of the CKIDS student, will be to apply the model to data for California as a whole and for other counties in California (so far it has only been applied to LAC data).&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/tao-huang">Tao Huang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jianing-julia-chen">Jianing (Julia) Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/abigail-horn">Abigail Horn&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>R&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Computational Simulation&lt;/li>
&lt;/ul></description></item><item><title>Looking at White Hat (?) Hacker Social Networks on Github</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/104/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/104/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Open-source intelligence (“OSINT”) is a rapidly growing area of cybersecurity. This project seeks to explore OSINT information available on GitHub. Specifically, we will build and analyze a dataset comprised of users on GitHub who show a specific interest in GitHub repos related to hacking artifacts. This dataset and social network analysis could help us determine what attributes lead to “black hat” — or malicious — cyber actors.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/jonathan-lal">Jonathan Lal&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aditya-ramani">Aditya Ramani&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sanket-bhilare">Sanket Bhilare&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/keerthana-prakash">Keerthana Prakash&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/himani-amrute">Himani Amrute&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Databases&lt;/li>
&lt;li>APIs&lt;/li>
&lt;li>GraphQL&lt;/li>
&lt;li>OSINT&lt;/li>
&lt;li>Cybersecurity&lt;/li>
&lt;/ul></description></item><item><title>Looking at White Hat (?) Hacker Social Networks on Github (Spring - 2021)</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/9/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/9/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Open-source intelligence (“OSINT”) is a rapidly growing area of cybersecurity. This project seeks to explore OSINT information available on GitHub. Specifically, we will build and analyze a dataset comprised of users on GitHub who show a specific interest in GitHub repos related to hacking artifacts. This dataset and social network analysis could help us determine what attributes lead to “black hat” — or malicious — cyber actors.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Project Achievement&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/dat-nguyen">Dat Nguyen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/wenwen-zheng">Wenwen Zheng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nai-cih-liou">Nai Cih Liou&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/xinran-liang">Xinran Liang&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Databases&lt;/li>
&lt;li>APIs&lt;/li>
&lt;li>GraphQL&lt;/li>
&lt;/ul></description></item><item><title>Machine Learning to Analyze Rock Microstructures</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/11/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/11/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Students will use machine learning techniques to analyze optical microscope images that reveal features of materials and microstructures. These images have been collected by geologists, who use them to study the rock samples that they collect in the field and determine their properties and origins. We have a baseline system already implemented, and the goal is to improve it with new machine learning techniques, guided by the insights of our collaborating geologists.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Data Science Teamwork&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/xiaoyu-wang">Xiaoyu Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/stephen-iota">Stephen Iota&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/bolong-pan">Bolong Pan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/junyi-liu">Junyi Liu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ming-lyu">Ming Lyu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yolanda-gil">Yolanda Gil&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/wael-abd-almageed">Wael Abd-Almageed&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Computer Vision&lt;/li>
&lt;/ul></description></item><item><title>Mapping the Ethical Concerns Surrounding AI Research</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/3/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/3/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>With the recent enthusiasm about algorithmic fairness and responsible AI, many conferences are encouraging or requiring a broader impact section to assess societal harms and benefits of the AI research being presented. In this project, we will analyze the themes of these sections, with a particular focus on the ethical issues being addressed and acknowledged. We will develop tools and methods to evaluate the harms and benefits of the presented research. The goal is to see how the community is helping AI research become less harmful and more beneficial for society. For more background on work in this area, please review this workshop.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Data Science Collaboration Practices&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/chaitali-joshi">Chaitali Joshi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/param-bole">Param Bole&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/muhammad-oneeb-ul-haq-khan">Muhammad Oneeb Ul Haq Khan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/madeleine-thompson">Madeleine Thompson&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aparna-nair">Aparna Nair&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>AI ethics&lt;/li>
&lt;/ul></description></item><item><title>Mapping the Uncanny Valley</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/10/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/10/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>While many stories involve the friendly and familiar, scary stories across cultures, from Hamlet to Yotsuya Kaidan to Siren Head, involve beings that are almost, but not quite, human. Can these stories give us insight into the “nearly-human” uncanny valley? Initial results from our group say yes! While some research has explored the uncanny valley for images, that research is limited, and the uncanny valley remains unexplored in text. If we can extract human emotions surrounding text descriptions, we can exploit an enormous array of data. Our goal this semester is to analyze our objective definitions of “fear” or “creepiness” in a story and test how the similarity of words to “human” makes them more or less creepy. Moreover, we will explore what features of objects make them more or less scary. These findings relate directly to AI and robotics, where our goal is always to improve pleasant interactions and affability in human-computer and human-robot interactions. The students will build on initial work to apply NLP methods to these texts and improve upon existing results.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/saurabh-jain">Saurabh Jain&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/athashree-vartak">Athashree Vartak&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/olivia-fryt">Olivia Fryt&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yilin-qi">Yilin Qi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jiashu-xu">Jiashu Xu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/keith-burghardt">Keith Burghardt&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>NLTK&lt;/li>
&lt;li>NLP&lt;/li>
&lt;li>Keras&lt;/li>
&lt;/ul></description></item><item><title>Microtelcos and the Digital Divide in CA</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/1/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/1/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The COVID-19 pandemic has reinvigorated calls to close the digital divide in the US and elsewhere. Without adequate Internet access, households are at a disadvantage in education, jobs, health, and other key dimensions of wellbeing. While local broadband markets are increasingly concentrated, there is also increased interest in exploring the role that small local operators (“microtelcos”) can play in serving low-income and rural communities. These microtelcos range from small wireless cooperatives to mom-and-pop private ISPs to municipal-backed operators. This project seeks to a) map and identify the characteristics of communities where microtelcos are present in CA, and b) examine whether microtelco presence affects broadband service quality and adoption by businesses and households in the community. The project will use broadband deployment data collected by the CPUC (California Public Utilities Commission) and socioeconomic data from the Census Bureau.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Data Science Poster Award&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/asjad-asif-jah">Asjad Asif Jah&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jonathan-gonzalez">Jonathan Gonzalez&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/hernan-galperin">Hernan Galperin&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Statistics&lt;/li>
&lt;li>GIS&lt;/li>
&lt;li>Econometrics&lt;/li>
&lt;/ul></description></item><item><title>NVISION: Network Visualization Interventions Supporting Interpretation of Objective News</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/102/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/102/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>“The current fractured media landscape allows individuals to choose confirming over credible information, and information spreads more quickly online than interventions like fact-checking. Misinformation can be debiased by identifying gaps in mental representations of the world (mental models) and prompting alerts to be vigilant about assessing information (Lewandowsky et al, 2012).&lt;/p>
&lt;p>We aim to develop interventions that make media balance salient to users in order to mitigate the spread of misinformation. Social sampling theory holds that our misperceptions of others are explained by the sample of people we encounter (Galesic, Olsson, and Rieskamp, 2012), and we are more likely to link to similar people online (Kossinets &amp;amp; Watts, 2009). Our interventions address limitations in individual views of the media landscape. We aim to attach visualizations of sharing networks to news articles in real time to make these biases explicit.”&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yanan-zhou">Yanan Zhou&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/megan-josep">Megan Josep&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/priya-mane">Priya Mane&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/minrui-chen">Minrui Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abhinav-rao">Abhinav Rao&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/samip-kalyani">Samip Kalyani&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/daniel-benjamin">Daniel Benjamin&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Data Analysis&lt;/li>
&lt;li>Data Collection&lt;/li>
&lt;li>Data Visualization&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://drive.google.com/file/d/12Zk_1YMWYH9NtknIqg8NSEX0W3XXtKK7/view?usp=sharing" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Object detection and classification APIs for urban street image analysis</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/110/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/110/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Developing models to analyze images is a demanding task that requires significant time, resources, and effort. Recently, companies such as Amazon and Google have begun providing services that make the modeling process easier, so that even users with little machine learning expertise can benefit from deep learning technologies. Building on our prior work in object detection and classification for smart city applications, we would like to compare and evaluate the process and performance of commercial services using our training datasets. This project will be good practice for understanding the image machine learning modeling process and the advantages and limitations of commercial services for customized learning.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Data Science Presentation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/vibhav-chitalia">Vibhav Chitalia&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/utkarsh-baranwal">Utkarsh Baranwal&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/carlos-zamora">Carlos Zamora&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/wonjun-lee">Wonjun Lee&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/seon-ho-kim">Seon Ho Kim&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://urldefense.com/v3/__https://sites.google.com/usc.edu/homeless-encampments-in-la/home__;!!LIr3w8kk_Xxm!8Eq9mrdqLFc0LUaCGdonNK44IuCugGgQSvt90cBL0xEveRFYamBdezONTYoHeoo$" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Object Detection and Classification for Street Cleanliness</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/4/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/4/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>In collaboration with the Sanitation Department of LA, IMSC has been developing a framework to automatically detect the cleanliness of streets as well as any special objects in need of removal. The framework makes use of machine learning technology trained on images/videos collected by the city and/or taken by citizens. Images taken by mobile cameras (e.g., LA City’s garbage collection trucks and/or citizens’ smartphones using our own MediaQ App) are transferred to the MediaQ server, where they can be automatically classified based on predefined cleanliness indexes and object types (such as bulky items and illegal dumping). In this project, we will focus on the detection and classification of homeless encampments on LA streets. Recorded images/videos with GPS location data will be processed, and the classification results will be displayed on a map to show the distribution of homeless people in LA, which is essential data for studying homelessness.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/trisha-sinha">Trisha Sinha&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/harsh-jaykumar-jalan">Harsh Jaykumar Jalan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ashwin-sakhare">Ashwin Sakhare&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/divya-manjunath">Divya Manjunath&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nai-cih-liou">Nai Cih Liou&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/seon-ho-kim">Seon Ho Kim&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Data Visualization&lt;/li>
&lt;li>Computer Vision&lt;/li>
&lt;/ul></description></item><item><title>Omics and Aging in Killifish</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/6/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/6/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Students will analyze aging in the African turquoise killifish, a species with the shortest lifespan of all vertebrates. By analyzing multi-omic data over the lifetime of many individuals, we can begin to understand the cellular changes that reflect aging.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/deven-panchac">Deven Panchac&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/michael-mathew">Michael Mathew&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/akansha-das">Akansha Das&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/suchetha-bhat">Suchetha Bhat&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shruti-krishna-kumar">Shruti Krishna Kumar&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/jose-luis-ambite">Jose-Luis Ambite&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/berenice-benayoun">Berenice Benayoun&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>R&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;/ul></description></item><item><title>Studying the Effects of Genes and Environment in Aging</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/7/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/7/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Students will analyze genomic and environmental data collected through the lifetime of individuals to investigate which genes and external conditions could be associated with aging. The goal of the project will be to reproduce an existing published paper and improve on its results.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Interdisciplinary Data Science Team&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/meera-patel">Meera Patel&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/boya-li">Boya Li&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/haoyang-chen">Haoyang Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/qinming-zhang">Qinming Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ming-yan">Ming Yan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/t-em-arpawong">T. Em Arpawong&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yolanda-gil">Yolanda Gil&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>R&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;/ul></description></item><item><title>Tracking health and nutrition signals from social media data (begun Spring 2020)</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/113/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/113/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Food environments (the physical spaces where people acquire and consume food) can profoundly impact diet and related diseases. Effective, robust measures of food environment nutritional quality are required by researchers and policymakers investigating their effects on individual dietary behavior and designing targeted public health interventions. The most commonly used indicators of food environment nutritional quality are limited to measuring the binary presence or absence of entire categories of food outlet type, such as ‘fast-food’ outlets, which can range from burger joints to salad chains. There would be great value in a summarizing indicator of restaurant nutritional quality that exists along a continuum, and which can be applied at the scale of large food environments, for example across Los Angeles County, to make distinctions between diverse restaurants within and across categories of food outlets.&lt;/p>
&lt;p>This project will explore the ability to track real-life health and nutrition signals from social media data, focusing on data from Foursquare and Yelp. We will investigate the ability to access menu information from the APIs of these social media platforms, and develop measures to assess the nutritional content of these menus. Multiple aims will be investigated in this project, including scraping data from social media; NLP of menu text, tag, and comment data; developing predictive models of obesity; and more. “Ground truth” data on dietary patterns of LA residents will be available, enabling validation of dietary measures and predictive models built from menu data.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/iris-c-liu">Iris C. Liu&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/andres-abeliuk">Andrés Abeliuk&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abigail-horn">Abigail Horn&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>R&lt;/li>
&lt;li>NLP&lt;/li>
&lt;li>Statistical Modeling&lt;/li>
&lt;/ul></description></item><item><title>Transfer learning for adversarial machine translation</title><link>https://ckids-datafirst.github.io/website/projects/2021-fall/109/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-fall/109/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Neural Machine Translation (NMT) is the process of mapping a segment of words from a source language to a target language using neural networks. However, NMT systems rely on large datasets for the source and target languages, and perform poorly on low-resource languages where there is insufficient parallel data. An effective method for improving NMT on low-resource languages is to employ transfer learning, where a model trained on a high-resource language pair is used to initialize training for the low-resource language pair. In this work, we will study the effect of employing transfer learning methods on adversarial machine translation models based on Long Short-Term Memory (LSTM) recurrent neural networks.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/manoj-yadav">Manoj Yadav&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/frederick-norman">Frederick Norman&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vikrame-vasudev-krishnan">Vikrame Vasudev Krishnan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hanyu-he">Hanyu He&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shaochen-tan">Shaochen Tan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/amit-singh">Amit Singh&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/mohammad-reza-rajati">Mohammad Reza Rajati&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>PyTorch&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Deep Learning&lt;/li>
&lt;/ul></description></item><item><title>Turning Cyber Data into Language</title><link>https://ckids-datafirst.github.io/website/projects/2021-spring/8/</link><pubDate>Fri, 01 Jan 2021 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2021-spring/8/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Cyber ontologies such as STIX and ATT&amp;amp;CK can represent complex relationships between cyber threat actors, attacks, and infrastructure. While such representations are easily processed by computers, cyber analysts often prefer dealing with written text. Natural language ontologies like FrameNet represent language in a structured manner as well, but frame specifications are often not specific enough for a given domain (like cybersecurity). In this project, students will learn about cybersecurity threat ontologies and build a GUI web app tool that annotates provided cyber threat documents. No previous knowledge of cybersecurity necessary!&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/ruoyu-li">Ruoyu Li&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/carol-varkey">Carol Varkey&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/rengapriya-aravindan">Rengapriya Aravindan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/chuqi-liu">Chuqi Liu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ziheng-gong">Ziheng Gong&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Flask&lt;/li>
&lt;li>Streamlit&lt;/li>
&lt;/ul></description></item><item><title>A Data Challenge for Parkinson’s Disease</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/301/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/301/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will assemble a team to participate in the Biomarker and Endpoint Assessment to Track Parkinson’s Disease (BEAT-PD) DREAM Challenge by the Michael J. Fox Foundation and Sage Bionetworks. The challenge is designed to benchmark new methods to predict Parkinson’s disease progression. Teams participating in the Challenge will have access to raw sensor data that can be used to predict individual medication state and symptom severity. Specifically, teams are asked to develop methods to predict on/off medication status, dyskinesia severity, and/or tremor severity.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/nazgol-tavabi">Nazgol Tavabi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/che-pai-kung">Che-Pai Kung&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abhivineet-veeraghanta">Abhivineet Veeraghanta&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/likitha-lakshminarayanan">Likitha Lakshminarayanan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/neda-jahanshad">Neda Jahanshad&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Data Science&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1pY10sNfzXdAmuiSFJomMNlRpiiXH3vzH/edit?usp=sharing&amp;amp;ouid=116088473370484068569&amp;amp;rtpof=true&amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>A framework for enabling software comparison and classification</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/206/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/206/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The number of scientific products, including scientific software, has been growing steadily in recent years. This growth makes it difficult for researchers to keep up with all the latest code and publications available. A large body of research has attempted to classify similar papers and literature. However, to date there are no good approaches for finding similar or related code. In this project, the students will analyze different unsupervised methods for finding similarities among scientific software based on a) an automated analysis of their dependencies and b) a classification of the main functionality of software components.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yi-xie">Yi Xie&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/bin-zhang">Bin Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/mohan-krishna-thota">Mohan Krishna Thota&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/param-bole">Param Bole&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/daniel-garijo">Daniel Garijo&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Sklearn&lt;/li>
&lt;li>Data Manipulation&lt;/li>
&lt;/ul></description></item><item><title>A Knowledge Graph for Cybersecurity Experiments</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/304/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/304/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The DETER cybersecurity testbed has been running experiments for several years, collecting information about intrusions, vulnerabilities, and mitigation strategies. This project will capture cybersecurity experiments as a knowledge graph that can be browsed, queried, and mined to find patterns and create models of cyberattacks.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/alex-zihuan-ran">Alex Zihuan Ran&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hardik-mahipal-surana">Hardik Mahipal Surana&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/daniel-garijo">Daniel Garijo&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jelena-mirkovic">Jelena Mirkovic&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Knowledge Graphs&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://drive.google.com/file/d/1AquQEf1qDw8EtFfwPSEOSv4py6moYzTs/view?usp=sharing" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Annotating Paleoclimate Data</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/314/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/314/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Paleoclimate data is highly diverse, requiring different sets of metadata to describe the various datasets. In this project, you will help build an interface to assist researchers in annotating their paleoclimate datasets according to an evolving reporting standard (PaCTS) and downloading them in the Linked Paleo Data (LiPD) format. The interface should be a highly interactive wizard that accommodates the diversity of the data. It should also let researchers edit existing datasets (by uploading LiPD files), check their compliance with PaCTS, plot location information and time series, download the results in the LiPD format, and upload them to a semantic wiki and/or an SQL database. In addition, the interface should support the use of a recommender system (to be built) to help researchers annotate their datasets.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yincheng-lin">Yincheng Lin&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shravya-manety">Shravya Manety&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/julien-emile-geay">Julien Emile-Geay&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Javascript&lt;/li>
&lt;li>Web Technologies&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1HJX5TzBcSZ4E4QFkcazJy-hy78wNa6lY/edit?usp=sharing&amp;amp;amp;ouid=116088473370484068569&amp;amp;amp;rtpof=true&amp;amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Brain morphometry from contrast-enhanced T1-weighted brain MRIs</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/208/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/208/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Cancer remains the second leading cause of death in the US. However, recent advancements have increased cancer survivorship, with survivors now numbering in the tens of millions. Given this, there is tremendous interest in studying cancer-related cognitive impairment (CRCI). CRCI due to chemotherapy, or “chemobrain”, can afflict up to 78% of survivors. The neural substrates of CRCI are unknown, and understanding them may improve survivors’ quality of life. The CRCI neuroimaging literature is still in its infancy, and these studies have used small sample sizes from traditional research-dedicated non-contrast-enhanced (nCE) scans. Because conducting well-powered neuroimaging studies is very expensive, adapting clinical contrast-enhanced (CE) T1-weighted (T1w) scans could prove useful for CRCI and many other diseases like dementia. The primary objective of this project is to develop a novel deep learning method to generate nCE images from acquired CE T1w scans to allow accurate brain morphometry and provide a plentiful source of inexpensive neuroimaging data.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/ming-lyu">Ming Lyu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/danielle-sim">Danielle Sim&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/devansh-shah">Devansh Shah&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ting-fung-lam">Ting Fung Lam&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/mark-shiroishi">Mark Shiroishi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/neda-jahanshad">Neda Jahanshad&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/paul-thompson">Paul Thompson&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>AI&lt;/li>
&lt;li>Familiarity with MRIs&lt;/li>
&lt;/ul></description></item><item><title>Capturing the Provenance of Data Analysis Using the PROV Standard</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/319/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/319/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Documenting how a result was obtained from data analysis involves documenting the software, software settings, and datasets used to obtain that result so it can be explained properly. The current ASSET interface enables users to document the provenance of data analysis no matter what infrastructure they used (R scripts, scikit-learn, etc.). This project will focus on capturing provenance records for data science projects and using the W3C PROV standard to export those records. It will also develop tools to mine provenance data to find common patterns of use.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/yolanda-gil">Yolanda Gil&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Knowledge Graphs&lt;/li>
&lt;li>Javascript&lt;/li>
&lt;li>Firebase&lt;/li>
&lt;li>UI Development&lt;/li>
&lt;/ul></description></item><item><title>Characterizing the counter-narratives of climate change</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/216/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/216/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Top climate scientists post their findings and views regularly on social media. These very scientists are met with tweets from those with opposing views, often containing vitriolic and false information. It is important that we can identify and characterize these tweets to understand the counter-narratives of climate change. We will address topics including false information, bot campaigns, and harassment.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Data Collection&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/abhilash-pandurangan">Abhilash Pandurangan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aditya-jajodia">Aditya Jajodia&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sushmitha-ravikumar">Sushmitha Ravikumar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vanshika-sridharan">Vanshika Sridharan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Classification&lt;/li>
&lt;li>Data Collection&lt;/li>
&lt;/ul></description></item><item><title>Characterizing the counter-narratives of climate change (Spring - 2020)</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/316/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/316/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Top climate scientists post their findings and views regularly on social media. These very scientists are met with tweets from those with opposing views, often containing vitriolic and false information. It is important that we can identify and characterize these tweets to understand the counter-narratives of climate change. We will address topics including false information, bot campaigns, and harassment.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Data Science Teamwork&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/abhilash-pandurangan">Abhilash Pandurangan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aditya-jajodia">Aditya Jajodia&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sushmitha-ravikumar">Sushmitha Ravikumar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vanshika-sridharan">Vanshika Sridharan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Classification&lt;/li>
&lt;li>Data Collection&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1c9WlvN8qHV6xKS7SRY1FNWZ7AHxNW3cf/edit?usp=sharing&amp;amp;amp;ouid=116088473370484068569&amp;amp;amp;rtpof=true&amp;amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Connections within Contemporary Feminism Movements</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/305/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/305/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will look at events data collected from several recent feminist social movements to understand their connections to each other. Specifically, it will explore individuals or organizations that played an instrumental role in movement mobilization, relationship brokerage between movements, and building and sustaining activist communities. Previous research suggests that those distinctive movements are often not isolated incidents, but are mobilized by a core group of “leaders” or by similar ideas and frames. Understanding how seemingly disconnected movements relate to one another will help reveal the lasting impact of mediated movements.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Interdisciplinary Data Science Team&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/keerti-bhogaraju">Keerti Bhogaraju&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ian-myoungsu-choi">Ian Myoungsu Choi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/negar-mokhberian">Negar Mokhberian&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nazanin-alipourfard">Nazanin Alipourfard&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/aimei-yang">Aimei Yang&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Social Network Analysis&lt;/li>
&lt;li>Data Mining&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://drive.google.com/file/d/1ocNzQLKJNMOLdlvl4wRBD20cMav60T6d/view?usp=sharing" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Data scraping for salary benchmarking</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/310/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/310/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will develop a data scraper to collect salary records from a website that provides compensation data for faculty at public universities. When provided a list of faculty names and institutional affiliations, this program will search for the associated records, extract the relevant results, and copy the data into a spreadsheet. The purpose of this project is to explore the feasibility of automating an otherwise time-consuming data collection task required for benchmarking of faculty salaries in relation to peer institutions. This will ultimately facilitate a number of important tasks, including analysis of potential salary disparities within certain disciplines and faculty tracks.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/matthew-lim">Matthew Lim&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/minh-nguyen">Minh Nguyen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sindhu-ravi">Sindhu Ravi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vihang-mangalvedhekar">Vihang Mangalvedhekar&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kevin-tsang">Kevin Tsang&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ginger-clark">Ginger Clark&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/tj-mccarthy">T.J. McCarthy&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Web Scraping&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://drive.google.com/file/d/18DZTF3f9I70khOtIEiSUrjCCqrONuhAo/view?usp=sharing" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Detecting Biases in College Football Recruiting (Fall - 2020)</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/213/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/213/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>College football recruiting is big business. This project aims to determine if there are biases in whom college football coaches recruit and how they recruit them. By creating a comprehensive data set of college recruits and integrating relevant data with current socioeconomic markers (e.g., census data), we hope to determine if there are patterns in whom football coaches recruit and where they recruit from, regardless of talent.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/saurabh-jain">Saurabh Jain&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jackie-fan">Jackie Fan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/akansha-das">Akansha Das&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Data Science&lt;/li>
&lt;li>Social Science&lt;/li>
&lt;li>Mapmaking&lt;/li>
&lt;li>Economics&lt;/li>
&lt;/ul></description></item><item><title>Digital Democracy: Using Social Media to Improve Political Discourse</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/215/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/215/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Politicians in modern democracies across the world have eagerly adopted social media for engaging their constituents, entering into direct dialogs with citizens. From the perspective of political actors, there is a need to continuously gather, monitor, analyze, and visualize politically relevant information from online social media with the goal to improve communication with citizens and voters. The goal of this proposal is to create a tool that enhances interaction and dialogue between political actors and their followers. This will be achieved by creating compact and comprehensive summaries that aggregate and visualize common narratives, thus, reducing the cognitive load required to read all the messages and streamlining the dialogue experience.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/alex-spangher">Alex Spangher&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yash-shah">Yash Shah&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/swetha-thomas">Swetha Thomas&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hongyu-li">Hongyu Li&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/raveena-kshatriya">Raveena Kshatriya&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abhi-thadeshwar">Abhi Thadeshwar&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/andres-abeliuk">Andrés Abeliuk&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>NLP&lt;/li>
&lt;li>Programming&lt;/li>
&lt;/ul></description></item><item><title>Digital Democracy: Using Social Media to Improve Political Discourse (Spring - 2020)</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/317/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/317/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Politicians in modern democracies across the world have eagerly adopted social media for engaging their constituents, entering into direct dialogs with citizens. From the perspective of political actors, there is a need to continuously gather, monitor, analyze, and visualize politically relevant information from online social media with the goal to improve communication with citizens and voters. The goal of this proposal is to create a tool that enhances interaction and dialogue between political actors and their followers. This will be achieved by creating compact and comprehensive summaries that aggregate and visualize common narratives, thus, reducing the cognitive load required to read all the messages and streamlining the dialogue experience.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Data Science Collaboration Practices&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/alex-spangher">Alex Spangher&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yash-shah">Yash Shah&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/swetha-thomas">Swetha Thomas&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hongyu-li">Hongyu Li&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/raveena-kshatriya">Raveena Kshatriya&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abhi-thadeshwar">Abhi Thadeshwar&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/andres-abeliuk">Andrés Abeliuk&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>NLP&lt;/li>
&lt;li>Programming&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1j8e6L6RTTJdzabFxpg6CUSoYAJZJT35Z/edit?usp=sharing&amp;amp;amp;ouid=116088473370484068569&amp;amp;amp;rtpof=true&amp;amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Disparities in educational achievement</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/312/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/312/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The project will combine socio-economic data from the US Census with college and K-12 performance data to identify correlates of positive educational outcomes. Of specific interest will be assessing how economic inequalities and racial disparities affect educational achievement in different regions of the US.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yuzi-he">Yuzi He&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ziping-hu">Ziping Hu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zicai-wang">Zicai Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/kristina-lerman">Kristina Lerman&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1fHg8Yvi6SPm1LDqRAl9rCAVH2KGFmhSz/edit?usp=sharing&amp;amp;amp;ouid=116088473370484068569&amp;amp;amp;rtpof=true&amp;amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Gender inclusion in science</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/313/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/313/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will measure the representation of women in various scientific disciplines across time (and different countries) and identify institutions that have succeeded in creating a more welcoming environment for women. While there are already studies that use bibliographic data to map career trajectories of women, they do not focus on the role that institutions (and countries) – and their policies – play in retaining female researchers.&lt;/p>
&lt;p>The data comes from Microsoft Academic Graph, containing millions of papers from institutions around the world across many decades. We will use the Ethenea API to extract the gender (and ethnicity) of authors.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/ninareh-mehrabi">Ninareh Mehrabi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aditya-gupta">Aditya Gupta&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vineetha-nadimpalli">Vineetha Nadimpalli&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ayushi-jha">Ayushi Jha&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/goran-muric">Goran Muric&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kristina-lerman">Kristina Lerman&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/16kHZphM8T7ACRlco0Ozg-1ADslPaP5Nm/edit?usp=sharing&amp;amp;amp;ouid=116088473370484068569&amp;amp;amp;rtpof=true&amp;amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Generation of a Sports-based Introductory Data Science Curriculum to Increase Participation of Underrepresented Groups in STEM</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/211/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/211/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>As the requirements for success in the workforce become increasingly technical, there is a commensurate need for curricula that can engage and capture the imagination of students, especially those from traditionally underrepresented groups in STEM. One way to reach these groups is via curricula that appeal to contexts with which they’re familiar and engaged, such as sports. To that end, this project will explore the development of a sports-based introductory data science curriculum with the goal of engaging students who might otherwise not be interested in pursuing data science as a career. Students will work on generating illustrative code examples and problem sets in Python using sports examples.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/everest-law">Everest Law&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zhongying-wang">Zhongying Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sushmitha-ravikumar">Sushmitha Ravikumar&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Pedagogy&lt;/li>
&lt;li>Social Science&lt;/li>
&lt;/ul></description></item><item><title>Integration of Frame Semantics to Cyber Ontologies</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/207/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/207/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Cyber ontologies such as STIX and ATT&amp;amp;CK can represent complex relationships between cyber threat actors, attacks and infrastructure. While such representations are conducive to interoperability between systems, they are often unwieldy for human cyber analysts to deal with directly. Conversely, natural language generation (NLG) frameworks like FrameNet represent language in a structured manner, but frame specifications are often not specific enough for specialized domains (such as cyber security). Leveraging and combining the semantic structure of both forms can create a tool that translates cyber threat data in standard interoperable formats (such as STIX) to human-readable reports, via existing NLG frameworks. Working on a project such as this provides an opportunity for significant impact, as the fusion of these two structures could greatly increase both the adoption and the utility of cyber threat ontologies.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/folk-narongrit">Folk Narongrit&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/carol-varkey">Carol Varkey&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/francis-sun">Francis Sun&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>NLP&lt;/li>
&lt;li>Data Science&lt;/li>
&lt;li>OSINT&lt;/li>
&lt;li>Cybersecurity&lt;/li>
&lt;/ul></description></item><item><title>Investigations of a Data Science Online Community</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/202/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/202/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The Kaggle.com competition ecosystem is a rich and active community with a designed Progression System that uses performance medals to rank and differentiate users into tiers. However, winning performance medals in Kaggle is more complex than it appears. Users are bound by the available competitions, characteristics of the competition’s problem statement, the quality of their software submissions, and the quality of other competitors (including collaborators). With these factors, one user’s earned “Gold” medal from one competition may have required more effort and a higher quality solution than another user’s earned “Gold” medal in a different competition. This project has great potential to learn about open competitions in data science. Some example questions are: What features help predict whether a user will win a medal in a competition? How can users be clustered and differentiated from one another using their competition patterns and medal-winning solutions? How quickly (in days) will a user win their next competition medal? What is the probability that a user will assemble a team for a competition? What are features that predict high-performing teams? What features help generate teammate recommendations?&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Interdisciplinary Data Science Team&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/sara-melotte">Sara Melotte&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/devendra-swami">Devendra Swami&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jacob-bickman">Jacob Bickman&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jae-young-kim">Jae Young Kim&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kevin-tsang">Kevin Tsang&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/marlon-twyman">Marlon Twyman&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>R&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Matlab&lt;/li>
&lt;/ul></description></item><item><title>Machine Learning to Analyze Rock Microstructures (Fall - 2020)</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/210/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/210/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Students will analyze images from optical microscopes that reveal features of materials and microstructures using machine learning techniques. These images have been collected by geologists, who use them to study the rock samples that they collect in the field and determine their properties and origins. We have a baseline system already implemented, and the goal is to improve it with new machine learning techniques, guided by the insights of our collaborating geologists.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Project Presentation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/abhivineet-veeraghanta">Abhivineet Veeraghanta&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/bolong-pan">Bolong Pan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/bryan-beh">Bryan Beh&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hanzhi-zhang">Hanzhi Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/feilong-wu">Feilong Wu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/yolanda-gil">Yolanda Gil&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Image Analysis&lt;/li>
&lt;/ul></description></item><item><title>Mapping the impacts of climate change across LA County</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/205/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/205/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>We are increasingly hyper-aware of the effects of climate change. But we are not always aware of how hyper-local those effects can be. Across the 2,500 square miles of Los Angeles County, the impact of climate change is playing out in different ways. For example, some areas are experiencing more frequent spikes in extreme temperatures, while others are not. We propose a project that would fuse together several different datasets in order to map how temperature changes and other variables are hitting some corners of Los Angeles harder than others. Often, these areas are inhabited by people facing numerous other inequities, such as poor healthcare access. By examining several years’ worth of hourly average temperatures from thousands of spots across Los Angeles County, and combining that with other datasets, such as tree cover, cases of asthma, and so forth, it is possible to create an interactive map that illustrates where the impacts of climate change are most acute. This project would be published by Annenberg’s Crosstown publishing outlet and would be distributed widely. The project would have immediate practical applications and could inform policy decisions on issues such as where to place parks and green spaces.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Interdisciplinary Data Science Team&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/simon-khan">Simon Khan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vinith-angadi">Vinith Angadi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yingxi-lin">Yingxi Lin&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shalini-mustala">Shalini Mustala&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/gabriel-kahn">Gabriel Kahn&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>GIS&lt;/li>
&lt;li>R&lt;/li>
&lt;li>SQL&lt;/li>
&lt;/ul></description></item><item><title>Mapping the Uncanny Valley (Fall - 2020)</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/204/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/204/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>While many stories involve the friendly and familiar, scary stories across cultures, from Hamlet to Yotsuya Kaidan to Siren Head, involve beings that are almost, but not quite, human. Can these stories give us insight into the “nearly-human” uncanny valley? And are the most popular stories at the nadir of this valley? In this project we aim to explore the uncanny valley through analysis of several thousand stories posted on Reddit over a decade. These data contain stories that cross a range of topics, and include user comments and story scores. We will explore the prevalence of the monsters over time, and explore whether there are optimal characteristics of these monsters that make them so scary. While some research has explored the uncanny valley for images, that research is limited, and the uncanny valley is virtually unexplored in text. The students will build on initial work by the advisor to apply NLP methods to these texts and improve upon existing initial results.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Cyberphysical Data Science Team&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yuchen-zhang">Yuchen Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nai-cih-liou">Nai-Cih Liou&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/haripriya-dharmala">Haripriya Dharmala&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sakshi-goel">Sakshi Goel&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/raveena-kshatriya">Raveena Kshatriya&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/keith-burghardt">Keith Burghardt&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>NLTK&lt;/li>
&lt;li>NLP&lt;/li>
&lt;li>Keras&lt;/li>
&lt;li>Gensim&lt;/li>
&lt;/ul></description></item><item><title>Microtelcos and the Digital Divide in CA (Spring - 2020)</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/311/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/311/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The broadband access market is increasingly dominated by a few large ISPs. However, small and medium-size operators (“microtelcos”) are critical to connectivity in low-income and rural communities in CA, serving markets of little interest to large operators. The primary goal of this project is to combine broadband infrastructure deployment data from the CPUC (California Public Utilities Commission) and socioeconomic data from the Census Bureau to understand the characteristics of the communities served by microtelcos, and to analyze whether the presence of a microtelco operator contributes to higher levels of connectivity in the community. The main technical challenge is to combine spatial data found in CPUC files with census block level data provided by the Census Bureau. This is part of an ongoing research program called Connected Communities and Inclusive Growth (CCIG).&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/hernan-galperin">Hernan Galperin&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>GIS&lt;/li>
&lt;li>R&lt;/li>
&lt;li>Stata&lt;/li>
&lt;/ul></description></item><item><title>Modeling Uncertainty in Drought Products</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/315/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/315/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Droughts can have a substantial impact on agricultural systems and human livelihood. A Python package to calculate various drought indices is being developed. In this project, you will expand on this package and develop methods to test the sensitivity of the models to various input datasets and parameters. In addition, you will develop post-processing code to determine the return period of the drought (is it a 1 in 20 yr event or 1 in 5 yr event?).&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Probability Theory&lt;/li>
&lt;/ul></description></item><item><title>Modelling Spatiotemporal Relationships between Waste Water Injection and Induced Seismicity</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/309/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/309/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Induced seismicity refers to earthquakes that are caused as a result of human activity, such as disposing of wastewater by injecting it into the subsurface. This project will focus on spatiotemporal statistics to model space-time relationships between injected wastewater and induced earthquakes. The model will incorporate space-time data pertaining to seismic activity and associated human systems to create forecasts of induced earthquakes.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/orhun-aydin">Orhun Aydin&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>R&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>GIS&lt;/li>
&lt;li>Data Analysis&lt;/li>
&lt;li>Programming&lt;/li>
&lt;/ul></description></item><item><title>Predicting Effective Tax Rate of Publicly-Traded Firms</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/307/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/307/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The purpose of this project is to analyze business firms’ text disclosures to determine if those text disclosures are related to firms’ tax rates. In so doing we first capture information about the text and then relate that text information to quantitative information, using statistical modeling. So far, we have generated and used some bags of words to capture information that we expect will provide insight into the tax rates that those firms incur. Our knowledge acquisition approach, to gather those bags of words, was to interview an expert. We then counted the number of occurrences of those words in our text, and used statistical models to relate the number of those occurrences to different measures of tax rates. We find that those bags of words are statistically significantly related to measures of tax rates that firms pay. In addition, we find that “tax specific bags of words” work “better” than “generic accounting bags of words.”&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/jae-young-kim">Jae Young Kim&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yuqin-jiang">Yuqin Jiang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kanlin-cheng">Kanlin Cheng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/saravanan-manoharan">Saravanan Manoharan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/daniel-oleary">Daniel O'Leary&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Statistical Analysis&lt;/li>
&lt;li>Text Analysis&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1y0FlBNOC2PxX3OPa45-d_Orr7kPnvaG3/edit?usp=sharing&amp;amp;ouid=116088473370484068569&amp;amp;rtpof=true&amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Social Graph Analysis and Attribution of Software Exploit Contributors Using GitHub</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/209/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/209/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Attribution of cyber threat actors is an increasingly important and difficult problem. One potential mitigation is the early detection of potential threat actors via analysis of open-source intelligence (OSINT). This project will analyze the social graph of users who contribute to, follow, star, and otherwise interact with proof-of-concept CVE implementations and other relevant potentially malicious (e.g. software vulnerability) repositories on GitHub.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/atharva-rishi">Atharva Rishi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/erin-szeto">Erin Szeto&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jiaxin-liang">Jiaxin Liang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kshitij-gupta">Kshitij Gupta&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nghi-le">Nghi Le&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/erica-xia">Erica Xia&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Data Science&lt;/li>
&lt;li>Cybersecurity&lt;/li>
&lt;li>Graph Analysis&lt;/li>
&lt;/ul></description></item><item><title>Team Dynamics in Online Multiplayer Games</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/217/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/217/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Competitive online multiplayer team games such as CounterStrike, PUBG, or League of Legends are extremely popular. Multiple teams of professional players compete in hundreds of tournaments yearly. Player transfer between teams is common. The goal of the project is to measure the effects of player transfers and to answer some of the questions such as: How does a new player affect the team’s performance?; How does the change of a team affect a player’s performance? The world of online games can be used as a fruitful area for tackling more fundamental questions on human society and collaboration dynamics in different settings.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/kevin-tsang">Kevin Tsang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jiaqi-liu">Jiaqi Liu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/goran-muric">Goran Muric&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;/ul></description></item><item><title>Team Dynamics in Online Multiplayer Games (Spring - 2020)</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/320/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/320/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Competitive online multiplayer team games such as CounterStrike, PUBG, or League of Legends are extremely popular. Multiple teams of professional players compete in hundreds of tournaments yearly. Player transfers between teams are common. The goal of the project is to measure the effects of player transfers and to answer some of the questions such as: How does a new player affect the team’s performance?; How does the change of a team affect a player’s performance? The world of online games can be used as a fruitful area for tackling more fundamental questions on human society and collaboration dynamics in different settings.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Data Collection&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/kevin-tsang">Kevin Tsang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jiaqi-liu">Jiaqi Liu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/goran-muric">Goran Muric&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1VqiPWIAd8En19Yh1PZwBY_3uhtIDOEex/edit?usp=sharing&amp;amp;ouid=116088473370484068569&amp;amp;rtpof=true&amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Text Analysis, Social Networks and Crowdsourcing</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/308/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/308/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The purpose of this project is to analyze a crowdsourcing setting for both the sentiment and other categories of meaning in the text, and the roles and impact of a network of contributors on the votes and potentially on the content.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/dan-peng">Dan Peng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/gitanjali-kanakaraj">Gitanjali Kanakaraj&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hanieh-arabzadehghahyazi">Hanieh Arabzadehghahyazi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/naiya-shah">Naiya Shah&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nana-andriana">Nana Andriana&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/keith-burghardt">Keith Burghardt&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/daniel-oleary">Daniel O'Leary&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Social Network Analysis&lt;/li>
&lt;li>Text Analysis&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1FChD-jxZFHIQ6kSndkobTgoxSEyh6GnC/edit?usp=sharing&amp;amp;ouid=116088473370484068569&amp;amp;rtpof=true&amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>The aging individual brain</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/203/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/203/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Deep learning models are now able to predict how an individual’s face will age in a very realistic manner, ensuring that key individual features, such as eye color, are maintained, while more age-related features, such as the texture of skin, are altered to be representative of a desired age group. However, little work has been done to predict how an individual’s brain will age. Such models may be able to help predict early signatures of neurodegenerative disorders. The goal of this project will be to test several models to realistically predict how a given individual’s brain will look at any age in mid to late adulthood. Deep learning based methods, image processing based methods, or a combination of both are encouraged. Students will work with a dataset of over 20,000 brain scans of individuals aged 45-80, approximately 1000 of whom have a second scan after two years. This is currently an active project in the lab and students will join researchers already working on this problem to further explore and improve methodology.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/aleck-cervantes">Aleck Cervantes&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/david-lin">David Lin&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shunlin-lu">Shunlin Lu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/tianyu-zhu">Tianyu Zhu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yichao-zhu">Yichao Zhu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/neda-jahanshad">Neda Jahanshad&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Deep Learning&lt;/li>
&lt;li>Git&lt;/li>
&lt;/ul></description></item><item><title>Towards Automated Understanding of Scientific Software</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/303/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/303/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Data science projects require knowledge of software that changes rapidly. As a result, scientists spend hours reading long documentation and manuals instead of advancing their scientific fields. In this project, we aim to automatically extract relevant aspects of scientific software (e.g., what it does, how to install it, how to operate it, or how to cite it) from documentation and code using machine learning techniques. The students will build on an existing baseline of classifiers and try to improve the existing results.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Data Science Open and Sharing Practices&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Best Project Presentation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/haripriya-dharmala">Haripriya Dharmala&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jiaying-wang">Jiaying Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/vedant-diwanji">Vedant Diwanji&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/daniel-garijo">Daniel Garijo&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Knowledge Graphs&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Sklearn&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1PPJju0B0EbdnZcLuSrZjDE4SHT_8mZHd/edit?usp=sharing&amp;amp;ouid=116088473370484068569&amp;amp;rtpof=true&amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Tracking health and nutrition signals from social media data</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/214/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/214/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will explore the ability to track real-life health and nutrition signals from social media data, focusing on data from Instagram and Foursquare. We will investigate the quality of Instagram posts as a source of data for measurements of dietary patterns and nutrition quality, focusing on spatial, textual, and (&lt;em>new in this semester&lt;/em>) image content of posts linked to food outlets in Los Angeles, as well as nutritional content analysis of menus available online. Multiple aims will be investigated in this project, including: scraping data from social media; NLP of tags, comments, and menu data; image analysis; predictive models and social network analysis; and more. Also new in this semester: “ground truth” data on dietary patterns of LA residents will be available, enabling validation of dietary measures and predictive models built from Instagram posts.&lt;/p>
&lt;p>The project will build on the DataFest 2019 project, and will expand the scope to access up-to-date data from Instagram, in particular: data with images, the underlying social connections / social network, and, of course, more timely data (which requires data scraping).&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/abhilash-karpurapu">Abhilash Karpurapu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/erica-xia">Erica Xia&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/iris-liu">Iris Liu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/spoorti-nidagundi">Spoorti Nidagundi&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/andres-abeliuk">Andrés Abeliuk&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abigail-horn">Abigail Horn&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kayla-de-la-haye">Kayla de la Haye&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yelena-mejova">Yelena Mejova&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>NLP&lt;/li>
&lt;li>Statistical Analysis&lt;/li>
&lt;li>Social Network Analysis&lt;/li>
&lt;li>Sentiment Analysis&lt;/li>
&lt;li>R&lt;/li>
&lt;li>Image Analysis&lt;/li>
&lt;/ul></description></item><item><title>Tracking health and nutrition signals from social media data (Spring - 2020)</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/306/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/306/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will explore the ability to track real-life health and nutrition signals from social media data, focusing on data from Instagram and Foursquare. We will investigate the quality of Instagram posts as a source of data for measurements of dietary patterns and nutrition quality, focusing on spatial, textual, and (&lt;em>new in this semester&lt;/em>) image content of posts linked to food outlets in Los Angeles, as well as nutritional content analysis of menus available online. Multiple aims will be investigated in this project, including: scraping data from social media; NLP of tags, comments, and menu data; image analysis; predictive models and social network analysis; and more. Also new in this semester: “ground truth” data on dietary patterns of LA residents will be available, enabling validation of dietary measures and predictive models built from Instagram posts.&lt;/p>
&lt;p>The project will build on the DataFest 2019 project, and will expand the scope to access up-to-date data from Instagram, in particular: data with images, the underlying social connections / social network, and, of course, more timely data (which requires data scraping).&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Project Achievement&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/abhilash-karpurapu">Abhilash Karpurapu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/erica-xia">Erica Xia&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/iris-liu">Iris Liu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/spoorti-nidagundi">Spoorti Nidagundi&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/andres-abeliuk">Andrés Abeliuk&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abigail-horn">Abigail Horn&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kayla-de-la-haye">Kayla de la Haye&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yelena-mejova">Yelena Mejova&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>R&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>NLP&lt;/li>
&lt;li>Statistical Analysis&lt;/li>
&lt;li>Social Network Analysis&lt;/li>
&lt;li>Sentiment Analysis&lt;/li>
&lt;li>Image Analysis&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1JL_7V4M3pyrxM48YQt3m-E-MgTgvE4s4/edit?usp=sharing&amp;amp;ouid=116088473370484068569&amp;amp;rtpof=true&amp;amp;sd=true" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Turning Library Collections into Data Science Challenges and Resources</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/318/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/318/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Libraries, museums, and archives hold unique collections that may be very useful for data science. These collections include photographs, videos, letters, and other artifacts that could give unique insights when analyzed. In this project, students will work with the USC Libraries to identify existing collections that would be potentially interesting as targets for data science, describe those collections in collaboration with the USC Libraries so they can be promoted as data science resources, and create APIs and other access mechanisms for data science researchers on campus and beyond.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Interdisciplinary Data Science Team&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/chaitra-mudradi">Chaitra Mudradi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/feilong-wu">Feilong Wu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/hsing-hsien-wang">Hsing-Hsien Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/maria-macharrie">Maria MacHarrie&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shubhankar-singh">Shubhankar Singh&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yangtao-hu">Yangtao Hu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/deborah-holmes-wong">Deborah Holmes-Wong&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yolanda-gil">Yolanda Gil&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Data Science&lt;/li>
&lt;/ul></description></item><item><title>User-centered building design preference assessment to develop data-driven interactive architectural design guideline models</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/212/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/212/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>In many architectural design scenarios, architects and clients inevitably spend a lot of time reaching design agreements due to a lack of understanding about the client’s design needs and preferences. An architectural design process could be significantly expedited and simplified if modeling software could accurately extract the user’s preferred design features and integrate them into the design process. In this project, we addressed the challenges of demonstrating a stochastic model that considers the user’s physiological responses and subjective design perceptions by using data analytic methods. This approach exploited personal design preferences and adapted them to the design process in order to complete an architecture project effectively.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/muhammad-oneeb-ul-haq-khan">Muhammad Oneeb Ul Haq Khan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/adwaita-jadhav">Adwaita Jadhav&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/rosy-zhou">Rosy Zhou&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yuka-kaku">Yuka Kaku&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/joon-ho-choi">Joon-Ho Choi&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Sklearn&lt;/li>
&lt;li>WEKA&lt;/li>
&lt;li>R&lt;/li>
&lt;/ul></description></item><item><title>Using Biomedical Researcher Judgments to Predict Clinical Trial Outcomes</title><link>https://ckids-datafirst.github.io/website/projects/2020-spring/302/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-spring/302/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Human patients should only be assigned to experimental medical treatments when investigators are truly uncertain about the novel treatment’s clinical utility. As such, the outcomes of clinical trials are difficult to predict by design. The goal of this project is to work toward building a predictive model of clinical trials. The first step is to categorize treatments based on their history and diseases based on their treatability using FDA records among other data sources. In collaboration with the Biomedical Ethics Unit at McGill University, we have collected many probability predictions about scientific and operational outcomes of newly registered clinical trials. When pre-processing is completed, we will begin building a model to predict the judgments of medical experts based on several trial and researcher characteristics. This model can be used to assess whether medical researchers are biased in their judgments about their own trials. Finally, we aim to assemble these components to develop a model to predict the outcomes of the clinical trials by accounting for the history of the treatment, the treatability of the disease, and the judgments of medical researchers, corrected for revealed biases.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/daniel-benjamin">Daniel Benjamin&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Classification&lt;/li>
&lt;li>Predictive modeling&lt;/li>
&lt;li>Data Collection&lt;/li>
&lt;/ul></description></item><item><title>Worldwide Survey Estimates of Maternal Bereavement</title><link>https://ckids-datafirst.github.io/website/projects/2020-fall/201/</link><pubDate>Wed, 01 Jan 2020 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2020-fall/201/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Infant and child mortality rates have been steadily declining worldwide over the last fifty years. Without reservation, these trends represent good news for children and for their parents, but the link between child mortality and parents’ experiences remains loosely defined. Documenting global inequality in maternal bereavement offers a window into how health disparities directly affect the lives and well-being of mothers. In this project, we will offer the first global analysis of the prevalence of bereaved mothers by leveraging data collected between 2010 and 2018 from 168 countries. I request student support to expand current survey coverage. Students will work to identify and access public-use, nationally representative reproductive history survey data for select European, Asian, and Latin American countries to supplement current data coverage, and to offer direct estimates to compare with indirectly derived ones based on current fertility and mortality levels. Students will work to adapt code used for other surveys to generate descriptive statistics of the prevalence of ever-bereaved mothers in each country. Students will also work to improve and supplement the illustration of key study findings.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Best Data Science Teamwork&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Highlighted Project&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/aditi-singh">Aditi Singh&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/madeleine-thompson">Madeleine Thompson&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/malika-seth">Malika Seth&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/asumi-suguro">Asumi Suguro&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/emily-smith-greenaway">Emily Smith-Greenaway&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Stata&lt;/li>
&lt;li>Statistics&lt;/li>
&lt;/ul></description></item><item><title>A visual analytic toolkit for cultural biases</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/409/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/409/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will result in a visual analytics toolkit that will enable social scientists to understand the cultural groups and biases at play in a social dataset. News, books, and social media all contain biases that stem from the cultural background of the author(s). We have developed algorithms to identify the cultural groups at play in an arbitrary dataset, as well as natural language processing approaches that can discover the biases of each group. This project would help put these tools into the hands of social scientists by displaying the output of these algorithms in novel visualizations.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/mansi-ganatra">Mansi Ganatra&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/saatvik-tikoo">Saatvik Tikoo&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Javascript&lt;/li>
&lt;/ul></description></item><item><title>A workshop Tutorial on R and R Studio for Environmental Sciences Curriculum</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/508/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/508/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project created an electronic notebook using R and RStudio for an environmental sciences curriculum. The notebook will be used to teach undergraduate students advanced statistical analysis of population health in Fall 2019.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/huy-nghiem">Huy Nghiem&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jill-sohm">Jill Sohm&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Analyzing Paleoclimate Data</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/509/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/509/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project focused on a causality analysis of paleoclimate time series using Pyleoclim, which is a Python package geared towards the analysis and visualization of paleoclimate data. Future work includes exploring and testing additional algorithms for time series analysis.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/han-wu">Han Wu&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Automated generation of paper authors</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/408/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/408/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will result in an open-source software tool that will have general applicability for scientific publications. Papers with hundreds of authors are not uncommon in science, and it often takes many weeks to compile an author list in the desired order with proper affiliations and acknowledgments. We have implemented an algorithm that generates the author information for a paper based on the type of contribution of each author within the ENIGMA neuroscience consortium. This project would extend this software to read in compiled spreadsheets or forms and extract information about universities and other institutions from structured web sources, to interoperate with widely-used frameworks such as Wikidata.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Data Science Open and Sharing Practices&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/ruiyu-zhao">Ruiyu Zhao&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/xingyu-wei">Xingyu Wei&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/tieming-sun">Tieming Sun&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/neda-jahanshad">Neda Jahanshad&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>RDF&lt;/li>
&lt;li>UI Development&lt;/li>
&lt;/ul></description></item><item><title>Automated time series analysis</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/411/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/411/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will result in a Python package for automated time series analysis. Based on the characteristics of the data, you will design functions that (1) perform essential tasks in data cleaning and select appropriate methodologies, (2) implement various algorithms currently not supported through pandas and scikit-learn, and (3) create appropriate visualizations.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/feng-zhu">Feng Zhu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/myron-kwan">Myron Kwan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shilpa-thomas">Shilpa Thomas&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nikhil-dhara-venkata">Nikhil Dhara Venkata&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/deepanshu-madan">Deepanshu Madan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>PyPI&lt;/li>
&lt;/ul></description></item><item><title>Behavioral Context Recognition</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/512/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/512/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project studied behavioral patterns by analyzing data from personal sensors collected from 60 subjects. This data can be used to predict activities and infer people’s lifestyles and habits.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/ian-myoungsu-choi">Ian Myoungsu Choi&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Building an open catalog of integrated datasets for Los Angeles</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/413/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/413/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>While many open data efforts have successfully exposed public data on the web, it is often complicated to determine how these records can be integrated with each other (due to heterogeneous identifiers, no clear way to place them on a map, etc.). In this project, the student will leverage novel techniques for integrating, registering, and connecting datasets with overlapping elements. The student will visualize the results using interactive maps.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/anjana-niranjan">Anjana Niranjan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kushagra-singh-sachan">Kushagra Singh Sachan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/chi-sheng-yang">Chi Sheng Yang&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/daniel-garijo">Daniel Garijo&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;/ul></description></item><item><title>Building Sports Data Knowledge Graphs</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/414/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/414/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Public sports data is often spread across many different sources, creating issues of entity resolution and record linkage. Knowledge graphs are a popular conceptual technology for storing, fusing, and querying information from such disparate sources. This project will focus on building a sports data knowledge graph from various open data and asset (e.g. video) sources and APIs, based on a Wikidata infrastructure.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/jeremy-abramson">Jeremy Abramson&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Unix System&lt;/li>
&lt;li>SPARQL&lt;/li>
&lt;/ul></description></item><item><title>Capturing provenance of data analyses</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/416/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/416/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Documenting how a result was obtained from data analysis involves documenting the software, software settings, and datasets used to obtain that result so it can be explained properly. This project will design and develop a user interface for specifying provenance records using W3C standards. The interface will enable users to document the provenance of a data analysis no matter what infrastructure they used (R scripts, scikit-learn, etc.).&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/rahul-jeswani">Rahul Jeswani&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/yolanda-gil">Yolanda Gil&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Javascript&lt;/li>
&lt;li>Firebase&lt;/li>
&lt;/ul></description></item><item><title>Creating and visualizing a linked knowledge base of crime data</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/412/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/412/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>A lot of data is available on the web in tabular form, but it is difficult to manipulate and visualize without significant effort. In this project, we aim to test a novel framework created at ISI to build and visualize knowledge bases. The objective is to create a knowledge base that extends other resources on the Web, such as Wikidata or Wikipedia, and to visualize the results using interactive maps and plots.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/vedant-diwanji">Vedant Diwanji&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yi-li-chen">Yi-Li Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/andrew-zhao">Andrew Zhao&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/haripriya-dharmala">Haripriya Dharmala&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/daniel-garijo">Daniel Garijo&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Knowledge Representation&lt;/li>
&lt;li>RDF&lt;/li>
&lt;/ul></description></item><item><title>Crosstown</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/507/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/507/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Giorgos and John are developing a machine learning system to automatically detect and prioritize abnormalities in crime data and alert journalists to them. Through this project, they want to help journalists identify interesting stories in data. The data is rich in features, and by using general feature engineering.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/giorgos-constantinou">Giorgos Constantinou&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/john-cutone">John Cutone&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/gabriel-kahn">Gabriel Kahn&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/seon-ho-kim">Seon Ho Kim&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Data infrastructure for USC</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/417/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/417/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project develops software and data resources for USC students. Software resources include tools to process and analyze specific types of data (e.g. social networks, images, text), data preparation tools, and machine learning libraries. Data resources include thematic data repositories, such as urban LA data, environmental LA data, and entertainment LA data.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yang-dai">Yang Dai&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zixuan-zhang">Zixuan Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/parul-gupta">Parul Gupta&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yu-wang">Yu Wang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sanjiv-soni">Sanjiv Soni&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shenoy-pratik-gurudatt">Shenoy Pratik Gurudatt&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/yolanda-gil">Yolanda Gil&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Open Source Software Development&lt;/li>
&lt;li>Data Services&lt;/li>
&lt;/ul></description></item><item><title>Data Mining Over Past Climates</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/404/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/404/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Estimates of climate variations over the past 1,000 years play an increasing role in climate assessments. A key quantity to derive from them is the transient climate response (TCR), which quantifies the warming expected from slowly rising CO2 concentrations. TCR helps constrain the climate models used to predict the future evolution of Earth’s climate. In this project, you will help design an efficient workflow to estimate TCR from existing paleoclimate datasets and emerging statistical methods.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yuxuan-ji">Yuxuan Ji&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zhifeng-liu">Zhifeng Liu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ashka-patel">Ashka Patel&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/aditi-choudhary">Aditi Choudhary&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shelly-mehta">Shelly Mehta&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/julien-emile-geay">Julien Emile-Geay&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;/ul></description></item><item><title>Detecting deep fakes</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/418/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/418/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>The spread of misinformation has become a significant problem, raising the importance of relevant detection methods. While there are different manifestations of misinformation, this project will focus on detecting face manipulations in videos. We exploit the temporal dynamics of videos with recurrent networks.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/shenoy-pratik-gurudatt">Shenoy Pratik Gurudatt&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/wael-abd-almageed">Wael Abd-Almageed&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Programming&lt;/li>
&lt;/ul></description></item><item><title>Enhancing Thermal Control in Buildings</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/511/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/511/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Mengqi presented her study about thermal control using Excel, Minitab, and Python for data analysis. She collected data in a lab setting and tried two models, a group model and an individual model. She found that a group model, which consisted of 20 subjects, did not work well, while individual models gave better results.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/mengqi-jia">Mengqi Jia&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/joon-ho-choi">Joon-Ho Choi&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://www.dropbox.com/s/f5kd6s8iv1b1m9q/Thesis%20Presentation.pptx?dl=0" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Foster Care Children: Administrative Data and Computational Methods</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/506/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/506/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project uses population-based administrative data, including birth, medical, and education records, to study child welfare services. The research topics include: a family-level analysis of first births and sibling re-reports in the foster care system; identifying mothers who had a subsequent birth after the termination of parental rights; modeling the child protective services system using Markov models; and predicting risks for aging youth.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/eunhye-ahn">Eunhye Ahn&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/emily-putnam-hornstein">Emily Putnam-Hornstein&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1rbJu4no7acBdfUCv3zP5wvjdG04vlhh70eFOCzHomkg/edit?usp=sharing" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Game Data and Social Capital</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/503/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/503/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Natalie and Calvin look at social capital from two angles: social capital as a predictor and social capital as an outcome. First, they want to observe whether people who display social capital exhibit certain characteristics or behavior. Second, they want to study if people who are interested in a specific topic will exhibit certain types of social capital.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/natalie-jonckheere">Natalie Jonckheere&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/calvin-liu">Calvin Liu&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/dmitri-williams">Dmitri Williams&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Learning to Connect: Modeling Social Network Dynamics and Evolution by Imitation Learning</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/420/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/420/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>In this research, we aim to model how human players make connection decisions in an online game where players are free to add or delete a friend, as well as join a clan.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/yiley-zeng">Yiley Zeng&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/dmitri-williams">Dmitri Williams&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/emilio-ferrara">Emilio Ferrara&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Lighting Control in Buildings for Visual Comfort</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/510/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/510/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Lingkai collected data in a controlled setting and studied lighting control in buildings for visual comfort. For data preprocessing, he used Excel and MathWorks; for data analysis, he used Python and scikit-learn.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Highlighted Project&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/lingkai-cen">Lingkai Cen&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/joon-ho-choi">Joon-Ho Choi&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://www.dropbox.com/s/krllixhcdt8nacc/4.26_Lingkai_Cen.pptx?dl=0" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Measuring Pollution Benefits from Congestion Pricing Initiatives</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/419/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/419/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Using real-time big data on traffic from Los Angeles freeways and Aclima data on pollution measurements, this project will estimate the links between speed and pollution. Estimating this relationship properly is crucial for knowing the benefits that congestion pricing may generate in terms of pollution reduction. Computer Science methods will be used to guide both the choice of policy intervention and prediction.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/antonio-bento">Antonio Bento&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Machine Learning&lt;/li>
&lt;li>R&lt;/li>
&lt;/ul></description></item><item><title>Measuring population-level nutrition and dietary habits from Instagram</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/402/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/402/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>This project will investigate the quality of Instagram textual posts as a source of data for measurements of dietary patterns and nutrition quality, focusing on spatial and textual features of posts linked to food outlets. Using an Instagram dataset of all geo-located posts at food outlets in Los Angeles for 3 months in 2014, this project will investigate whether Instagram posts, despite implicit biases (and to the extent possible, accounting for these biases), can provide a representative health signal, informative of the quality of population nutrition and dietary patterns at a highly-resolved (e.g. census tract level) spatial scale.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Data Science Collaboration Practices&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/divyatmika-lnu">Divyatmika Lnu&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shagun-gupta">Shagun Gupta&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nina-thiebaut">Nina Thiebaut&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nisha-tiwari">Nisha Tiwari&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ian-choi">Ian Choi&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/andres-abeliuk">Andrés Abeliuk&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abigail-horn">Abigail Horn&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kayla-de-la-haye">Kayla de la Haye&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Social Network Analysis&lt;/li>
&lt;li>Statistical Modeling&lt;/li>
&lt;/ul></description></item><item><title>Mining Side Effects in Cancer Treatment</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/406/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/406/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>SideEffects is a resource that gives cancer patients access to treatment and side-effect information tailored to their treatment and disease history. The app sources content from clinical data, the National Cancer Institute, social media, and user input.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Project Presentation&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/justin-ho">Justin Ho&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sankareswari-govindarajan">Sankareswari Govindarajan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/kevin-tran">Kevin Tran&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/yi-hsin-chung">Yi-Hsin Chung&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sangeeth-koratten">Sangeeth Koratten&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/ken-nguyen">Ken Nguyen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/thuy-thanh-truong">Thuy Thanh Truong&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Modeling the career trajectory of music artists</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/407/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/407/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Many musicians, from up-and-comers to established artists, rely heavily on performing live to promote and disseminate their music. To advertise live shows, artists often use concert discovery platforms that make it easier for their fans to track tour dates. In this project, we ask whether digital traces of musical performances generated on those platforms can be used to understand the career trajectories of artists. We have constructed a dataset by cross-referencing data from such platforms (Songkick and Discogs). You will explore this dataset for patterns that can be used to identify successful musicians.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;/ul></description></item><item><title>Modeling Uncertainty in Drought Data</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/410/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/410/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Droughts can have a substantial impact on agricultural systems and human livelihood. A Python package to calculate various drought indices is being developed. In this project, you will expand on this package and develop methods to test the sensitivity of the models to various input datasets and parameters.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Project Achievement&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/shravya-manety">Shravya Manety&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/abhilash-pandurangan">Abhilash Pandurangan&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;/ul></description></item><item><title>Multiplayer Game’s Solo Players</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/502/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/502/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Even when engaging in a multiplayer online game, some players play by themselves. Do Own is interested in investigating personality, motivation, and behavioral patterns of social network isolates.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/do-own-donna-kim">Do Own (Donna) Kim&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/dmitri-williams">Dmitri Williams&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="final-presentation-resources">Final presentation resources&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://docs.google.com/presentation/d/1dI4Q65xrjIUt_OsHStoXI6CzoEOe45g4cjmQWxBQZXA/edit?usp=sharing" target="_blank" rel="noopener">Final presentation&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Overview of Multiplayer Games Dataset</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/501/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/501/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Online multiplayer games provide a wealth of data that can be used to study human behaviors. Professor Williams describes the kinds of questions that can be investigated with rich datasets of online game player actions, interactions, and targeted survey questions. Students in his group worked on a variety of projects that use this data to study a range of human behaviors.&lt;/p>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/dmitri-williams">Dmitri Williams&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Tell us where it hurts</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/415/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/415/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>LA Care has a mission to “provide access to quality health care for Los Angeles County’s vulnerable and low-income communities and residents and to support the safety net required to achieve that purpose.” In the many coordinated activities LA Care conducts to provide a comprehensive health insurance safety net, it collects massive amounts of healthcare data. With advances in analytics enabled by AI approaches (e.g. predictive modeling, machine learning, model refinement and validation), the organization is looking for ways to mine and analyze its data to drive optimization and improvement of product development, marketing techniques and business strategies. Students will work with stakeholders throughout the organization to identify opportunities for leveraging company data to drive business solutions. The ability to identify and address “pain points” will depend on the skills that students bring to the project.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/yihang-chen">Yihang Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/lisa-meng">Lisa Meng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/chiaofeng-yang">Chiaofeng Yang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/nelson-lam">Nelson Lam&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/phil-mcabee">Phil McAbee&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/george-tolomiczenko">George Tolomiczenko&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>R&lt;/li>
&lt;li>Python&lt;/li>
&lt;li>Machine Learning&lt;/li>
&lt;li>Javascript&lt;/li>
&lt;li>Data Mining&lt;/li>
&lt;/ul></description></item><item><title>Tracking Coastal Change at Catalina Island</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/403/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/403/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Since 1992, the USC Wrigley Institute for Environmental Studies ‘Catalina Conservation Divers’ (CCD) have been collecting underwater biological and environmental data from coastal ocean sites around Catalina Island, California. In cooperation with the USC Wrigley Institute, the CCD team (made up of community scientists and volunteer SCUBA divers) conducts quarterly surveys of marine species and benthic water temperatures at various depths and locations. The Wrigley Institute has been collecting and archiving this data for years, but the data has not been holistically studied to date. We need assistance in analyzing the data for trends across location, ocean depth, and time.&lt;/p>
&lt;h2 id="awards">Awards&lt;/h2>
&lt;ul>
&lt;li>Best Data Science Poster&lt;/li>
&lt;/ul>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/wei-fan-chen">Wei-Fan Chen&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/sameeksha-mahajan">Sameeksha Mahajan&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/zijing-zhang">Zijing Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/pratheek-athreya">Pratheek Athreya&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/diane-kim">Diane Kim&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jessica-dutton">Jessica Dutton&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/deborah-khider">Deborah Khider&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Statistics&lt;/li>
&lt;li>Programming&lt;/li>
&lt;/ul></description></item><item><title>Understanding human environmental perceptions using multi-biometric signals in the built environment</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/405/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/405/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Building occupants are always surrounded by several indoor environmental quality (IEQ) elements, such as thermal, visual, air, and acoustic conditions. As a result, occupants’ environmental comfort and work productivity are significantly affected by IEQ conditions, especially in residential, office, and educational facilities. This research investigates the relationships between users’ IEQ comfort perceptions, IEQ conditions, and their biometric signals, to understand how individual IEQ perception can be identified as a function of single or combined bio-signals (and their changes). The study outcome has the potential to be integrated with existing building mechanical/electrical control systems to enhance users’ IEQ conditions while contributing to their comfort and well-being in the built environment.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/manoj-muralidhara">Manoj Muralidhara&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shubham-banka">Shubham Banka&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/gaurav-gupta">Gaurav Gupta&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/joon-ho-choi">Joon-Ho Choi&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/seon-ho-kim">Seon Ho Kim&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>Python&lt;/li>
&lt;li>Data Mining&lt;/li>
&lt;li>SPSS&lt;/li>
&lt;li>R&lt;/li>
&lt;/ul></description></item><item><title>Understanding Internet Communities through Videogames</title><link>https://ckids-datafirst.github.io/website/projects/2019-fall/401/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-fall/401/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Online multiplayer games provide a wealth of data that can be used to study human behaviors. Many questions can be investigated with rich datasets of online game player actions, interactions, and targeted survey questions. We have a wide variety of ongoing student projects that use this data to study a range of human behaviors.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/junchu-zhang">Junchu Zhang&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/himan-kriplani">Himan Kriplani&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/shatad-purohit">Shatad Purohit&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/jinney-guo">Jinney Guo&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/dmitri-williams">Dmitri Williams&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/fred-morstatter">Fred Morstatter&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="skills-required-by-the-team">Skills Required by the team&lt;/h2>
&lt;ul>
&lt;li>R&lt;/li>
&lt;/ul></description></item><item><title>Who is the Best Game Mentor?</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/505/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/505/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Using machine learning, this project explores how the personality of game players who become mentors influences mentoring outcomes. The project will use survey data to analyze mentors’ extraversion and agreeableness as well as mentees’ game performance and churn rates.&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/joo-wha-hong">Joo-Wha Hong&lt;/a>&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/dmitri-williams">Dmitri Williams&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Who is the One staying?</title><link>https://ckids-datafirst.github.io/website/projects/2019-spring/504/</link><pubDate>Tue, 01 Jan 2019 00:00:00 +0000</pubDate><guid>https://ckids-datafirst.github.io/website/projects/2019-spring/504/</guid><description>&lt;h2 id="description">Description&lt;/h2>
&lt;p>Motivation plays a strong role as a moderator in the relationship between gamers’ in-game performance and their enjoyment and churn, respectively. The research question for this project is ‘What is the relationship between players’ competitiveness and their level of enjoyment/churn?’&lt;/p>
&lt;h2 id="students">Students&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="../../../author/qiyao-joyce-peng">Qiyao (Joyce) Peng&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="../../../author/alejandro-marin">Alejandro Marin&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="advisors">Advisors&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="../../../author/dmitri-williams">Dmitri Williams&lt;/a>&lt;/li>
&lt;/ul></description></item></channel></rss>