Our ability to collect data far outpaces our ability to fully utilize it, yet those data may hold the key to solving some of the biggest global challenges facing us today.
Take, for instance, the frequent outbreaks of waterborne illnesses as a consequence of war or natural disasters. The most recent example can be found in Yemen, where roughly 10,000 new suspected cases of cholera are reported each week, and history is riddled with similar stories. What if we could better understand the environmental factors that contributed to the disease, predict which communities are at higher risk, and put in place protective measures to stem the spread? Answers to these questions and others like them could potentially help us avert catastrophe.
We already collect data related to virtually everything, from birth and death rates to crop yields and traffic flows. IBM estimates that each day, 2.5 quintillion bytes of data are generated. To put that in perspective: that's the equivalent of all the data in the Library of Congress being produced more than 166,000 times per 24-hour period. Yet we don't really harness the power of all this information. It's time that changed, and thanks to recent advances in data analytics and computational services, we finally have the tools to do it.
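As a rough check on that comparison, the two figures quoted above imply a size for the Library of Congress's data. The sketch below assumes "quintillion" means 10^18 (short scale) and uses decimal terabytes. The names are illustrative, and the result is only what the quoted numbers entail, not an official figure.

```python
# Figures quoted in the text.
BYTES_PER_DAY = 2.5e18               # 2.5 quintillion bytes (assumes short scale: 10**18)
LIBRARY_MULTIPLES_PER_DAY = 166_000  # "more than 166,000 times" per day

# Assumption: decimal units, 1 TB = 10**12 bytes.
BYTES_PER_TERABYTE = 1e12


def implied_library_size_tb(bytes_per_day: float, multiples: float) -> float:
    """Size of one 'Library of Congress' implied by the two quoted figures, in TB."""
    return bytes_per_day / multiples / BYTES_PER_TERABYTE


if __name__ == "__main__":
    size_tb = implied_library_size_tb(BYTES_PER_DAY, LIBRARY_MULTIPLES_PER_DAY)
    print(f"Implied Library of Congress size: about {size_tb:.1f} TB")
```

Under these assumptions the quoted figures work out to roughly 15 terabytes per "Library of Congress". Because the text says "more than" 166,000 times, that is best read as an upper bound on the size the comparison has in mind.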
As a data scientist for Los Alamos National Laboratory, I study data from wide-ranging, public sources to identify patterns in hopes of being able to predict trends that could be a threat to global security. Multiple data streams are critical because the ground-truth data (such as surveys) that we collect are often delayed, biased, sparse, incorrect or, sometimes, nonexistent.
For example, knowing mosquito incidence in communities would help us predict the risk of mosquito-transmitted diseases such as dengue, the leading cause of illness and death in the tropics. However, mosquito data at a global (and even national) scale are not available.
To address this gap, we're using other sources such as satellite imagery, climate data and demographic information to estimate dengue risk. Specifically, we had success predicting the spread of dengue in Brazil at the regional, state and municipality level using these data streams as well as clinical surveillance data and Google search queries that used terms related to the disease. While our predictions aren't perfect, they show promise. Our goal is to combine information from each data stream to further refine our models and improve their predictive power.
Similarly, to forecast the flu season, we have found that Wikipedia and Google searches can complement clinical data. Because the rate of people searching the internet for flu symptoms often increases at the onset of illness, we can predict a spike in cases even when clinical data lag.
We're using these same concepts to expand our research beyond disease prediction to better understand public sentiment. In partnership with the University of California, we're conducting a three-year study using disparate data streams to understand whether opinions expressed on social media map to opinions expressed in surveys.
For example, in Colombia, we are conducting a study to see whether social media posts about the peace process between the government and FARC, the socialist guerrilla movement, can be ground-truthed with survey data. A University of California, Berkeley researcher is conducting on-the-ground surveys throughout Colombia, including in isolated rural areas, to poll citizens about the peace process. Meanwhile, at Los Alamos, we're analyzing social media data and news sources from the same areas to determine if they align with the survey data.
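One simple way to test whether an online signal runs ahead of a ground-truth series, as in the flu example, or merely tracks it, as the Colombia study asks, is a lagged correlation. The sketch below is illustrative only. It is not the Los Alamos method, the function names are hypothetical, and the weekly numbers are invented so that the "clinical" series trails the "search" series by exactly two weeks.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, truth, lag):
    """Correlate signal[t] with truth[t + lag]; lag >= 0 means the signal leads."""
    if lag == 0:
        return pearson(signal, truth)
    return pearson(signal[:-lag], truth[lag:])


def best_lead(signal, truth, max_lag):
    """Return the lag in 0..max_lag at which the correlation is highest."""
    return max(range(max_lag + 1),
               key=lambda lag: lagged_correlation(signal, truth, lag))


# Invented weekly search volumes (illustration only, not real data).
searches = [10, 12, 20, 35, 50, 42, 30, 18, 12, 10]
# Synthetic "clinical" counts, built to trail the searches by two weeks.
cases = [5.0, 5.0] + [s / 2 for s in searches[:-2]]

if __name__ == "__main__":
    for lag in range(5):
        r = lagged_correlation(searches, cases, lag)
        print(f"lag {lag} weeks: r = {r:.3f}")
    print("best lead (weeks):", best_lead(searches, cases, 4))
```

With this synthetic data the highest correlation falls at a lag of two weeks by construction. Real streams would be noisier, and a high correlation alone would not show that the online signal is a reliable substitute for surveys or clinical reports.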
If we can demonstrate that social media accurately captures a population's sentiment, it could be a more affordable, accessible and timely alternative to what are otherwise expensive and logistically challenging surveys. In the case of disease forecasting, if social media posts did indeed serve as a predictive tool for outbreaks, those data could be used in educational campaigns to inform citizens of the risk of an outbreak (due to vaccine exemptions, for example) and ultimately reduce that risk by promoting protective behaviors (such as washing hands, wearing masks and remaining indoors).
All of this illustrates the potential for big data to solve big problems. Los Alamos and other national laboratories that are home to some of the world's largest supercomputers have the computational power, augmented by machine learning and data analysis, to take this information and shape it into a story that tells us not only about one state or even nation, but the world as a whole. The information is there; now it's time to use it.