▵ Van Horn and Perona open with a brilliant one-liner: the world is long-tailed
Van Horn and Perona open with a brilliant one-liner: the world is long-tailed. The diagram above shows analysis from Deep Learning Analytics, the #2 team placing in the iNaturalist 2018 competition. Part of that challenge was how many of the classes to be learned had few data points for training. That condition is much more “real world” than the famed ImageNet – with an average of ~500 instances per class – which helped make “deep learning” a popular phrase. The aforementioned sea change from Lange, Jonas, et al., addresses the problem of reducing data demands. I can make an educated guess that your enterprise ML use cases resemble iNaturalist more than ImageNet, and we need to find ways to produce effective models which don’t require enormous labeled data sets. — https://blog.dominodatalab.com/themes-and-conferences-per-pacoid-episode-2/
▵ All these examples tell the same story: that the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly
All these examples tell the same story: that the world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account.
But they are also creating a host of new problems. Despite the abundance of tools to capture, process and share all this information—sensors, computers, mobile phones and the like—it already exceeds the available storage space (see chart 1). Moreover, ensuring data security and protecting privacy is becoming harder as the information multiplies and is shared ever more widely around the world. — https://www.economist.com/special-report/2010/02/25/data-data-everywhere