Stanford Data Lab students hit the ground running…

Read an article today (Students confront the messiness of data) about Stanford’s Data Lab and how their students are trained to clean up and analyze real-world data.

The Data Lab teaches two courses: the Data Challenge Lab and the Data Impact Lab. The Challenge Lab is an introductory course in data gathering, cleanup and analysis. The Impact Lab is where advanced students tackle real-world, high-impact problems through data analysis.

Data Challenge Lab

Their Data Challenge Lab course is a 10-week course with no prerequisites that teaches students how to analyze real-world data to solve problems.

There are no lectures. You’re given project datasets and the tools to manipulate, visualize and analyze the data. Your goal is to master the tools, clean up the data and gather insights from it. Professors are there to provide one-on-one help so you can step through the data provided and understand how to use the tools.

The information provided on their website includes no references and nothing about the specific tools used in the Data Challenge Lab to manipulate, visualize and analyze the data. From an outsider’s viewpoint it would be great to have a list of references or even websites describing the tools being used, and maybe the datasets that are accessed.

Data Impact Lab

The Data Impact Lab course is an independent study course, whose only prerequisite is the Data Challenge Lab.

Here students are formed into interdisciplinary teams with practitioner partners to tackle ongoing, real-world problems with their new data analysis capabilities.

There is no set time frame for the course and it is a non-credit activity. But here students help to solve real world problems.

Current projects in the Impact lab include:

  • The California Poverty Project, to create an interactive map of poverty in California to supply geographic guidance to aid agencies helping the poor
  • The Zambia Malaria Project, to create an interactive map of malaria infection to help NGOs and other agencies target remediation activity.

Previous Impact Lab projects include: the Poverty Alleviation Project, which provided a multi-dimensional index of poverty status for areas in Kenya so that NGOs can use these maps to target randomized experiments in poverty eradication, and the Data Journalism Project, which brought data analysis tools to breaking stories and other journalistic endeavors.

~~~~

Courses like these should be much more widely available. They’re almost an analog of the scientific method, updated for the 21st century.

Science has gotten to a point, these days, where data analysis is a core discipline that everyone should know how to do. Maybe it doesn’t have to involve Hadoop, but rudimentary data analysis, manipulation, and visualization needs to be in everyone’s toolbox.

Data 101 anyone?
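In case it helps, here’s the sort of rudimentary analysis I have in mind, sketched in Python with pandas and matplotlib. The file and column names are hypothetical, just to illustrate the gather/cleanup/analyze/visualize loop:

```python
# A taste of "Data 101": gather, clean up, analyze and visualize a dataset.
# The file and column names below are made up for illustration.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("traffic_speeds.csv")            # gather
df = df.dropna(subset=["speed_mph"])              # cleanup: drop rows missing the measurement
df["date"] = pd.to_datetime(df["date"])           # cleanup: normalize types

daily = df.groupby(df["date"].dt.date)["speed_mph"].mean()   # analyze
print(daily.describe())

daily.plot(title="Average southbound speed by day")           # visualize
plt.show()
```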

Photo Credit(s): Big_Data_Prob | KamiPhuc;

Southbound traffic speeds on Masonic avenue on different dates | Eric Fisher;

Unlucky Haiti (1981-2010) | Jer Thorp;

Bristol Cycling Level by Wards 2011 | Sam Saunders

Crowdresearch, crowdsourced academic research

Read an article in Stanford Research, Crowdsourced research gives experience to global participants, that discussed an effort at Stanford and other top-tier research institutions to get global participation in academic research. The process is discussed more fully in a scientific paper (PDF here) by researchers from Stanford, MIT Media Lab, Cornell Tech and UC Santa Cruz.

They chose three projects:

  • An HCI (human-computer interaction) project to design, engineer and build a new paid crowdsourcing marketplace (like Amazon’s Mechanical Turk).
  • A visual image recognition project to improve on current visual classification techniques/algorithms.
  • A data science project to design and build the world’s largest wisdom of the crowds experiment.

Why crowdsource academic research?

The intent of crowdsourced research is to provide top-tier academic research experience to people who have no access to top research organizations.

Participating universities obtain more technically diverse researchers, larger research teams, larger research projects, and a geographically dispersed research community.

Collaborators gain valuable academic research experience, research community contacts, and potential authorship of research papers, as well as potential recommendation letters (for future work or academic placement).

How does crowdresearch work?

It’s almost like open source and agile development applied to academic research. The work week starts with the principal investigator (PI) and research assistants (RAs) going over last week’s milestone deliveries to see which to pursue further next week. The crowd research uses a Reddit-like posting and up/down voting system to determine which milestone deliverables are most important. The PI and RAs review this prioritized list to select a few to continue to investigate over the next week.

The PI holds an hour-long video conference (using Google Hangouts On Air’s YouTube live stream service). On the conference call all collaborators can view the stream but only a select few are on camera. The PI and the researchers responsible for the important milestone research of the past week discuss their findings, and the rest of the collaborators on the team can participate over Slack. The video conference is archived and available to be watched offline.

At the end of the meeting, the PI identifies next week’s milestones and potentially directly responsible investigators (DRIs) to work on them.

The DRIs and other collaborators choose how to apportion the work for the next week and work commences. Collaboration can be fostered and monitored via Slack and, if necessary, more Google live stream meetings.

If collaborators need help understanding some technology, technique, or tool, the PI, RAs or DRIs can provide a mini video course on the topic or can point to other information used to get the researchers up to speed. Collaborators can ask questions and receive answers through Slack.

When it’s time to write the paper, they use Google Docs with change tracking to manage the writing process.

The team also maintains a Wiki on the overall project to help new and current members get up to speed on what’s going on. The Wiki also lists the week’s milestones, video archives, project history/information, milestone deliverables, etc.

At the end of the week, researchers and DRIs supply a mini post describing their work and linking to their milestone deliverables so that everyone can review their results.

Who gets credit for crowdresearch?

Each week, everyone on the project is allocated 100 credits and apportions these credits to other participants based on the week’s activities. The credits are used to drive a PageRank-like credit assignment algorithm that determines an aggregate credit score for each researcher on the project.

Check out the paper linked above for more information on the credit algorithm. They tried to defeat (credit) link rings and other obvious approaches to stealing credit.
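For the curious, here’s a minimal sketch of what a PageRank-style credit aggregation could look like. This is my own illustration of the idea, not the paper’s actual algorithm: credits flow through a weighted graph, so an endorsement from a highly-credited researcher counts for more.

```python
# Toy PageRank-style credit aggregation (illustrative only, not the paper's algorithm).
# allocations[giver][receiver] = credits the giver handed to the receiver this week.
def aggregate_credit(allocations, damping=0.85, iters=50):
    people = sorted({p for giver, recv in allocations.items() for p in [giver, *recv]})
    score = {p: 1.0 / len(people) for p in people}

    for _ in range(iters):
        new = {p: (1 - damping) / len(people) for p in people}
        for giver, recv in allocations.items():
            total = sum(recv.values())
            if total == 0:
                continue
            for receiver, credits in recv.items():
                # A giver's influence flows in proportion to the credits they handed out.
                new[receiver] += damping * score[giver] * (credits / total)
        score = new
    return score

week = {"alice": {"bob": 60, "carol": 40},
        "bob":   {"alice": 70, "carol": 30},
        "carol": {"alice": 50, "bob": 50}}
print(aggregate_credit(week))
```

Defeating link rings presumably takes more machinery than this, which is why the paper’s version is worth reading.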

At the end of the project, the PI, DRIs and RAs determine a credit clip level for paper authorship. Paper authors are listed in credit order and the remaining, non-author collaborators are listed in an acknowledgements section of the paper.

The PIs can also use the credit level to determine how strong a recommendation letter to provide for researchers.

Tools for crowdresearch

The tools needed to collaborate on crowdresearch are cheap and readily available to anyone.

  • Google Docs, Hangouts, Gmail are all freely available, although you may need to purchase more Drive space to host the work on the project.
  • Wiki software is freely available as well from multiple sources including Wikipedia (MediaWiki).
  • Slack is readily available for a low cost, but other open source alternatives exist, if that’s a problem.
  • A GitHub code repository is also readily available for a reasonable cost, but there may be alternatives that use Google Drive storage for the repo.
  • Web hosting is needed to host the online Wiki, media and other assets.

Initial projects were chosen in computer science, so outside of the above tools, they could depend on open source. Other projects will need to consider how much experimental apparatus is required, how to fund those apparatus purchases, and how a global group of researchers can best make use of them.

My crowdresearch projects

Here are some potential commercial crowdresearch projects, where we could use the aggregate credit score, and perhaps other measures of participation, to apportion revenue, if any.

  • NVMe storage system using a lightweight storage server supporting NVMe over Fabrics access to hybrid NVMe SSD plus capacity disk storage.
  • Proof of Stake (PoS) Ethereum pooling software using Linux servers to create a pool for PoS ETH mining.
  • Bipedal, dual-armed, dual-handed, five-fingered assisted-care robot to supply assistance and care to elders and disabled people throughout the world.

Non-commercial projects, where we would use aggregate credit score to apportion attribution and any potential remuneration.

  • A fully (100%?) mechanical rover able to survive, rove around, perform scientific analysis, receive/transmit data and possibly effect repairs from within extreme environments such as the surface of Venus, Jupiter or the Chernobyl/Fukushima Daiichi reactor cores.
  • Zero-propellant interplanetary tug able to rapidly transport rovers, satellites, probes, etc. to any place within the solar system and deploy them properly.
  • A Venusian manned base habitat including the design, build process and ongoing support for the initial habitat and any expansion over time, such that the habitat can last 25 years.

Any collaborators across the world interested in working on any of these projects, do let me know here via the comments. Please supply some way to contact you and any skills you’re interested in developing or already have that could help the project(s).

I would be glad to take on the PI role for the most popular project(s), if I get sufficient response (no idea what this would be). And I’d be happy to purchase the Drive, GitHub, Slack and web hosting accounts needed to start up and see the most popular project(s) through to fruition. And if there are any more domain-experienced PIs interested in taking on any of these projects, do let me know.

Comments?

Picture Credit(s): Crowd by Espen Sundve;

Videoblogger Video Conference by Markus Sandy;

Researchers Night 2014 by Department of Computer Science, NTNU;

Materials science rescues civilization, again

Read a bunch of articles this past week from MIT Technology Review, How materials science will determine the future of human civilization, from Stanford University, New ultra thin semiconductor materials…, and Wired, This battery breakthrough could change everything.

The message varied a bit between articles but there was an underlying theme to all of them: materials science is taking off like never before. Let’s take them on, one by one, last in first out.

New battery materials

I have not reported on new battery structures or materials in the past but it seems that every week or so I run across another article or two on the latest battery technology that will change everything. Yet this one just might do that.

I am no materials scientist, but Bill Joy has been investing in a company, Ionic Materials, for a while now (both in his job as a VC partner and as an independent investor) that has been working on a solid battery material that could be used to create rechargeable batteries.

The problems with Li(thium)-ion batteries today are that they are a safety risk (their liquid electrolyte is highly flammable) and they use an awful lot of a relatively scarce mineral (lithium is mined in Chile, Argentina, Australia, China and other countries, with little mined in the USA). Yet electric cars would not be possible today without Li-ion batteries.

Ionic Materials claims to have designed a solid polymer electrolyte that combines the properties of the familiar, ultra-safe alkaline batteries we use every day with the rechargeability of the Li-ion batteries used in phones and cars today. This would make a cheap, safe rechargeable battery that could work anywhere. The polymer also happens to be fire retardant.

The historic problem with alkaline batteries (essentially zinc and manganese dioxide) is that they can’t be recharged many times before they short out. But with the new polymer, these batteries could be recharged essentially as many times as Li-ion batteries today.

Currently, the new material doesn’t have as many recharge cycles as they want, but they are working on it. Joy calls the material ional.

New semiconductor materials

Moore’s law will eventually cease. It’s only a question of time and materials.

Silicon is increasingly looking long in the tooth. As researchers shrink silicon devices down to atomic scales, they start to break down and stop functioning.

The advantages of silicon are that it is extremely scalable (shrinkable) and easy to rust. Silicon rust, or silicon dioxide, is very important because it is used as an insulator. As an insulating layer, it can be patterned just like the silicon circuits themselves. That way everything (circuits, gates, switches and insulators) can all use the same elemental material.

A couple of Stanford researchers, Eric Pop and Michal Mleczko, an electrical engineering professor and a postdoctoral researcher, have discovered two new materials that may just take Moore’s law through a couple of more chip generations. They wrote about these new materials in their paper in Science Advances.

The new materials, hafnium diselenide and zirconium diselenide, have many properties similar to silicon. One is that they can easily be made at scale. But devices made with the new materials still function at smaller geometries, at just three atoms thick (0.67nm), and also consume less power.

That’s good, but they also rust better. When the new materials rust, they form a high-K insulating material. With silicon, high-K insulators require additional materials and processing beyond simple silicon rust. And the new materials also match silicon’s band gap.

Apparently the next step with these new materials is to create electrical contacts. And I am sure that, as with any new material introduced to chip fabrication, it will take quite a while to solve all the technical hurdles. But it’s comforting to know that Moore’s law will be around another decade or two to keep us humming away.

New multiferroic materials

But just maybe the endgame in chip fabrication materials, and possibly many other domains, is a new class of materials coming out of ETH Zurich in Switzerland.

There a researcher, Nicola Spaldin, has described a new sort of material that has both ferro-electric and ferro-magnetic properties.

Spaldin starts her paper off by discussing how civilization evolved mainly due to materials science.

Way in the past, fibers and rosin allowed humans to attach stone blades and other materials to poles, arrows and ax handles to hunt and farm better. Later, the discovery of smelting and basic metallurgy led to the casting of bronze in the Bronze Age; iron, which could also be hammered, later led to the Iron Age. The discovery of the electron led to the vacuum tube. Pure silicon came out during World War II and led to silicon transistors and the chip fabrication technology we have today.

Spaldin talks about the other major problem with silicon: it consumes lots of energy. At current trends, almost half of all worldwide energy production will be used to power silicon electronics in a couple of decades.

Spaldin’s solution to the energy consumption problem is multiferroic materials. These materials offer both ferro-electric and ferro-magnetic properties in the same material.

Historically, materials were either ferro-electric or ferro-magnetic but never both. However, Spaldin found there was nothing in nature prohibiting the two from co-existing in the same material. She and her colleagues then designed new multiferroic materials that do just that.

As I understand it, ferro-electric materials allow electrons to form chemical structures which create electrical dipoles or electric fields. Similarly, ferro-magnetic materials allow chemical structures to create magnetic dipoles or magnetic fields.

That is, multiferroic materials can be used to create both magnetic and electric fields. And the surprising part is that the boundaries between multiferroic magnetic domains form nano-scale conducting channels which can be moved around using electric fields.

Seems to me that if this were all possible, one could fabricate a substrate using multiferroics and write (program) any electronic circuit you want just by creating precise magnetic and electric fields on top of it. And with today’s disk and tape devices, precise magnetic fields are readily available for circular and linear materials. It would seem just as easy to use multiferroic material for persistent data storage.

Spaldin goes on to say that replacing magnetic fields in today’s magnetism-centric information/storage industry with electric fields should lead to reduced energy consumption.

Welcome to the Multiferroic age.

Photo Credit(s): Battery Recycling by Heather Kennedy;

AMD Quad Core backside by Don Scansen;  and

Magnetic Field – 14 by Windell Oskay

Quasar, data center scheduling reboot

Microsoft Bing Maps’ datacenter by Robert Scoble

Read an article today from ZDNet called Data center scheduling as easy as watching a movie. It was about research out of Stanford University that shows how short glimpses of an application in operation can be used to determine the best existing infrastructure to run it on (for more info, see “Quasar: Resource-Efficient and QoS-Aware Cluster Management” by Christina Delimitrou and Christos Kozyrakis).

With all the world’s compute moving to the cloud, cloud providers are starting to see poor CPU utilization. E.g., AWS’s EC2 average server utilization is typically between 3 and 17%, Google’s is in the 25-35% range and Twitter’s is consistently below 20% (source: the paper above). Such poor utilization at cloud scale is causing them to lose a lot of money.

Most cloud organizations and larger companies these days have a myriad of servers they have acquired over time. These servers range from the latest multi-core behemoths to older servers that have seen better days.

Nonetheless, as new applications come into the mix, it’s hard to know whether they need the latest servers or could get by just as well with some older equipment that happens to be lying around idle in the shop. This inability to ascertain the best infrastructure to run them on often leads to the over-provisioning/under-utilization we see today.

A better way to manage clusters

This is the classic problem that is trying to be solved by cluster management. There are essentially two issues in cluster management for new applications:

  • What resources the application will need to run, and
  • Which available servers can best satisfy the application’s resource requirements.

The first issue is normally answered by the application developer/deployer, who gets to specify the resources. When they get this wrong, the applications run on servers with more resources than needed, which end up being lightly utilized.

But what if there were a way to automate the first step in this process?

It turns out that if you run a new application for a short time you can determine its execution characteristics. Then, if you could search a database of applications currently running on your infrastructure, you could match how the new application runs with how current applications run and determine a pseudo-optimal fit for the best place to run the new application.

Such a system would need to monitor current applications in your shop and determine their server resource usage, e.g., memory use, IO activity, CPU utilization, etc. The system would need to construct and maintain a database mapping applications to server resource utilization. Also, somewhere you would need a database of the current server resources in your cluster.

But if you have all that in place, it seems like you could have a solution to the classic cluster management problem presented above.
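Here’s a toy sketch of that matching step, using cosine similarity over a short resource-usage profile. To be clear, this is just my illustration of the idea: Quasar itself uses collaborative filtering (classification) techniques, and the profile fields and server classes below are invented.

```python
# Toy workload matcher: compare a short profile of a new app against profiles of
# known apps and suggest the server class the best match already runs well on.
# (Illustrative only; Quasar uses collaborative filtering and these fields are invented.)
import math

# profile = (cpu_util, memory_gb, disk_iops, net_mbps) sampled during a short run
known_apps = {
    "web-frontend": ((0.30, 4, 200, 400),   "small-vm"),
    "analytics":    ((0.85, 64, 50, 100),   "big-memory"),
    "kv-store":     ((0.40, 16, 5000, 800), "fast-ssd"),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def classify(new_profile):
    # Pick the known app whose profile points in the most similar direction.
    name, (_, server_class) = max(known_apps.items(),
                                  key=lambda kv: cosine(new_profile, kv[1][0]))
    return name, server_class

print(classify((0.45, 20, 4000, 700)))   # expect a match close to "kv-store"
```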

What about performance-critical apps?

There’s a class of applications with stringent QoS requirements that go beyond optimal runtime execution characteristics (latency/throughput-sensitive workloads). These applications must be run in environments that can guarantee their latency requirements are met. This may not be the most optimal location from a cluster perspective, but it may be the only place they can run and still meet their service objectives.

So any cluster management optimization would also need to factor such application QoS requirements into its decision matrix on where to run new applications.

Quasar cluster management

The researchers at Stanford have implemented the Quasar cluster management solution, which does all that. Today it provides:

  1. A way for users to specify application QoS requirements for those applications that require special services,
  2. A way to run new applications briefly to ascertain their resource requirements and quickly classify their characteristics against a database of currently running applications, and
  3. A way to allocate new applications to the optimal server configurations that are available (see the sketch below).
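A greatly simplified sketch of that allocation step follows. It is my own illustration, not Quasar’s actual scheduler: it simply picks the least capable available server that still satisfies the estimated resource needs and any stated latency QoS target.

```python
# Greatly simplified QoS-aware placement (illustrative only, not Quasar's scheduler).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Server:
    name: str
    cores: int
    memory_gb: int
    p99_latency_ms: float                     # latency this server class can sustain under load

@dataclass
class AppRequest:
    cores: int
    memory_gb: int
    max_latency_ms: Optional[float] = None    # set only for QoS-critical apps

def place(app: AppRequest, servers: list[Server]) -> Optional[Server]:
    candidates = [s for s in servers
                  if s.cores >= app.cores and s.memory_gb >= app.memory_gb
                  and (app.max_latency_ms is None or s.p99_latency_ms <= app.max_latency_ms)]
    # Prefer the least capable server that still fits, so the big boxes stay free for big jobs.
    return min(candidates, key=lambda s: (s.cores, s.memory_gb), default=None)

fleet = [Server("old-2u", 8, 32, 20.0), Server("new-1u", 32, 256, 5.0)]
print(place(AppRequest(cores=4, memory_gb=16), fleet))                        # -> old-2u
print(place(AppRequest(cores=4, memory_gb=16, max_latency_ms=10.0), fleet))   # -> new-1u
```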

 

The paper cited above shows results from using Quasar cluster management on Hadoop clusters, memcached and Cassandra clusters, and HotCRP clusters, as well as a cloud environment. For the cloud environment, Quasar has shown that it can boost server utilization for a 200-node cloud environment running 1200 workloads up to 65%.

The paper goes into more detail and there’s more information on Quasar available on Christina Delimitrou’s website.

~~~

Comments?

Cold hands, better exercise

Read an article a couple of weeks ago on Stanford researchers who are inventing a glove that can warm or cool someone’s hands (see Stanford’s cooling glove research). They found that the palm is one of the easiest ways to warm or cool a body.

Originally they were researching how bears cooled themselves during summer and discovered that certain areas of their skin were optimized for cooling. These patches of skin had more blood vessels than necessary for nutrient delivery and seemed to be optimized for blood flow and bodily thermal management. In the case of the bears they were studying, these thermal control areas happened to be their palms and foot pads.

They next took their idea and created a crude prototype of a warming glove and used it to warm up patients after surgery. This usually takes the better part of 2-3 hours, but with their warming glove on, they were able to warm patients in 8-10 minutes instead of hours.

It appears that all mammals have a built-in cooling mechanism: for some it’s the ears (rabbits), for others it’s the tongue (dogs), and for humans and primates it’s the palms and foot pads. These areas are used primarily for bodily thermal management and serve essentially as a way to cool off hairy mammals such as ourselves.

But why a cooling glove?

It’s unclear to me where they got the idea, but somehow they discovered that one thing limiting exercise intensity was the overheating of muscle tissue. As muscles are exercised they warm up, and when an enzyme used by muscles to generate energy heats up, it breaks down and starts to work less effectively, providing a built-in safety switch against muscle overheating. It turns out that heat is a key factor limiting muscle recovery and inducing muscle fatigue.

The researchers found that cooling the palms after exercise allowed a person to continue to work at their maximum level without fatigue or degradation. In the case of pull-ups, they found that a person who was properly cooled could continue to do their maximum pull-ups time after time, without any reduction in reps. Indeed, one gym rat was able to work up from a maximum of 160 pull-ups to a maximum of 620 pull-ups in just six weeks. “Better than steroids” and legal.

So a glove is being developed that can be used to cool athletes down. Prototypes are currently in use by Stanford’s football team, the Oakland Raiders, the San Francisco 49ers, and Manchester United.

Other potential applications

This research is full of many more possibilities than just a cooling glove. For example:

  • Bicycles – handlebars that, instead of being insulated, are metal and perforated so that they increase airflow to cool down the palms of bicycle racers
  • Weight machines – handholds suffused with liquid, plus some sort of radiator attachment that would cool the liquid to cool the hands
  • Barbell refrigerators – similar to the above, only keeping the bars in a refrigerator so they lower the temperature of weight lifters’ palms as they lift. Ditto for dumbbells, kettlebells, medicine balls, etc.
  • Treadmills – that have an onboard cooling mechanism to cool the hands of people using them. Ditto for rowing machines, NordicTracks, ellipticals, etc.
  • Tennis rackets – that have perforated handles without any insulation, which could be used to cool down tennis players’ hands. Ditto for racquetball rackets, squash rackets, etc.
  • Baseball bats, golf clubs, hockey sticks, lacrosse sticks, etc. – essentially any other piece of sporting equipment with a hand-held artifact could be improved by having some sort of built-in cooling mechanism, or worst case, a cabinet (bag), etc., which could cool these to the proper temperature.

I suppose one key to any of this is what the proper cooling temperature is and whether any of these sports cause some (any) muscles to overheat. Perforations could be tailored to reach the proper cooling temperature for the sport and the speed with which the artifact is moved. Probably another reason for running barefoot or using Vibram FiveFingers shoes, as they seem to cool down the foot pads.

I find myself looking for places to cool my palms between workout sets to reduce fatigue. It may be only psychosomatic, because it’s certainly not scientifically controlled, but it seems to be helping. Also, when I run/jog nowadays I do so with an open hand rather than a closed fist, hoping that this helps cool me down.

~~~~

I read this a while back and couldn’t stop thinking about all the possibilities inherent in their research. Yes, a glove is probably a great, portable and universal way to cool people during exercise, but there are so many other approaches that could be employed even more easily to the same effect.

Image Credit(s): bangkok by Roberto Trm; bunch o racquets by dennis

12 atoms per bit vs 35 bits per electron

Image: six atom pairs in a row, from the Technology Review article

Read a story today in Technology Review on Magnetic Memory Miniaturized to Just 12 Atoms, about a team at IBM Research that created a (spin) magnetic “storage device” that used 12 iron atoms to record a single bit (near absolute zero and just for a few hours). The article said it was about 100X denser than the previous magnetic storage record.

Holographic storage beats that

Wikipedia’s (soon to go dark for 24hrs) article on Memory Storage Density mentioned research at Stanford that in 2009 created an electronic quantum holographic device that stored 35 bits/electron using a sheet of copper atoms to record the letters S and U.

The Wikipedia article went on to equate 35 bits/electron to ~3 Exabytes (10**18 bytes)/in**2. (How Wikipedia was able to convert from bits/electron to EB/in**2 I don’t know, but I’ll accept it as a given.)

Now an iron atom has 26 electrons and copper has 29 electrons. If 35 bits/electron is 3 EB/in**2 (or ~24 Eb/in**2), then 1 bit per 12 iron atoms (or 12*26=312 electrons) should be ~0.0032 bits/electron, or ~275 TB/in**2 (~2.2 Pb/in**2). Not quite the scale of the holographic device, but interesting nonetheless.

What can that do for my desktop?

Given that today’s recording head/media has demonstrated ~3.3 Tb/in**2 (see our Disk drive density multiplying by 6X post), the 12 atoms per bit result is a significant advance for (spin) magnetic storage.

With today’s disk industry shipping 1 TB/disk platters using ~0.6 Tb/in**2 (see our Disk capacity growing out of sight post), these technologies, if implemented in a disk form factor, could store from ~3.7 PB to ~40 EB in a 3.5″ form factor storage device.
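Here’s the back-of-the-envelope arithmetic in one place (my own calculation, using the figures cited above):

```python
# Back-of-the-envelope density and capacity comparison, using the figures cited above.
BITS_PER_BYTE = 8

holo_bits_per_electron = 35
holo_EB_per_in2 = 3.0                                        # Wikipedia's equivalence for 35 bits/electron

magnetic_bits_per_electron = 1 / (12 * 26)                   # 1 bit per 12 iron atoms, 26 electrons each
ratio = holo_bits_per_electron / magnetic_bits_per_electron  # ~11,000X

magnetic_TB_per_in2 = holo_EB_per_in2 * 1e6 / ratio          # ~275 TB/in**2

# Scale today's 1 TB platter (~0.6 Tb/in**2) to each technology
platter_TB, platter_Tb_per_in2 = 1.0, 0.6
magnetic_drive_TB = platter_TB * magnetic_TB_per_in2 * BITS_PER_BYTE / platter_Tb_per_in2
holo_drive_TB = platter_TB * holo_EB_per_in2 * 1e6 * BITS_PER_BYTE / platter_Tb_per_in2

print(f"12-atom bits: ~{magnetic_TB_per_in2:.0f} TB/in**2, ~{magnetic_drive_TB / 1e3:.1f} PB per platter")
print(f"Holographic:  ~{holo_drive_TB / 1e6:.0f} EB per platter, ~{ratio:,.0f}X denser")
```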

So there is a limit to (spin) magnetic storage, and holographic storage’s demonstrated density is about 11,000X higher. Once again holographic storage proves it can store significantly more data than magnetic storage, if only it could be commercialized. (Probably a subject to cover in a future post.)

~~~~

I don’t know about you, but a ~3.7 PB drive is probably more than enough storage for my lifetime and then some. But then again, those new 4K high-definition videos may take up a lot more space than my (low-definition) DVD collection.

Comments?