Crowdresearch, crowdsourced academic research

Read an article in Stanford Research, Crowdsourced research gives experience to global participants that discussed an activity in Stanford and other top tier research institutions to try to get global participation in academic research. The process is discussed more fully in a scientific paper (PDF here) by researchers from Stanford, MIT Media Lab, Cornell Tech and UC Santa Cruz.

They chose three projects:

  • A HCI (human computer interaction) project to design, engineer and build a new paid crowd sourcing marketplace (like Amazon’s Mechanical Turk).
  • A visual image recognition project to improve on current visual classification techniques/algorithms.
  • A data science project to design and build the world’s largest wisdom of the crowds experiment.

Why crowdsource academic research?

The intent of crowdsourced research is to provide top tier academic research experience to persons which have no access to top research organizations.

Participating universities obtain more technically diverse researchers, larger research teams, larger research projects, and a geographically dispersed research community.

Collaborators win valuable academic research experience, research community contacts, and potential authorship of research papers as well as potential recommendation letters (for future work or academic placement),

How does crowdresearch work?

It’s almost an open source and agile development applied to academic research. The work week starts with the principal investigator (PI) and research assistants (RAs) going over last week’s milestone deliveries to see which to pursue further next week. The crowdresearch uses a REDDIT like posting and up/down voting to determine which milestone deliverables are most important. The PI and RAs review this prioritized list to select a few to continue to investigate over the next week.

The PI holds an hour long video conference (using Google Hangouts On Air Youtube live stream service). On the conference call all collaborators can view the stream but only a select few are on camera. The PI and the researchers responsible for the important milestone research of the past week discuss their findings and the rest of the collaborators on the team can participate over Slack. The video conference is archived and available  to be watched offline.

At the end of the meeting, the PI identifies next weeks milestones and potentially directly responsible investigators (DRIs) to work on them.

The DRIs and other collaborators choose how to apportion the work for the next week and work commences. Collaboration can be fostered and monitored via Slack and if necessary, more Google live stream meetings.

If collaborators need help understanding some technology, technique, or too, the PI, RAs or DRIs can provide a mini video course on the topic or can point to other information used to get the researchers up to speed. Collaborators can ask questions and receive answers through Slack.

When it’s time to write the paper, they used Google Docs with change tracking to manage the writing process.

The team also maintained a Wiki on the overall project to help new and current members get up to speed on what’s going on. The Wiki would also list the week’s milestones, video archives, project history/information, milestone deliverables, etc.

At the end of the week, researchers and DRIs would supply a mini post to describe their work and link to their milestone deliverables so that everyone could review their results.

Who gets credit for crowdresearch?

Each week, everyone on the project is allocated 100 credits and apportions these credits to other participants the weeks activities. The credits are  used to drive a page-rank credit assignment algorithm to determine an aggregate credit score for each researcher on the project.

Check out the paper linked above for more information on the credit algorithm. They tried to defeat (credit) link rings and other obvious approaches to stealing credit.

At the end of the project, the PI, DRIs and RAs determine a credit clip level for paper authorship. Paper authors are listed in credit order and the remaining, non-author collaborators are listed in an acknowledgements section of the paper.

The PIs can also use the credit level to determine how much of a recommendation letter to provide for researchers

Tools for crowdresearch

The tools needed to collaborate on crowdresearch are cheap and readily available to anyone.

  • Google Docs, Hangouts, Gmail are all freely available, although you may need to purchase more Drive space to host the work on the project.
  • Wiki software is freely available as well from multiple sources including Wikipedia (MediaWiki).
  • Slack is readily available for a low cost, but other open source alternatives exist, if that’s a problem.
  • Github code repository is also readily available for a reasonable cost but  there may be alternatives that use Google Drive storage for the repo.
  • Web hosting is needed to host the online Wiki, media and other assets.

Initial projects were chosen in computer science, so outside of the above tools, they could depend on open source. Other projects will need to consider how much experimental apparatus, how to fund these apparatus purchases, and how a global researchers can best make use of these.

My crowdresearch projects

Some potential commercial crowdresearch projects where we could use aggregate credit score and perhaps other measures of participation to apportion revenue, if any.

  • NVMe storage system using a light weight storage server supporting NVMe over fabric access to hybrid NVMe SSD – capacity disk storage.
  • Proof of Stake (PoS) Ethereum pooling software using Linux servers to create a pool for PoS ETH mining.
  • Bipedal, dual armed, dual handed, five-fingered assisted care robot to supply assistance and care to elders and disabled people throughout the world.

Non-commercial projects, where we would use aggregate credit score to apportion attribution and any potential remuneration.

  • A fully (100%?) mechanical rover able to survive, rove around, perform  scientific analysis, receive/transmit data and possibly, effect repairs from within extreme environments such as the surface of Venus, Jupiter and Chernoble/Fukishima Daiichi reactor cores.
  • Zero propellent interplanetary tug able to rapidly transport rovers, satellites, probes, etc. to any place within the solar system and deploy theme properly.
  • A Venusian manned base habitat including the design, build process and ongoing support for the initial habitat and any expansion over time, such that the habitat can last 25 years.

Any collaborators across the world, interested in collaborating on any of these projects, do let me know, here via comments. Please supply some way to contact you and any skills you’re interested in developing or already have that can help the project(s).

I would be glad to take on PI role for the most popular project(s), if I get sufficient response (no idea what this would be). And  I’d be happy to purchase the Drive, GitHub, Slack and web hosting accounts needed to startup and continue to fruition the most popular project(s). And if there’s any, more domain experienced PIs interested in taking any of these projects do let me know.  

Comments?

Picture Credit(s): Crowd by Espen Sundve;

Videoblogger Video Conference by Markus Sandy;

Researchers Night 2014 by Department of Computer Science, NTNU;

Enterprise file synch

Strange Clouds by michaelroper (cc) (from Flickr)
Strange Clouds by michaelroper (cc) (from Flickr)

Last fall at SNW in San Jose there were a few vendors touting enterprise file synchronization services each having a slightly different version of the requirements.   The one that comes most readily to mind was Egnyte which supported file synchronization across a hybrid cloud (public cloud and network storage) which we discussed in our Fall SNWUSA wrap up post last year.

The problem with BYOD

With bring your own devices (BYOD) corporate end users are quickly abandoning any pretense of IT control and turning consumer class file synchronization services to help  synch files across desktop, laptop and all mobile devices they haul around.   But the problem with these solutions such as DropBoxBoxOxygenCloud and others are that they are really outside of IT’s control.

Which is why there’s a real need today for enterprise class file synchronization solutions that exhibit the ease of use and set up available from consumer file synch systems but  offer IT security, compliance and control over the data that’s being moved into the cloud and across corporate and end user devices.

EMC Syncplicity and EMC on premises storage

Last week EMC announced an enterprise version of their recently acquired Syncplicity software that supports on-premises Isilon or Atmos storage, EMC’s own cloud storage offering.

In previous versions of Syncplicity storage was based in the cloud and used Amazon Web Services (AWS) for cloud orchestration and AWS S3 for cloud storage. With the latest release, EMC adds on premises storage to host user file synchronization services that can span mobile devices, laptops and end user desktops.

New Syncplicity users must download desktop client software to support file synchronization or mobile apps for mobile device synchronization.  After that it’s a simple matter of identifying which if any directories and/or files are to be synchronized with the cloud and/or shared with others.

However, with the Business (read enterprise) edition one also gets the Security and Compliance console which supports access control to define users and devices that can synchronize or share data, enforce data retention policies, remote wipe corporate data,  and native support for single sign services. In addition, one also gets centralized user and group management services to grant, change, revoke user and group access to data.  Also, one now obtains enterprise security with AES-256 data-at-rest encryption, separate key manager data centers and data storage data centers, quadruple replication of data for high disaster fault tolerance and SAS70 Type II compliant data centers.

If the client wants to use on premises storage, they would also need to deploy a VM virtual appliance somewhere in the data center to act as the gateway to file synchronization service requests. The file synch server would also presumably need access to the on premises storage and it’s unclear if the virtual appliance is in-band or out-of-band (see discussion on Egnyte’s solution options below).

Egnyte’s solution

Egnyte comes as a software only solution building a file server in the cloud for end user  storage. It also includes an Egnyte app for mobile hardware and the ever present web file browser.  Desktop file access is provided via mapped drives which access the Egnyte cloud file server gateway running as a virtual appliance.

One major difference between Syncplicity and Egnyte is that Egnyte offers a combination of both cloud and on premises storage but you cannot have just on premises storage. Syncplicity only offers one or the other storage for file data, i.e., file synchronization data can only be in the cloud or on local on premises storage but cannot be in both locations.

The other major difference is that Egnyte operates with just about anybody’s NAS storage such as EMC, IBM, and HDS for the on premises file storage.  It operates as an in-band, software appliance solution that traps file activity going to your on premises storage. In this case, one would need to start using a new location or directory for data to be synchronized or shared.

But for NetApp storage only (today), they utilize ONTAP APIs to offer out-of-band file synchronization solutions.  This means that you can keep NetApp data where it resides and just enable synchronization/shareability services for the NetApp file data in current directory locations.

Egnyte promises enterprise class data security with AD, LDAP and/or SSO user authentication, AES-256 data encryption and their own secure data centers.  No mention of separate key security in their literature.

As for cloud backend storage, Egnyte has it’s own public cloud or supports other cloud storage providers such as AWS S3, Microsoft Azure, NetApp Storage Grid and HP Public Cloud.

There’s more to Egnyte’s solution than just file synchronization and sharing but that’s the subject of today’s post. Perhaps we can cover them at more length in a future post if their interest.

File synchronization, cloud storage’s killer app?

The nice thing about these capabilities is that now IT staff can re-gain control over what is and isn’t synched and shared across multiple devices.  Up until now all this was happening outside the data center and external to IT control.

From Egnyte’s perspective, they are seeing more and more enterprises wanting data both on premises for performance and compliance as well as in the cloud storage for ubiquitous access.  They feel its both a sharability demand between an enterprise’s far flung team members and potentially client/customer personnel as well as a need to access, edit and propagate silo’d corporate information using new mobile devices that everyone has these days.

In any event, Enterprise file synchronization and sharing is emerging as one of the killer apps for cloud storage.  Up to this point cloud gateways made sense for SME backup or disaster recovery solutions but IMO, didn’t really take off beyond that space.  But if you can package a robust and secure file sharing and synchronization solution around cloud storage then you just might have something that enterprise customers are clamoring for.

~~~~

Comments?