A Concise Temporal Data Representation Model for Prediction in Biomedical Wearable Devices

You now have early access to our latest paper in the IEEE Internet of Things Journal (IF 5.8): “A Concise Temporal Data Representation Model for Prediction in Biomedical Wearable Devices”

Digest: A novel time mapping model is proposed that can map billions of data elements, sampled at several kHz or higher, into a sequence of 32 elements or fewer. The Life Model for time series (LMts) is used to predict and forecast human falls even with 50% missing values.

Read the abstract and full article at: https://lnkd.in/dtzTdq9

I would love to hear your comments and answer any questions you may have.  

GitHub code will be available soon!

A cloud-based mobile human fall forecasting system using recurrent neural networks

Congratulations, Mehrgan, on your honours project presentation. Mehrgan Khoshpasand Foumani worked on a cloud-based system towards a mobile application that can forecast human falls using LSTM deep neural networks.

You may read his thesis and download his final presentation here.

Honors_Thesis

Presentation

Mehrgan is a B.Sc. student in computer science at the University of New Brunswick whose honours project I supervised.

Virtual Machines in Azure Marketplace Review Series – Knowledge Studio

Azure Marketplace is an ecosystem of many pre-built products, especially ones that ship as ready-made virtual machines. Some include popular Microsoft products, such as Windows Server, SQL Server, and SharePoint Server. The number of third-party virtual machines deployed on Azure Marketplace is growing every day. Today, I decided to try one of them, called Knowledge Studio, which also seems related to data science. It is actually the Angoss Knowledge Studio 9.6 software installed and ready to use in a virtual machine.

In this video, I create and review this virtual machine.

IoT Proof of Concepts: Azure IoT Suites

Azure has many interesting services, many of which are conceptually new products designed to solve specific problems. The Azure IoT Suites are a very good demonstration and proof of concept for Azure Event Hubs and IoT Hubs. I think it is worth reviewing them again to see what the building blocks of an Internet of Things solution, and of predictive analytics for IoT, look like.

This video reviews both the Predictive Maintenance and Remote Monitoring solutions on Azure IoT Suite.

I was interested to see whether we should expect any updates to these solutions. I checked their GitHub activity graphs, and it turns out the projects had not been active for a long time. However, as the launch date of the new Visual Studio (2017) approached, many bug fixes and new solutions were provided to maintain compatibility.

Thus, I think they have served their purpose, both at the beginning and now, and Microsoft is moving on to showcase its newer services and products.

Azure Storage with Python – Health Data Simulator

Azure Storage for Python just released v1.0! Let’s see how we can use the package to create a health data simulator that publishes to Azure Blob storage.

First, you need at least Python 2.7, 3.3, 3.4, or 3.5. I installed the latest version of the package using the following command (Python 3.5 and pip 3):

pip install azure-storage

Scenario

We are going to simulate a multivariate time-point sequence of patient data, something similar to this for a patient at times T1 to T3:

The problem with this type of data is that if we do not receive it in real time or on a regular basis, it is hard to treat it as a time series. If T1, T2 and T3 are at very different points in time, e.g., one on January 30th, the second on February 15th and the last on February 20th, it is better to create an abstraction instead. We call this abstraction time intervals. Time intervals look something like this:
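For illustration (the feature names and values below are made up; the original figure may use a different notation), a set of time intervals can be written as tuples of (feature, value, start, end):

# Purely illustrative time intervals: (feature, abstracted value, start, end)
intervals = [
    ("heart_rate",  "high",   0, 45),   # heart rate was high from t=0 to t=45
    ("blood_sugar", "normal", 10, 60),
    ("activity",    "low",    45, 90),
]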

This abstraction is modelled using the following class:

# Create an abstraction: a time interval described by a feature, an
# abstracted value, and start/end times
class TemporalAbstraction:
    def __init__(self, f, v, s, e, **kwargs):
        self.f = f  # feature (variable)
        self.v = v  # abstracted value
        self.s = s  # start time
        self.e = e  # end time
        self.diagnosis = 0  # class label, 0 by default

    @property
    def Size(self):
        # duration of the interval
        return self.e - self.s

    def __repr__(self, **kwargs):
        return "({0}, {1}, {2}, {3})".format(self.f, self.v, self.s, self.e)

Finally, we create a matrix of abstractions using a sliding window of size w.

This makes each sample take the form no_of_variables × no_of_abstractions × no_of_windows. In our case, having five variables with five levels of abstraction and 25 windows, we get 5x5x25 = 625 values for each sample, plus one value for the class. Now we want to save the result to a file in blob storage.
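As a rough sketch of how such a matrix could be built (this is not the simulator's exact code; the function and variable names here are mine, and it assumes f and v are stored as integer indices and that the windows tile the timeline back to back):

import numpy as np

# Hypothetical parameters matching the example in the text.
no_of_variables = 5
no_of_abstraction_levels = 5
no_of_windows = 25
window_size = 400  # window length, in the same time unit as s and e

def windowed_matrix(abstractions):
    # For every window, accumulate how much of it each
    # (variable, value-level) interval covers, normalised by window length.
    m = np.zeros((no_of_variables, no_of_abstraction_levels, no_of_windows))
    for a in abstractions:  # a is a TemporalAbstraction
        for w in range(no_of_windows):
            w_start, w_end = w * window_size, (w + 1) * window_size
            overlap = max(0, min(a.e, w_end) - max(a.s, w_start))
            m[a.f, a.v, w] += overlap
    return m / window_size  # 5 x 5 x 25 = 625 values per sample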

Azure Blob SDK

You can follow the tutorial on how to use the SDK for your specific scenario. In our case, we want to append the samples to blob storage, as we assume the script runs in an environment with no local storage.

At this point, a local file is created whose name encodes the options defined in the simulator, to distinguish it from other files:

with open("db-s{s}aps{aps}inj{inj[0]}-{inj[1]}w{w}v{v}rand{r}drange{rng}.csv".format(v=no_of_variables,s=no_of_samples,w=window_sizes[w],aps=no_of_abstractions_per_sample,
        inj=injection_percentagePerc,r=random,rng=data_range),"w") as f:
    for x in sample_windows:
        #one sample
        sample_row = ""
        for window in sorted(x[0]):
            a = np.array(x[0][window].reshape((1,no_of_variables * len(value_abstractions)))[0])
            a = a / window_sizes[w]
            sample_row+=str(a.tolist()).strip('[').strip(']') + ","
        a=f.write(str(x[1]) + "," + sample_row.strip(',') + "\n")

A file will look something like:
db-s1000aps50inj0.5-0.1w400v5randFalsedrange10000.csv

which encodes all the parameters of the simulation.

Now, to append to Azure Blobs, we need to add the following code.

First, import the append blob service:

from azure.storage.blob import AppendBlobService

Then create the blob service using your account name and account key:

append_blob_service = AppendBlobService(account_name='accountName', account_key='accountKey')

Then we create our container (note that container names may only contain lowercase letters, numbers, and hyphens):

append_blob_service.create_container('simulated-data')

At this point we create the blob and append the data line by line. The final code snippet looks like this:

fileName="db-s{s}aps{aps}inj{inj[0]}-{inj[1]}w{w}v{v}rand{r}drange{rng}.csv".format(v=no_of_variables,s=no_of_samples,w=window_sizes[w],aps=no_of_abstractions_per_sample,
            inj=injection_percentagePerc,r=random,rng=data_range)

    append_blob_service = AppendBlobService(account_name='accountName', account_key='accountKey')
    append_blob_service.create_container('simulated_data')
    append_blob_service.create_blob('simulated_data', fileName)

    for x in sample_windows:
            #one sample
            sample_row = ""
            for window in sorted(x[0]):
                a = np.array(x[0][window].reshape((1,no_of_variables * len(value_abstractions)))[0])
                a = a / window_sizes[w]
                sample_row+=str(a.tolist()).strip('[').strip(']') + ","
            append_blob_service.append_blob_from_text('simulated_data', fileName, str(x[1]) + "," + sample_row.strip(',') + "\n")

Now, each time the script runs, a new file is created in blob storage and is accessible from other services.
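For example, reading the file back from another client could look like this (a minimal sketch that assumes the same account credentials, container, and fileName as above):

from azure.storage.blob import AppendBlobService

# Download the simulated CSV from blob storage and count its samples.
append_blob_service = AppendBlobService(account_name='accountName', account_key='accountKey')
blob = append_blob_service.get_blob_to_text('simulated-data', fileName)
print("{0} samples downloaded".format(len(blob.content.splitlines())))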

The code is available in Microsoft Azure Notebooks at the following link for you to browse:

https://notebooks.azure.com/manashty/libraries/healthsimulator

Looking for volunteers – Azure Global Bootcamp Event at UNB

The UNB Cloud Club is planning to host an Azure Global Bootcamp event on April 22, 2017, along with many other locations around the world.

More details to follow.

It will be both fun and a learning experience. To organize this event, we are looking for volunteers to help with the following:

  1. Finding local speakers
  2. Contacting potential sponsors
  3. Helping with labs during the event day.

Students who are interested, please contact me by email at manashty@outlook.com or reply to this post.

The deadline for initial participation is March 25th, 2017.

Sincerely,

Cloud Club

Book Chapter published: Cloud Platforms for IoE Healthcare Context Awareness and Knowledge Sharing

Cloud Platforms for IoE Healthcare Context Awareness and Knowledge Sharing is the title of my new book chapter, finally published in January 2017 by Springer. The chapter is part of the book “Beyond the Internet of Things: Everything Interconnected“.

In this book chapter I discuss the many platforms that are designed to allow knowledge sharing between health monitoring systems. With large databases such as MIMIC-III available these days, the knowledge extracted from them can be used to give real-time feedback for patients who are currently in the ICU.

In the chapter, I mentioned Azure and the Azure IoT Suites as an ideal PaaS candidate for managing such a service. One of the goals of developing HEAL is that one day health monitoring will be added to the default solution samples in Microsoft Azure IoT Suite. Combined with Microsoft HealthVault, this can become a reality once solid policies regarding privacy and ethics promote analysing a patient’s record only for his or her own well-being, and never for insurance or employment decisions.

Azure N-Series Virtual Machines – Deep Learning toolkit for the DSVM Review

If you haven’t had a chance to check out the Azure Data Science Virtual Machine, here is why you probably should.

Azure N-Series are already superb virtual machines (see my previous post), especially for machine learning and video rendering. However, it can take some time to install all the software needed to run your algorithm on one of those machines. For me, it took around an hour from starting the VM to launching a full version of Visual Studio. Well, we are not that patient these days anymore. Fortunately, there is a very good solution to that problem: the Data Science Virtual Machine (DSVM). When creating a new machine, instead of creating a plain N-Series (NC or NV) VM, you can search for the Data Science Virtual Machine and create your desired NC-Series virtual machine from that image. There is an even more specialized one: the Deep Learning toolkit for the DSVM. See below for how to find it.

This machine has the latest version of everything you need installed! When the virtual machine is ready and you connect to it, you are greeted by a desktop full of useful machine learning and data analysis tools! Not only are the base requirements installed, e.g., Visual Studio 2015, Python, Anaconda, and Jupyter, but many useful frameworks and tools are installed as well. Microsoft R Server, Microsoft Cognitive Toolkit (CNTK), and TensorFlow are only a few examples. The only thing you need to do is run your code on that machine!

The advantage of this machine is not only the software and toolkits installed on it, but also the fact that it runs on an N-Series virtual machine (see my previous post). You can create this machine on an NC-series GPU VM, which gives you up to 24 cores and 4 Tesla K80 GPUs with as much as 224 GB of RAM.

Your deep learning dreams can come true with this virtual machine, available on Azure. You can create your own customized solution and have it backed by Azure. The VM is not yet available in Microsoft’s Canadian data centers; however, you can create one in the US while accessing it from Canada.

Deep learning toolkit

With this machine comes a deep learning readme, which is a tutorial on how to run various samples using different tools (e.g., CNTK (Microsoft Cognitive Toolkit) and TensorFlow). Using a command prompt, you can just copy and paste the sample code to get the samples running and see how fast they work.

Monitoring GPU Usage

Deep learning without GPUs is like a single-lane highway. GPUs speed up deep learning training several-fold. As there are usually two different builds available, one for CPU and one for GPU (TensorFlow, for example), it is always good to be able to monitor GPU vs. CPU activity and see how your program is utilizing the resources. Generally, I think Task Manager is enough for monitoring CPU usage. However, for GPU usage, and especially GPU memory usage, you can use “nvidia-smi.exe” on any machine with an NVIDIA GPU. NVIDIA drivers and tools are fully installed on the Deep Learning toolkit DSVMs, so the only thing you need to do is run the command!
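For example, to keep the readings refreshing every few seconds, you can run it in loop mode (the -l flag takes the refresh interval in seconds):

nvidia-smi.exe -l 5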

As you can see, the GPU memory usage and GPU utilization are reported for each Tesla K80 GPU. The MNIST sample trained quickly with only a small amount of GPU usage (3%). I tried other, bigger examples using TensorFlow, and one of the GPU cores was utilized at up to around 90%.

Conclusion

All in all, the Deep Learning toolkit for the Data Science Virtual Machine adds a lot of value to the already unique Azure N-Series NC-class GPU-enabled virtual machines. It would take at least a few hours to set up a virtual machine the way this one comes out of the box, and the tools installed are necessary for most deep learning tasks. The virtual machines with those powerful GPUs were already a huge time-saver; now you can start training your network on this powerful machine in just a few minutes! And you can cancel your vacation plans, because training no longer takes months!

Azure N-Series GPU VMs Performance Test: Are they worth it? YES

Azure N-Series are very powerful virtual machines with multiple NVIDIA Tesla GPUs. Microsoft announced the availability of Azure N-Series virtual machines in December 2016. There are two series: NC and NV. NC-series machines contain NVIDIA Tesla K80 GPUs, which are a little older than the newer Tesla M60 GPUs designed for data center workloads. More on the differences here.

These machines are not cheap. The middle NV-series machine, with 2 M60 GPUs, 12 Xeon cores, and 112 GB of RAM, costs around USD $2.60/hour. This means you have to pay around $62 a day, or around $1,872 per month. So the question is: should you invest in such machines? I have conducted a test.

You can see the two M60 GPUs along with 12 Xeon CPUs in Device Manager:

I compared the performance of an NV12 VM (the middle one above) with my Surface Pro 4 (i7, 16 GB)! I know, the first test was a CPU comparison only, but I also tested the GPU power of these machines. To test them, I installed TensorFlow 0.12.0rc0, which is the only bug-free Windows-compatible version of TensorFlow that I could find. I used Visual Studio 2015 to run the Python code. In my experience, the new version, 1.0.0, still has some bugs and errors. I actually tested version 1.0 first; however, the GPU version was really slow. I then installed the 0.12.0rc0 version using the commands below for the CPU and GPU versions respectively (with Python 3.5.3 installed):

TensorFlow for Windows, CPU version (working):

pip3 install --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-0.12.0rc0-cp35-cp35m-win_amd64.whl

TensorFlow for Windows, GPU version (working):

pip3 install --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-0.12.0rc0-cp35-cp35m-win_amd64.whl

CPU only test

In the first test, I compared the performance of the NV12 machine with my Surface Pro 4 (i7). The SP4 i7 has 2 cores with 4 hardware threads in total. The results showed that the NV12 machine is around 10x faster than the SP4 when training on the MNIST dataset using TensorFlow. Watch the head-to-head comparison in the video below:
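For reference, the kind of script behind such a comparison is essentially the classic MNIST softmax tutorial that ships with TensorFlow; the sketch below is a minimal version of it, not necessarily the exact code shown in the video:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Download MNIST and build a simple softmax-regression classifier.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, W) + b

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

    # Report test accuracy once training finishes.
    correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))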

GPU test

Azure N-Series machines were not built for their CPU power. The main advantage of these machines is the CUDA-enabled NVIDIA Tesla GPUs. GPUs are better at matrix multiplication, which is what deep neural nets are all about. Using the GPU for computation allows tasks to be done in days rather than weeks or months. So let’s see how these GPUs tackle the DNN training task:

As you can see, running the MNIST training on GPU-enabled TensorFlow was at least 4x faster than the CPU version. As I am not sure whether both GPUs were being used during this task, this can be considered a notable improvement. For example, if I was going to spend 5 business days training a DNN, I could now do it in about a day! This makes a huge difference in the workflow of training and testing.
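If you want to verify which device TensorFlow actually places its operations on (a generic check I find useful, not part of the benchmark itself), you can enable device placement logging:

import tensorflow as tf

# Log which device (CPU or a specific GPU) each operation runs on.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]], name='b')
    print(sess.run(tf.matmul(a, b, name='matmul')))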

So the short answer is Yes! Compared to my Surface Pro 4, it is around 30x faster! That is an 3000% speed increase from a decent Core i7 CPU. Thus using the new N-Series VMs not only saves you a lot in terms of time and expenses, but also you can definitely focus on problem solving and solution designing vs idly waiting! The cheapest idle data scientist would cost you around $5000 per month. With 30x speed increase, he/she can do the task in less than 2 weeks vs a full year (less vacation). Also, as you pay only hourly, you can do simple tasks by running the machine for just a day and don’t end up paying for the rest of the month. This will save you a lot as well versus actually buying a machine and maintaining it. And you don’t need to worry about power outage in the building during that one long month of training!