The Azure Storage SDK for Python just released v1.0! Let's see how we can use the package to build a health data simulator that publishes to Azure Blob storage.
First, you need Python 2.7, 3.3, 3.4, or 3.5. I installed the latest version of the package using the following command (Python 3.5 and pip 3):
pip install azure-storage
Scenario
We are going to simulate a multivariate time-point sequence of patient data, similar to this for a patient at times T1 to T3:
The problem with this type of data is that if we are not receiving it in real time or at regular intervals, it is hard to treat it as a time series. If T1, T2 and T3 are far apart in time, e.g., the first on January 30th, the second on February 15th and the last on February 20th, it is better to create an abstraction instead. We call these abstractions time intervals. Time intervals look something like this:
This abstraction is modelled using the following class:
# Create an abstraction
class TemporalAbstraction:
    def __init__(self, f, v, s, e, **kwargs):
        self.f = f
        self.v = v
        self.s = s
        self.e = e
        self.diagnosis = 0
    @property
    def Size(self):
        # Length of the interval (end minus start)
        return self.e - self.s
    def __repr__(self):
        return "({0}, {1}, {2}, {3})".format(self.f, self.v, self.s, self.e)
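The post doesn't show how raw time points become intervals. One minimal way, assuming each raw value is first discretized into a level using hypothetical cut points and that consecutive points on the same level are merged (a basic state abstraction), is sketched below. The fields mirror f, v, s, e from the class above, read here as feature, value level, start and end; `discretize`, `to_intervals` and `cut_points` are illustrative names, not part of the simulator.

```python
from collections import namedtuple

# Same field order as TemporalAbstraction: feature, value level, start, end
Interval = namedtuple("Interval", ["f", "v", "s", "e"])


def discretize(value, cut_points):
    """Map a raw value to a level 0..len(cut_points) (assumed scheme)."""
    level = 0
    for cut in cut_points:
        if value >= cut:
            level += 1
    return level


def to_intervals(feature, points, cut_points):
    """Merge consecutive time points that share a level into intervals.

    points: list of (time, raw_value) pairs sorted by time.
    """
    intervals = []
    for t, raw in points:
        level = discretize(raw, cut_points)
        if intervals and intervals[-1].v == level:
            # Same level as the previous point: stretch the open interval
            intervals[-1] = intervals[-1]._replace(e=t)
        else:
            # Level changed (or first point): start a new interval
            intervals.append(Interval(feature, level, t, t))
    return intervals
```

With times given as day numbers (January 30th = 30, February 15th = 46, February 20th = 51), `to_intervals("hr", [(30, 98), (46, 99), (51, 101)], [100])` yields one interval on level 0 from day 30 to 46 and a single-point interval on level 1 at day 51. Each tuple can be turned into the class above with `TemporalAbstraction(*interval)`.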
Finally, we create a matrix of abstractions using a sliding window of size w.
Each sample then has the form no_of_variables x no_of_abstractions x no_of_windows. In our case, with five variables, five abstraction levels and 25 windows, each sample has 5 x 5 x 25 = 625 values plus one value for the class. Now we want to save the result to a file in Blob storage.
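The text doesn't spell out what goes into each window's matrix. Assuming a cell holds the time that a given (variable, abstraction level) pair covers inside the window, divided by the window size (which would match the later `a / window_sizes[w]` step), and that consecutive windows do not overlap, a sketch could look like this. `window_matrix` and `sample_vector` are hypothetical helpers, and intervals are assumed to use integer variable and level indices.

```python
import numpy as np


def window_matrix(intervals, n_vars, n_levels, start, w):
    """Variables x levels matrix for the window [start, start + w).

    Each cell holds the time covered by intervals of that variable and
    level inside the window, divided by the window size w.
    intervals: iterable of (f, v, s, e) with integer f and v.
    """
    m = np.zeros((n_vars, n_levels))
    end = start + w
    for f, v, s, e in intervals:
        overlap = min(e, end) - max(s, start)
        if overlap > 0:
            m[f, v] += overlap
    return m / w


def sample_vector(intervals, label, n_vars, n_levels, n_windows, w):
    """Flatten all window matrices into one row: class label, then values."""
    mats = [window_matrix(intervals, n_vars, n_levels, k * w, w)
            for k in range(n_windows)]
    # Shape (n_windows, n_vars, n_levels) flattened window by window
    values = np.stack(mats).reshape(-1)
    return np.concatenate(([label], values))
```

With five variables, five levels and 25 windows, `sample_vector(...)` returns 1 + 625 = 626 entries, the count given above.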
Azure Blob SDK
You can follow the tutorial on how to use the SDK for your specific scenario. In our case, we want to append the samples to a blob, since we assume the simulator runs in an environment without local storage.
At this point, a file is created whose name includes the options defined in the simulator, to distinguish it from other files:
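import numpy as np
# no_of_variables, sample_windows, window_sizes, etc. are defined earlier in the simulator
# The file name encodes the simulation parameters
with open("db-s{s}aps{aps}inj{inj[0]}-{inj[1]}w{w}v{v}rand{r}drange{rng}.csv".format(
        v=no_of_variables, s=no_of_samples, w=window_sizes[w],
        aps=no_of_abstractions_per_sample, inj=injection_percentagePerc,
        r=random, rng=data_range), "w") as f:
    for x in sample_windows:
        # One sample: x[0] maps each window to its matrix, x[1] is the class
        sample_row = ""
        for window in sorted(x[0]):
            # Flatten this window's variables x abstractions matrix
            a = np.array(x[0][window].reshape((1, no_of_variables * len(value_abstractions)))[0])
            # Normalize by the window size
            a = a / window_sizes[w]
            sample_row += str(a.tolist()).strip('[').strip(']') + ","
        # Write the class followed by all window values as one CSV line
        f.write(str(x[1]) + "," + sample_row.strip(',') + "\n")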
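The `str(a.tolist()).strip('[').strip(']')` trick works, but it leaves a space after each comma inside a window and none between windows. The standard `csv` module gives uniform separators; here is a sketch with a hypothetical `write_samples` helper that takes (label, values) pairs, where values is the flattened sample.

```python
import csv
import io


def write_samples(f, samples):
    """Write one CSV line per sample: class label first, then all values.

    samples: iterable of (label, values), values being a flat sequence
    (a list or a 1-D NumPy array both work).
    """
    writer = csv.writer(f, lineterminator="\n")
    for label, values in samples:
        writer.writerow([label] + [float(v) for v in values])


buf = io.StringIO()
write_samples(buf, [(1, [0.25, 0.0]), (0, [0.5])])
print(buf.getvalue(), end="")
```

For a real file, the `csv` documentation recommends opening it with `open(name, "w", newline="")`.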
The resulting file name looks something like:
db-s1000aps50inj0.5-0.1w400v5randFalsedrange10000.csv
which encodes all the parameters of the simulation.
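Because the name is built from a fixed format string, the parameters can also be recovered from it later. A minimal sketch, assuming integer sample counts, window sizes and data ranges; `parse_file_name` and the dictionary keys are our own naming.

```python
import re

# Format string used by the simulator to name its output files
NAME_FORMAT = "db-s{s}aps{aps}inj{inj[0]}-{inj[1]}w{w}v{v}rand{r}drange{rng}.csv"

# Inverse of NAME_FORMAT: one named group per parameter
NAME_PATTERN = re.compile(
    r"db-s(?P<s>\d+)aps(?P<aps>\d+)inj(?P<inj0>[\d.]+)-(?P<inj1>[\d.]+)"
    r"w(?P<w>\d+)v(?P<v>\d+)rand(?P<r>True|False)drange(?P<rng>\d+)\.csv\Z")


def parse_file_name(name):
    """Recover the simulation parameters from a generated file name."""
    m = NAME_PATTERN.match(name)
    if m is None:
        raise ValueError("not a simulator file name: " + name)
    g = m.groupdict()
    return {
        "samples": int(g["s"]),
        "abstractions_per_sample": int(g["aps"]),
        "injection": (float(g["inj0"]), float(g["inj1"])),
        "window_size": int(g["w"]),
        "variables": int(g["v"]),
        "random": g["r"] == "True",
        "data_range": int(g["rng"]),
    }
```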
To write to Azure Blob storage by appending, we need a few additions.
First, import the append blob service:
from azure.storage.blob import AppendBlobService
Then create the service client using your storage account name and key:
append_blob_service = AppendBlobService(account_name='accountName', account_key='accountKey')
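Hard-coding the account key in a notebook makes it easy to leak. A small alternative is to read the credentials from environment variables; the variable names and the `storage_credentials` helper below are our own choices, not something the SDK requires.

```python
import os


def storage_credentials(env=os.environ):
    """Return (account_name, account_key) from the environment.

    AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_KEY are names chosen for this
    sketch; use whatever your deployment sets.
    """
    try:
        return env["AZURE_STORAGE_ACCOUNT"], env["AZURE_STORAGE_KEY"]
    except KeyError as missing:
        raise RuntimeError("set the environment variable " + missing.args[0])
```

The result plugs straight into the constructor: `name, key = storage_credentials()` followed by `AppendBlobService(account_name=name, account_key=key)`.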
Then we create our container:
append_blob_service.create_container('simulated-data')
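Blob container names must be 3 to 63 characters long, use only lowercase letters, digits and hyphens, start with a letter or digit, and have every hyphen surrounded by letters or digits, so a name with an underscore such as simulated_data is rejected by the service. A local check written against those rules (`is_valid_container_name` is our own helper) catches this before any request is made.

```python
import re

# Lowercase alphanumeric runs separated by single hyphens
_CONTAINER_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*\Z")


def is_valid_container_name(name):
    """Check the Azure Blob container naming rules locally."""
    return 3 <= len(name) <= 63 and _CONTAINER_RE.match(name) is not None
```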
Next, we create the blob and append each sample to it as a line of text. The final code snippet looks like this:
import numpy as np
from azure.storage.blob import AppendBlobService
# The blob name encodes the simulation parameters
fileName = "db-s{s}aps{aps}inj{inj[0]}-{inj[1]}w{w}v{v}rand{r}drange{rng}.csv".format(
    v=no_of_variables, s=no_of_samples, w=window_sizes[w],
    aps=no_of_abstractions_per_sample, inj=injection_percentagePerc,
    r=random, rng=data_range)
append_blob_service = AppendBlobService(account_name='accountName', account_key='accountKey')
# Container names may only contain lowercase letters, digits and hyphens
append_blob_service.create_container('simulated-data')
# Create an empty append blob to which the samples are appended
append_blob_service.create_blob('simulated-data', fileName)
for x in sample_windows:
    # One sample
    sample_row = ""
    for window in sorted(x[0]):
        a = np.array(x[0][window].reshape((1, no_of_variables * len(value_abstractions)))[0])
        a = a / window_sizes[w]
        sample_row += str(a.tolist()).strip('[').strip(']') + ","
    # Append the class and the sample values as one line of the blob
    append_blob_service.append_blob_from_text(
        'simulated-data', fileName, str(x[1]) + "," + sample_row.strip(',') + "\n")
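The loop above issues one append request per sample, and an append blob can hold at most 50,000 appended blocks, so very large simulations may need fewer, larger appends. One option, sketched with a hypothetical `batch_lines` helper, is to group the CSV lines into chunks before appending.

```python
def batch_lines(lines, max_chars):
    """Group text lines into chunks of at most max_chars characters.

    A single line longer than max_chars becomes a chunk on its own.
    """
    chunk, size = [], 0
    for line in lines:
        if chunk and size + len(line) > max_chars:
            # Adding this line would exceed the limit: emit the chunk
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield "".join(chunk)
```

Each chunk would then be passed to one call, e.g. `for chunk in batch_lines(rows, 1024 * 1024): append_blob_service.append_blob_from_text(container, fileName, chunk)`, where `rows` is a hypothetical list of the lines built in the loop and `container` is the container name. The 1 MiB limit is an arbitrary choice and counts characters, which equals bytes for ASCII CSV.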
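Now, each time the script runs, a blob named after the simulation parameters is created in Blob storage, where other services can access it. Note that create_blob replaces an existing blob with the same name, so a rerun with identical parameters overwrites the earlier data.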
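Another service can then read the data back. With the legacy SDK, `get_blob_to_text(container_name, blob_name)` returns a blob object whose `content` attribute holds the text; a small parser (our own `parse_samples`, assuming one sample per line with the class first) turns that text into labels and a feature matrix.

```python
import numpy as np


def parse_samples(text):
    """Split CSV text into class labels and a 2-D feature array.

    float() ignores surrounding whitespace, so the spaces left by the
    str(list) formatting are harmless.
    """
    rows = [line.split(",") for line in text.splitlines() if line.strip()]
    labels = np.array([float(r[0]) for r in rows])
    features = np.array([[float(v) for v in r[1:]] for r in rows])
    return labels, features
```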
The code is available in Microsoft Azure Notebooks at the following link for you to browse:
https://notebooks.azure.com/manashty/libraries/healthsimulator