Big Data Research Notes

Sunday, December 3, 2017

First-use recommendations: HTTPS, Wi-Fi, NAT traversal etc…

Ref : https://www.linux-projects.org/rpi-videoconference-demo-os/

Monday, November 13, 2017

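TensorFlow terminology

1. batch size (batchsize): the number of samples drawn from the training set for each training step
2. iteration: one iteration is one training step on batchsize samples
3. epoch: one epoch is one pass over all the samples in the training set
Example: with 5000 samples in the training set and batchsize set to 100, one epoch takes 50 iterations.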
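
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper name iterations_per_epoch is made up for illustration, and rounding up for a final partial batch is an assumption (the notes only cover the case where the sizes divide evenly).

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    """Number of iterations needed to see every training sample once."""
    # Rounding up covers a final, smaller batch when the sizes do not
    # divide evenly (an assumption; some pipelines drop that last batch).
    return math.ceil(num_samples / batch_size)


# The example from the notes: 5000 samples, batch size 100.
print(iterations_per_epoch(5000, 100))
```

Some input pipelines drop the last partial batch instead, in which case floor division would be the matching formula.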

Saturday, November 11, 2017

Ref : https://www.tensorflow.org/get_started/get_started

Modify the example slightly:

x -> x0, x1, ...

The model goes from one input and one output to two inputs and one output.

import tensorflow as tf

# Model parameters
W0 = tf.Variable([.3], dtype=tf.float32)
W1 = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
# Model input and output
x0 = tf.placeholder(tf.float32)
x1 = tf.placeholder(tf.float32)
linear_model = W0*x0 + W1*x1 + b
y = tf.placeholder(tf.float32)

# loss
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# training data
x0_train = [1, 2, 3, 4]
x1_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]
# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(1000):
    sess.run(train, {x0: x0_train, x1: x1_train, y: y_train})

# evaluate training accuracy
curr_W0,curr_W1, curr_b, curr_loss = sess.run([W0,W1, b, loss], {x0: x0_train, x1 : x1_train, y: y_train})
print("W0: %s W1: %s b: %s loss: %s"%(curr_W0,curr_W1, curr_b, curr_loss))

print(sess.run(linear_model, {x0: 3.5, x1 :3.5}))

------------------------------------------------------------------------------------------------------------------------

import numpy as np
import tensorflow as tf

def model_fn(features, labels, mode):
    # Build a linear model and predict values
    W0 = tf.get_variable("W0", [1], dtype=tf.float64)
    W1 = tf.get_variable("W1", [1], dtype=tf.float64)
    b = tf.get_variable("b", [1], dtype=tf.float64)
    y = W0*features['x0'] + W1*features['x1'] + b
    # Loss sub-graph
    loss = tf.reduce_sum(tf.square(y - labels))
    # Training sub-graph
    global_step = tf.train.get_global_step()
    optimizer = tf.train.GradientDescentOptimizer(0.01)
    train = tf.group(optimizer.minimize(loss),
                     tf.assign_add(global_step, 1))
    # EstimatorSpec connects subgraphs we built to the
    # appropriate functionality.
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=y,
        loss=loss,
        train_op=train)

estimator = tf.estimator.Estimator(model_fn=model_fn)

x0_train = np.array([1., 2., 3., 4.])
x1_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])
x0_eval = np.array([2., 5., 8., 1.])
x1_eval = np.array([2., 5., 8., 1.])
y_eval = np.array([-1.01, -4.1, -7., 0.])
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x0": x0_train, "x1": x1_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x0": x0_train, "x1": x1_train}, y_train, batch_size=4, num_epochs=1000, shuffle=False)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x0": x0_eval, "x1": x1_eval}, y_eval, batch_size=4, num_epochs=1000, shuffle=False)

# train
estimator.train(input_fn=input_fn, steps=1000)
# Here we evaluate how well our model did.
train_metrics = estimator.evaluate(input_fn=train_input_fn)
eval_metrics = estimator.evaluate(input_fn=eval_input_fn)
print("train metrics: %r"% train_metrics)
print("eval metrics: %r"% eval_metrics)

# The Estimator owns its graph and session, so read the parameters from it instead of sess.run.
curr_W0, curr_W1, curr_b = [estimator.get_variable_value(n) for n in ("W0", "W1", "b")]

print("W0: %s W1: %s b: %s loss: %s"%(curr_W0, curr_W1, curr_b, train_metrics["loss"]))

------------------------------------------------------------------------------------------

batch_ys

(100 rows, 10 columns), first 10 rows shown:
[[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
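
Each row of batch_ys above contains a single 1, which is the shape of a one-hot encoded label. As a sketch, assuming the 10 columns stand for 10 classes numbered 0 to 9 (the notes do not say so explicitly), the rows shown can be rebuilt from class indices in plain Python. The helper name one_hot is made up for illustration.

```python
def one_hot(labels, num_classes):
    """Turn class indices into rows with a single 1.0 at the class position."""
    return [[1.0 if column == label else 0.0 for column in range(num_classes)]
            for label in labels]


# Class indices read off the ten rows shown above (position of the 1 in each row).
labels = [1, 5, 3, 0, 7, 5, 6, 5, 8, 8]
batch = one_hot(labels, 10)

# Decoding is the reverse step: find the position of the 1.0 in each row.
decoded = [row.index(1.0) for row in batch]

for row in batch:
    print(row)
```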

Friday, October 13, 2017

TensorFlow regression I

In this example we first generate a dataset ourselves:

y_true = (0.5* x_data)+5 + noise

The slope is 0.5, the intercept is 5, and noise adds random scatter around the line.

Next we use TensorFlow to recover m and b.

m = tf.Variable(0.81)
b = tf.Variable(0.17)

Define m and b with tf.Variable.

xph = tf.placeholder(tf.float32,[batch_size])
yph = tf.placeholder(tf.float32,[batch_size])

Define the input data with tf.placeholder.

y_model = m*xph + b

Define the model.

error = tf.reduce_sum(tf.square(yph-y_model))

Define the error (sum of squared residuals).

optimizr = tf.train.GradientDescentOptimizer(learning_rate = 0.001)

Define the optimization method.

train = optimizr.minimize(error)

Define the training op.

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)  # initialize the variables first
    batches = 1000
    for i in range(batches):
        # Do not feed the whole dataset at once: each of the 1000 steps
        # draws batch_size random samples.
        rand_ind = np.random.randint(len(x_data), size=batch_size)
        feed = {xph: x_data[rand_ind], yph: y_true[rand_ind]}
        sess.run(train, feed_dict=feed)  # run one training step
    model_m, model_b = sess.run([m, b])  # read the fitted values back

The final values end up in model_m and model_b:

model_m -> 0.48660713

model_b -> 4.8790665

y_hat = x_data * model_m + model_b

my_data.sample(n=250).plot(kind='scatter',x='X Data',y='Y')

plt.plot(x_data,y_hat,'r')

y_hat is the red line. Judging by eye, it represents the data well.

Full code:
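
To make the training loop less of a black box, here is a sketch of the same mini-batch gradient descent in plain Python, without TensorFlow. It is not the notebook's code: the function name fit_line_sgd, the fixed seeds, and the smaller dataset (10,000 points instead of 1,000,000) are assumptions for illustration. The starting values, learning rate, batch size, and step count are copied from the notes.

```python
import random


def fit_line_sgd(xs, ys, m=0.81, b=0.17, learning_rate=0.001,
                 batch_size=8, steps=1000, seed=0):
    """Fit y = m*x + b by mini-batch gradient descent on the summed squared error."""
    rng = random.Random(seed)
    for _ in range(steps):
        # Draw batch_size random indices with replacement, as np.random.randint does.
        batch = [rng.randrange(len(xs)) for _ in range(batch_size)]
        grad_m = 0.0
        grad_b = 0.0
        for i in batch:
            residual = (m * xs[i] + b) - ys[i]
            grad_m += 2.0 * residual * xs[i]  # d/dm of residual**2
            grad_b += 2.0 * residual          # d/db of residual**2
        # Plain gradient descent update, one step per batch.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Synthetic data in the same form as the notes: y = 0.5*x + 5 + noise.
# Only 10,000 points are generated here to keep the sketch fast (assumption).
data_rng = random.Random(42)
n = 10000
xs = [10.0 * i / (n - 1) for i in range(n)]
ys = [0.5 * x + 5.0 + data_rng.gauss(0.0, 1.0) for x in xs]

model_m, model_b = fit_line_sgd(xs, ys)
print(model_m, model_b)
```

The result should land near the true slope 0.5 and intercept 5 but not exactly on them, because each step only sees 8 noisy samples. The notes' own run shows the same kind of gap (0.4866 and 4.879).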

# coding: utf-8

# In[56]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
get_ipython().magic('matplotlib inline')

# In[57]:

import tensorflow as tf

# In[58]:

x_data = np.linspace(0.0,10.0,1000000)

# In[59]:

noise = np.random.randn(len(x_data))

# In[60]:

x_data

# In[61]:

noise.shape

# In[62]:

# y = mx +b
# b = 5
y_true = (0.5* x_data)+5 + noise

# In[63]:

x_df = pd.DataFrame(data=x_data,columns=['X Data'])

# In[64]:

y_df = pd.DataFrame(data=y_true,columns=['Y'])

# In[65]:

y_df.head()

# In[66]:

my_data = pd.concat([x_df,y_df],axis=1)

# In[67]:

my_data.head()

# In[68]:

my_data.sample(n=250).plot(kind='scatter',x='X Data',y='Y')

# In[69]:

batch_size = 8

# In[70]:

np.random.randn(2)

# In[71]:

m = tf.Variable(0.81)

# In[72]:

b = tf.Variable(0.17)

# In[73]:

xph = tf.placeholder(tf.float32,[batch_size])

# In[74]:

yph = tf.placeholder(tf.float32,[batch_size])

# In[75]:

y_model = m*xph + b

# In[76]:

error = tf.reduce_sum(tf.square(yph-y_model))

# In[77]:

optimizr = tf.train.GradientDescentOptimizer(learning_rate = 0.001)

# In[78]:

train = optimizr.minimize(error)

# In[79]:

init = tf.global_variables_initializer()

# In[84]:

with tf.Session() as sess:
    sess.run(init)
    batches = 1000
    for i in range(batches):
        rand_ind = np.random.randint(len(x_data),size=batch_size)
        feed = {xph:x_data[rand_ind],yph:y_true[rand_ind]}
        sess.run(train,feed_dict=feed)
    model_m,model_b = sess.run([m,b])

# In[85]:

model_m

# In[86]:

model_b

# In[88]:

y_hat = x_data *model_m + model_b

# In[92]:

my_data.sample(250).plot(kind='scatter',x='X Data',y='Y')
plt.plot(x_data,y_hat,'r')


Monday, July 18, 2016

R language by Professor 陳鍾誠

Ref : http://www.slideshare.net/ccckmit/r-29956322

Some online resources on the R language

Ref : R語言 (R language)

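Accessing the data in dt

dt$Science -> the Science variable in dt

dt[, 5] -> the fifth column of dt

attach(dt) -> puts dt on the search path so its variables can be referenced directly by name

dt[3,] -> the third row of dt

dt[c(3,6),] -> the third and sixth rows of dt

subset(dt,Gender=="m") -> the rows where Gender is m

subset(dt,Science>=60) -> the rows where Science is greater than or equal to 60

To read Excel files, use the xlsx package.

Sorting data -> order() and sort()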

Descriptive statistics:

length(variable) # count
mean(variable) # mean
sd(variable) # standard deviation
quantile(variable) # quantiles (percentiles)

Example:

mean(dt$Science) -> 70.77778

sd(dt$Literature) -> 19.7428

Grouped descriptive statistics

tapply(variable, grouping factor, function, ...)

tapply(dt$Science, dt$Gender, mean)

f m
64.40 78.75

Or use subset to cut out each group first:

mean(subset(dt,Gender=="m")$Science)

mean(subset(dt,Gender=="f")$Science)
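
To spell out what tapply computes, here is a sketch in plain Python (the language used for the TensorFlow notes above). The helper name grouped_apply and the four records are invented for illustration; they are not the dt data set, so the numbers differ from the R output above.

```python
from collections import defaultdict
from statistics import mean


def grouped_apply(values, groups, func):
    """Split values by their group label, then apply func to each group."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    # Sort by label so the result comes out in a stable order, like R's factor levels.
    return {group: func(bucket) for group, bucket in sorted(buckets.items())}


# Hypothetical scores and gender labels, made up for this sketch.
science = [60, 70, 80, 90]
gender = ["f", "f", "m", "m"]

result = grouped_apply(science, gender, mean)
print(result)
```

The subset form above does the same thing one group at a time: filter the rows first, then take the mean of what is left.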

How to install R (3.2.0 for Windows)

Ref : http://www.largitdata.com/course/33/
