Monday, November 13, 2017

TensorFlow terminology



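1. batchsize: the number of samples taken from the training set for each training step
2. iteration: one iteration is one training step on batchsize samples
3. epoch: one epoch is one pass over all the samples in the training set

With 5000 samples in the training set and batchsize set to 100, one epoch takes 5000 / 100 = 50 iterations.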
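The arithmetic above can be sketched in a few lines of plain Python (not TensorFlow code). The helper names `iterations_per_epoch`, `total_iterations` and `batches` are hypothetical, and rounding up assumes that a final, smaller batch is kept rather than dropped.

```python
import math


def iterations_per_epoch(num_samples: int, batch_size: int) -> int:
    # Round up: if num_samples is not a multiple of batch_size, the last
    # (smaller) batch still counts as one iteration (assumption: it is kept).
    return math.ceil(num_samples / batch_size)


def total_iterations(num_samples: int, batch_size: int, epochs: int) -> int:
    # Every epoch repeats the same number of iterations.
    return iterations_per_epoch(num_samples, batch_size) * epochs


def batches(samples, batch_size):
    # Yield consecutive slices of batch_size samples: one slice per iteration.
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


print(iterations_per_epoch(5000, 100))             # 50
print(len(list(batches(list(range(5000)), 100))))  # 50
```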







Saturday, November 11, 2017

TensorFlow 1

Ref : https://www.tensorflow.org/get_started/get_started


Modifying the example a bit:


x -> x0, x1, ...

going from one input to one output (1-to-1) to two inputs to one output (2-to-1).



import tensorflow as tf  # TensorFlow 1.x API (tf.placeholder, tf.Session)

# Model parameters
W0 = tf.Variable([.3], dtype=tf.float32)
W1 = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
# Model input and output: two inputs (x0, x1), one output (y)
x0 = tf.placeholder(tf.float32)
x1 = tf.placeholder(tf.float32)
linear_model = W0*x0 + W1*x1 + b
y = tf.placeholder(tf.float32)

# loss
loss = tf.reduce_sum(tf.square(linear_model - y))  # sum of the squares
# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# training data
# Note: x0_train == x1_train, so the data only determine W0 + W1,
# not W0 and W1 separately.
x0_train = [1, 2, 3, 4]
x1_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]
# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)  # set the variables to their (deliberately wrong) initial values
for i in range(1000):
    sess.run(train, {x0: x0_train, x1: x1_train, y: y_train})

# evaluate training accuracy
curr_W0, curr_W1, curr_b, curr_loss = sess.run([W0, W1, b, loss], {x0: x0_train, x1: x1_train, y: y_train})
print("W0: %s W1: %s b: %s loss: %s" % (curr_W0, curr_W1, curr_b, curr_loss))

# predict for x0 = x1 = 3.5 (the data follow y = -x + 1, so this should be close to -2.5)
print(sess.run(linear_model, {x0: 3.5, x1: 3.5}))
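To see what `optimizer.minimize(loss)` does, here is a minimal NumPy re-implementation of the same training loop. It is a sketch, not TensorFlow: the gradients are written out by hand and it uses float64 instead of float32, so the last digits may differ from TensorFlow's. It also shows a side effect of the identical inputs. Because `x0_train == x1_train`, the gradients for W0 and W1 are always equal, so the two weights stay equal and only their sum (about -1) is pinned down by the data.

```python
import numpy as np

# Same data, initial values, loss and learning rate as the TensorFlow snippet
x0 = np.array([1.0, 2.0, 3.0, 4.0])
x1 = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, -1.0, -2.0, -3.0])
W0, W1, b = 0.3, 0.3, -0.3
lr = 0.01

for _ in range(1000):
    residual = W0 * x0 + W1 * x1 + b - y  # linear_model - y
    # Gradients of loss = sum(residual ** 2)
    gW0 = 2.0 * np.sum(residual * x0)
    gW1 = 2.0 * np.sum(residual * x1)
    gb = 2.0 * np.sum(residual)
    # Plain gradient descent: param -= learning_rate * gradient
    W0, W1, b = W0 - lr * gW0, W1 - lr * gW1, b - lr * gb

loss = np.sum((W0 * x0 + W1 * x1 + b - y) ** 2)
print("W0:", W0, "W1:", W1, "b:", b, "loss:", loss)
```

Because only the sum W0 + W1 matters here, giving x0 and x1 different training values would be needed to learn two distinct weights.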


------------------------------------------------------------------------------------------------------------------------


import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.estimator, tf.get_variable)

def model_fn(features, labels, mode):
  # Build a linear model and predict values
  W0 = tf.get_variable("W0", [1], dtype=tf.float64)
  W1 = tf.get_variable("W1", [1], dtype=tf.float64)
  b = tf.get_variable("b", [1], dtype=tf.float64)
  y = W0*features['x0'] + W1*features['x1']  + b
  # Loss sub-graph
  loss = tf.reduce_sum(tf.square(y - labels))
  # Training sub-graph
  global_step = tf.train.get_global_step()
  optimizer = tf.train.GradientDescentOptimizer(0.01)
  train = tf.group(optimizer.minimize(loss),
                   tf.assign_add(global_step, 1))
  # EstimatorSpec connects subgraphs we built to the
  # appropriate functionality.
  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=y,
      loss=loss,
      train_op=train)

estimator = tf.estimator.Estimator(model_fn=model_fn)

x0_train = np.array([1., 2., 3., 4.])
x1_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])
x0_eval = np.array([2., 5., 8., 1.])
x1_eval = np.array([2., 5., 8., 1.])
y_eval = np.array([-1.01, -4.1, -7., 0.])
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x0": x0_train,"x1": x1_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)
train_input_fn = tf.estimator.inputs.numpy_input_fn(
     {"x0": x0_train,"x1": x1_train}, y_train, batch_size=4, num_epochs=1000, shuffle=False)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
     {"x0": x0_eval,"x1": x1_eval}, y_eval, batch_size=4, num_epochs=1000, shuffle=False)

# train
estimator.train(input_fn=input_fn, steps=1000)
# Here we evaluate how well our model did.
train_metrics = estimator.evaluate(input_fn=train_input_fn)
eval_metrics = estimator.evaluate(input_fn=eval_input_fn)
print("train metrics: %r"% train_metrics)
print("eval metrics: %r"% eval_metrics)

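# The variables created in model_fn live in the Estimator's own graph and
# checkpoint, not in the earlier Session, so read them back from the Estimator.
curr_W0 = estimator.get_variable_value("W0")
curr_W1 = estimator.get_variable_value("W1")
curr_b = estimator.get_variable_value("b")
curr_loss = train_metrics["loss"]

print("W0: %s W1: %s b: %s loss: %s" % (curr_W0, curr_W1, curr_b, curr_loss))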
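The evaluation loss can be estimated by hand. Suppose training reaches an exact fit of the training data (any W0 + W1 = -1 with b = 1; the values below are idealized assumptions, not output from a run). The model_fn's loss is the sum of squared errors over a batch of 4, so a small NumPy sketch of that arithmetic on the evaluation points is:

```python
import numpy as np

x0_eval = np.array([2.0, 5.0, 8.0, 1.0])
x1_eval = np.array([2.0, 5.0, 8.0, 1.0])
y_eval = np.array([-1.01, -4.1, -7.0, 0.0])

# Assumed (idealized) parameters: W0 + W1 = -1 and b = 1 fit the training data exactly
W0, W1, b = -0.5, -0.5, 1.0
pred = W0 * x0_eval + W1 * x1_eval + b  # [-1, -4, -7, 0]
eval_loss = np.sum((pred - y_eval) ** 2)  # 0.01**2 + 0.1**2 = 0.0101
print(eval_loss)
```

eval_input_fn repeats the same 4 points in batches of 4, so every evaluation batch should give about this value. The reported eval loss should therefore land near 0.0101 if training converged, and the training loss near 0.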

------------------------------------------------------------------------------------------

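batch_ys

Shape (100, 10): 100 rows (one per sample in the batch) by 10 columns (one per class). Each row is a one-hot label; the first 10 rows are: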
[[ 0.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]
 [ 1.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  1.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]
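A one-hot row has a single 1 at the index of its class. The class labels can therefore be recovered with `argmax` along each row, and rebuilt by indexing an identity matrix. A NumPy sketch using the 10 rows shown above:

```python
import numpy as np

# The first 10 rows of batch_ys shown above
batch_ys = np.array([
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
], dtype=np.float32)

labels = np.argmax(batch_ys, axis=1)                 # column index of the 1 in each row
one_hot = np.eye(10, dtype=np.float32)[labels]       # back to one-hot form
print(labels)  # [1 5 3 0 7 5 6 5 8 8]
```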






Friday, October 13, 2017

tensorflow regression I



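Here is what this example does.

First, we create our own dataset:


y_true = (0.5* x_data)+5 + noise

The slope is 0.5, the intercept is 5, and noise (standard normal values from np.random.randn) scatters the points around that line.

Next, we use TensorFlow to find the slope m and the intercept b.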
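It helps to know in advance what gradient descent should find. The same kind of dataset can be generated and fitted in closed form with ordinary least squares (`np.polyfit`). This is a different technique, an exact linear solve rather than gradient descent, and the sketch assumes a fixed seed and 100,000 points instead of 1,000,000 for speed.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed (assumption; the post is unseeded)
x_data = np.linspace(0.0, 10.0, 100000)   # the post uses 1,000,000 points
noise = rng.standard_normal(len(x_data))  # same distribution as np.random.randn
y_true = 0.5 * x_data + 5 + noise

# Least-squares line fit: polyfit returns [slope, intercept] for deg=1
m_ls, b_ls = np.polyfit(x_data, y_true, deg=1)
print(m_ls, b_ls)  # close to 0.5 and 5
```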



m = tf.Variable(0.81)
b = tf.Variable(0.17)

Define m and b as trainable variables with tf.Variable (starting from 0.81 and 0.17).

xph = tf.placeholder(tf.float32,[batch_size])
yph = tf.placeholder(tf.float32,[batch_size])

Define the input data with tf.placeholder; each placeholder holds batch_size values per step.

y_model = m*xph + b

Define the model: the line y = m*x + b, evaluated on the batch.

error = tf.reduce_sum(tf.square(yph-y_model))

Define the error: the sum of squared differences between the targets and the model.

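optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.001)

Define the optimization method: plain gradient descent with a learning rate of 0.001.

train = optimizer.minimize(error)

Define the training op: each time it runs, it takes one gradient descent step that reduces error.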
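For reference, the step that `minimize(error)` takes can be written out by hand. Below is a sketch of the derivation for one batch of n = batch_size points, with learning rate η = 0.001. It matches `error = tf.reduce_sum(tf.square(yph-y_model))` with `y_model = m*xph + b`.

```latex
\begin{aligned}
E(m, b) &= \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial E}{\partial m} &= -2 \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)\, x_i \\
\frac{\partial E}{\partial b} &= -2 \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr) \\
m &\leftarrow m - \eta \, \frac{\partial E}{\partial m}, \qquad
b \leftarrow b - \eta \, \frac{\partial E}{\partial b}
\end{aligned}
```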

init = tf.global_variables_initializer()

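with tf.Session() as sess:
    sess.run(init)  # initialize the variables first
    batches = 1000
    for i in range(batches):
        rand_ind = np.random.randint(len(x_data), size=batch_size)
        # Instead of feeding all the data at once, pick batch_size (8) random points per step;
        # 1000 is the number of batches (steps), not the number of points per step
        feed = {xph: x_data[rand_ind], yph: y_true[rand_ind]}
        sess.run(train, feed_dict=feed)  # run one training step
    model_m, model_b = sess.run([m, b])  # fetch the learned values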


The final values end up in model_m and model_b:

model_m -> 0.48660713

model_b -> 4.8790665
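The TensorFlow loop can be mirrored in plain NumPy to make each step explicit. This sketch assumes a fixed seed, 10,000 points instead of 1,000,000, and NumPy's `default_rng` in place of `np.random.randint`, so the learned values will not reproduce the post's numbers exactly. They should only land in the same neighborhood. With a learning rate of 0.001 and only 1000 steps of 8 points, the intercept b moves slowly, which is consistent with model_b ending a little below 5.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed (assumption)
x_data = np.linspace(0.0, 10.0, 10000)
y_true = 0.5 * x_data + 5 + rng.standard_normal(len(x_data))

batch_size = 8
learning_rate = 0.001
m, b = 0.81, 0.17  # same starting values as tf.Variable(0.81), tf.Variable(0.17)

for _ in range(1000):  # 1000 batches, 8 random points each
    rand_ind = rng.integers(len(x_data), size=batch_size)
    xb, yb = x_data[rand_ind], y_true[rand_ind]
    residual = yb - (m * xb + b)
    grad_m = -2.0 * np.sum(residual * xb)  # dE/dm for E = sum(residual ** 2)
    grad_b = -2.0 * np.sum(residual)       # dE/db
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(m, b)
```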

y_hat = x_data * model_m + model_b
my_data.sample(n=250).plot(kind='scatter',x='X Data',y='Y')
plt.plot(x_data,y_hat,'r')

y_hat is the red line; judging by eye, it represents this data well.
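One way to put a number on "by eye" is the coefficient of determination R² of the fitted line. The sketch below regenerates data the same way, with a fixed seed and 100,000 points, so the noise differs from the post's. It then scores the model_m and model_b values reported above. Note that the noise variance is 1 and Var(0.5x) ≈ 0.25 × 100/12 ≈ 2.08 for x spread evenly over [0, 10]. So even the true line scores only about 2.08 / 3.08 ≈ 0.68 here, and a value near that means the line captures about as much as any line can.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumption)
x_data = np.linspace(0.0, 10.0, 100000)
y_true = 0.5 * x_data + 5 + rng.standard_normal(len(x_data))

# The values reported above
model_m, model_b = 0.48660713, 4.8790665
y_hat = x_data * model_m + model_b

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r_squared = 1.0 - ss_res / ss_tot
mse = ss_res / len(x_data)
print(r_squared, mse)
```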







Full code:


# coding: utf-8

# In[56]:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')


# In[57]:


import tensorflow as tf


# In[58]:


x_data = np.linspace(0.0,10.0,1000000)


# In[59]:


noise = np.random.randn(len(x_data))


# In[60]:


x_data


# In[61]:


noise.shape


# In[62]:


# y = mx +b
# b = 5
y_true = (0.5* x_data)+5 + noise


# In[63]:


x_df = pd.DataFrame(data=x_data,columns=['X Data'])


# In[64]:


y_df = pd.DataFrame(data=y_true,columns=['Y'])


# In[65]:


y_df.head()


# In[66]:


my_data = pd.concat([x_df,y_df],axis=1)


# In[67]:


my_data.head()


# In[68]:


my_data.sample(n=250).plot(kind='scatter',x='X Data',y='Y')


# In[69]:


batch_size = 8


# In[70]:


np.random.randn(2)


# In[71]:


m = tf.Variable(0.81)


# In[72]:


b = tf.Variable(0.17)


# In[73]:


xph = tf.placeholder(tf.float32,[batch_size])


# In[74]:


yph = tf.placeholder(tf.float32,[batch_size])


# In[75]:


y_model = m*xph + b


# In[76]:


error = tf.reduce_sum(tf.square(yph-y_model))


# In[77]:


optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)


# In[78]:


train = optimizer.minimize(error)


# In[79]:


init = tf.global_variables_initializer()


# In[84]:


with tf.Session() as sess:
    sess.run(init)
    batches = 1000
    for i in range(batches):
        rand_ind = np.random.randint(len(x_data),size=batch_size)
        feed = {xph:x_data[rand_ind],yph:y_true[rand_ind]}
        sess.run(train,feed_dict=feed)
    model_m,model_b = sess.run([m,b])


# In[85]:


model_m


# In[86]:


model_b


# In[88]:


y_hat = x_data *model_m + model_b


# In[92]:


my_data.sample(250).plot(kind='scatter',x='X Data',y='Y')
plt.plot(x_data,y_hat,'r')






Monday, July 18, 2016

R language notes from Professor 陳鍾誠

Ref : http://www.slideshare.net/ccckmit/r-29956322

Some online resources on R

Ref : R語言 (R language)


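Accessing the data in dt


dt$Science      ->  the Science variable (column) of dt

dt[, 5]         ->  the 5th column of dt

attach(dt)      ->  attaches dt so its columns can be used directly by name (e.g. Science instead of dt$Science)

dt[3,]          ->  the 3rd row of dt

dt[c(3,6),]     ->  the 3rd and 6th rows of dt

subset(dt,Gender=="m")   ->  the rows where Gender is "m"

subset(dt,Science>=60)   ->  the rows where Science is 60 or higher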
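The TensorFlow posts above use pandas, and the same operations look roughly like this there. `dt` is a small hypothetical DataFrame, since the real dataset is not shown. Keep in mind that R counts rows and columns from 1, while pandas `iloc` counts from 0.

```python
import pandas as pd

# Hypothetical stand-in for dt (not the post's data)
dt = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Gender": ["m", "f", "m", "f", "m", "f"],
    "Literature": [80, 65, 50, 90, 70, 60],
    "Math": [70, 55, 85, 60, 95, 40],
    "Science": [75, 58, 90, 66, 62, 49],
})

science = dt["Science"]            # R: dt$Science
col5 = dt.iloc[:, 4]               # R: dt[, 5]      (0-based position 4)
row3 = dt.iloc[2]                  # R: dt[3, ]
rows_3_6 = dt.iloc[[2, 5]]         # R: dt[c(3, 6), ]
males = dt[dt["Gender"] == "m"]    # R: subset(dt, Gender == "m")
passed = dt[dt["Science"] >= 60]   # R: subset(dt, Science >= 60)
print(passed)
```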


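Reading Excel files: use the xlsx package.

Sorting data -> order() and sort(): sort(x) returns the sorted values, while order(x) returns the indices that would sort x, so dt[order(dt$Science),] sorts the rows of dt by Science.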

Descriptive statistics:

length(x)      # number of values
mean(x)        # mean
sd(x)          # standard deviation
quantile(x)    # percentiles (0%, 25%, 50%, 75%, 100% by default)

Examples:

     mean(dt$Science)    -> 70.77778

     sd(dt$Literature)   -> 19.7428
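Here is a pandas sketch of the same statistics on hypothetical scores. R's `sd()` and pandas' `Series.std()` both divide by n - 1, while `np.std` divides by n unless `ddof=1` is passed. pandas' default `quantile` interpolation matches R's default `quantile` (type 7).

```python
import numpy as np
import pandas as pd

# Hypothetical scores (not the post's data)
science = pd.Series([75, 58, 90, 66, 62, 49])

n = len(science)                                          # R: length(x)
avg = science.mean()                                      # R: mean(x)
sd = science.std()                                        # R: sd(x), divides by n - 1
sd_numpy = np.std(science.to_numpy())                     # NumPy default divides by n
quartiles = science.quantile([0, 0.25, 0.5, 0.75, 1.0])   # R: quantile(x)
print(n, avg, sd, sd_numpy)
print(quartiles)
```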


Descriptive statistics by group

tapply(variable, grouping factor, function, ...)

tapply(dt$Science, dt$Gender, mean)

    f     m
64.40 78.75

Alternatively, use subset to extract each group first:

mean(subset(dt,Gender=="m")$Science)

mean(subset(dt,Gender=="f")$Science)
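In pandas, the counterpart of `tapply` is `groupby`. Here is a sketch on hypothetical data that checks it agrees with the subset-then-mean approach:

```python
import pandas as pd

# Hypothetical data (not the post's dt)
dt = pd.DataFrame({
    "Gender": ["m", "f", "m", "f", "m", "f"],
    "Science": [75, 58, 90, 66, 62, 49],
})

# R: tapply(dt$Science, dt$Gender, mean)
by_gender = dt.groupby("Gender")["Science"].mean()

# R: mean(subset(dt, Gender == "m")$Science)
male_mean = dt[dt["Gender"] == "m"]["Science"].mean()
female_mean = dt[dt["Gender"] == "f"]["Science"].mean()
print(by_gender)
```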






How to install R (3.2.0 for Windows)

Ref :  http://www.largitdata.com/course/33/