tf.estimator Quickstart

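TensorFlow's high-level machine learning API (tf.estimator) makes it easy to configure, train, and evaluate a variety of machine learning models. In this tutorial, you will use tf.estimator to construct a neural network classifier, train it on the Iris data set, and predict flower species from sepal and petal measurements. You will write code to perform the following five steps:

Load CSVs containing Iris training/test data into a TensorFlow Dataset
Construct a neural network classifier
Train the model using the training data
Evaluate the accuracy of the model
Classify new samples

Note: Install TensorFlow on your machine before starting this tutorial.

Complete Neural Network Source Code

Here is the full code for the neural network classifier: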

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
from six.moves.urllib.request import urlopen

import numpy as np
import tensorflow as tf

# Data sets
IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"

IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"

def main():
  # If the training and test sets aren't stored locally, download them.
  if not os.path.exists(IRIS_TRAINING):
    raw = urlopen(IRIS_TRAINING_URL).read()
    with open(IRIS_TRAINING, "wb") as f:
      f.write(raw)

  if not os.path.exists(IRIS_TEST):
    raw = urlopen(IRIS_TEST_URL).read()
    with open(IRIS_TEST, "wb") as f:
      f.write(raw)

  # Load datasets.
  training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
      filename=IRIS_TRAINING,
      target_dtype=np.int,
      features_dtype=np.float32)
  test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
      filename=IRIS_TEST,
      target_dtype=np.int,
      features_dtype=np.float32)

  # Specify that all features have real-value data
  feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

  # Build 3 layer DNN with 10, 20, 10 units respectively.
  classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                          hidden_units=[10, 20, 10],
                                          n_classes=3,
                                          model_dir="/tmp/iris_model")
  # Define the training inputs
  train_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": np.array(training_set.data)},
      y=np.array(training_set.target),
      num_epochs=None,
      shuffle=True)

  # Train model.
  classifier.train(input_fn=train_input_fn, steps=2000)

  # Define the test inputs
  test_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": np.array(test_set.data)},
      y=np.array(test_set.target),
      num_epochs=1,
      shuffle=False)

  # Evaluate accuracy.
  accuracy_score = classifier.evaluate(input_fn=test_input_fn)["accuracy"]

  print("\nTest Accuracy: {0:f}\n".format(accuracy_score))

  # Classify two new flower samples.
  new_samples = np.array(
      [[6.4, 3.2, 4.5, 1.5],
       [5.8, 3.1, 5.0, 1.7]], dtype=np.float32)
  predict_input_fn = tf.estimator.inputs.numpy_input_fn(
      x={"x": new_samples},
      num_epochs=1,
      shuffle=False)

  predictions = list(classifier.predict(input_fn=predict_input_fn))
  predicted_classes = [p["classes"] for p in predictions]

  print(
      "New Samples, Class Predictions:    {}\n"
      .format(predicted_classes))

if __name__ == "__main__":
    main()

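The following sections walk through the code in detail.

Load the Iris CSV Data into TensorFlow

The Iris data set contains 150 rows of data, comprising 50 samples from each of three related Iris species: Iris setosa, Iris virginica, and Iris versicolor.

From left to right: Iris setosa (by Radomil, CC BY-SA 3.0), Iris versicolor (by Dlanglois, CC BY-SA 3.0), and Iris virginica (by Frank Mayfield, CC BY-SA 2.0).

Each row contains the following data for each flower sample: sepal length, sepal width, petal length, petal width, and flower species. Flower species are represented as integers, with 0 denoting Iris setosa, 1 denoting Iris versicolor, and 2 denoting Iris virginica.

Sepal Length	Sepal Width	Petal Length	Petal Width	Species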
5.1	3.5	1.4	0.2	0
4.9	3.0	1.4	0.2	0
4.7	3.2	1.3	0.2	0
…	…	…	…	…
7	3.2	4.7	1.4	1
6.4	3.2	4.5	1.5	1
6.9	3.1	4.9	1.5	1
…	…	…	…	…
6.5	3.0	5.2	2.0	2
6.2	3.4	5.4	2.3	2
5.9	3.0	5.1	1.8	2

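For this tutorial, the Iris data has been randomized and split into two separate CSVs:

A training set of 120 samples (iris_training.csv)
A test set of 30 samples (iris_test.csv)

To get started, first import all the necessary modules, and define where to download and store the data sets: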

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
from six.moves.urllib.request import urlopen

import tensorflow as tf
import numpy as np

IRIS_TRAINING = "iris_training.csv"
IRIS_TRAINING_URL = "http://download.tensorflow.org/data/iris_training.csv"

IRIS_TEST = "iris_test.csv"
IRIS_TEST_URL = "http://download.tensorflow.org/data/iris_test.csv"

Then, if the training and test sets aren't already stored locally, download them.

if not os.path.exists(IRIS_TRAINING):
  raw = urlopen(IRIS_TRAINING_URL).read()
  with open(IRIS_TRAINING,'wb') as f:
    f.write(raw)

if not os.path.exists(IRIS_TEST):
  raw = urlopen(IRIS_TEST_URL).read()
  with open(IRIS_TEST,'wb') as f:
    f.write(raw)

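Next, load the training and test sets into Datasets using the load_csv_with_header() method in learn.datasets.base. The load_csv_with_header() method takes three required arguments:

filename, which takes the file path to the CSV file
target_dtype, which takes the numpy datatype of the data set's target value
features_dtype, which takes the numpy datatype of the data set's feature values

Here, the target (the value you're training the model to predict) is the flower species, which is an integer from 0 to 2, so the appropriate numpy datatype is np.int: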

# Load datasets.
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING,
    target_dtype=np.int,
    features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TEST,
    target_dtype=np.int,
    features_dtype=np.float32)

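Datasets in tf.contrib.learn are named tuples; you can access feature data and target values via the data and target fields. Here, training_set.data and training_set.target contain the feature data and target values for the training set, respectively, and test_set.data and test_set.target contain the feature data and target values for the test set.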
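
To make the shape of that named tuple concrete, here is a minimal pure-Python sketch of what a header-aware CSV loader of this kind might do. It is not the tf.contrib implementation: the name load_csv_sketch, the in-memory sample file, and the assumption that the header row starts with the sample count and the feature count are all illustrative.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for the Dataset named tuple described above.
Dataset = namedtuple("Dataset", ["data", "target"])


def load_csv_sketch(csv_file):
    """Read a CSV whose header row starts with n_samples, n_features (assumed)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    n_samples, n_features = int(header[0]), int(header[1])
    data, target = [], []
    for row in reader:
        # The last column is the integer species label; the rest are features.
        target.append(int(row[-1]))
        data.append([float(value) for value in row[:-1]])
    # Check the rows against the counts announced in the header.
    assert len(data) == n_samples
    assert all(len(features) == n_features for features in data)
    return Dataset(data=data, target=target)


# Three rows taken from the table above, one per species.
SAMPLE_CSV = """3,4,setosa,versicolor,virginica
5.1,3.5,1.4,0.2,0
6.4,3.2,4.5,1.5,1
5.9,3.0,5.1,1.8,2
"""

sample_set = load_csv_sketch(io.StringIO(SAMPLE_CSV))
print(sample_set.data[0])   # features of the first row
print(sample_set.target)    # labels of all rows
```

The two fields can then be read exactly as the tutorial reads training_set.data and training_set.target.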

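Later on, in "Fit the DNNClassifier to the Iris Training Data," you'll use training_set.data and training_set.target to train your model, and in "Evaluate Model Accuracy," you'll use test_set.data and test_set.target.

Construct a Deep Neural Network Classifier

tf.estimator offers a variety of predefined models, called Estimators, which you can use "out of the box" to run training and evaluation operations on your data. Here, you'll configure a deep neural network classifier model to fit the Iris data. Using tf.estimator, you can instantiate your tf.estimator.DNNClassifier with just a few lines of code: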

# Specify that all features have real-value data
feature_columns = [tf.feature_column.numeric_column("x", shape=[4])]

# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                        hidden_units=[10, 20, 10],
                                        n_classes=3,
                                        model_dir="/tmp/iris_model")

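The code above first defines the model's feature columns, which specify the data type for the features in the data set. All the feature data is continuous, so tf.feature_column.numeric_column is the appropriate function to use to construct the feature columns. There are four features in the data set (sepal length, sepal width, petal length, and petal width), so shape must be set to [4] to hold all the data.

Then, the code creates a DNNClassifier model using the following arguments:

feature_columns=feature_columns. The set of feature columns defined above.
hidden_units=[10, 20, 10]. Three hidden layers, containing 10, 20, and 10 neurons, respectively.
n_classes=3. Three target classes, representing the three Iris species.
model_dir=/tmp/iris_model. The directory in which TensorFlow will save checkpoint data and TensorBoard summaries during model training.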
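
As a rough way to picture the network these arguments describe, the sketch below lists the layer shapes and counts the trainable parameters. It assumes plain fully connected layers with one bias per output unit; the helper names dense_layer_sizes and count_parameters are hypothetical and not part of tf.estimator.

```python
def dense_layer_sizes(n_inputs, hidden_units, n_classes):
    """Return (fan_in, fan_out) for every fully connected layer."""
    widths = [n_inputs] + list(hidden_units) + [n_classes]
    return list(zip(widths[:-1], widths[1:]))


def count_parameters(layer_sizes):
    """Count weights plus one bias per output unit in each layer."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in layer_sizes)


# Four input features, hidden layers of 10, 20 and 10 units, three classes.
layers = dense_layer_sizes(n_inputs=4, hidden_units=[10, 20, 10], n_classes=3)
print(layers)                    # [(4, 10), (10, 20), (20, 10), (10, 3)]
print(count_parameters(layers))  # 513
```

Under those assumptions the model is small (513 parameters), which fits a data set of only 150 rows.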

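Describe the Training Input Pipeline

The tf.estimator API uses input functions, which create the TensorFlow operations that generate data for the model. We can use tf.estimator.inputs.numpy_input_fn to produce the input pipeline: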

# Define the training inputs
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(training_set.data)},
    y=np.array(training_set.target),
    num_epochs=None,
    shuffle=True)
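
The roles of num_epochs and shuffle can be pictured with a plain-Python generator. This is only a conceptual analogy: the real numpy_input_fn builds TensorFlow operations rather than a Python iterator, and the names batch_iterator and seed are hypothetical.

```python
import itertools
import random


def batch_iterator(features, labels, batch_size, num_epochs, shuffle, seed=0):
    """Yield (feature_batch, label_batch) pairs.

    num_epochs=None repeats the data forever; an integer stops after that
    many passes over the data.
    """
    rng = random.Random(seed)
    epochs = itertools.count() if num_epochs is None else range(num_epochs)
    for _ in epochs:
        order = list(range(len(features)))
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            chosen = order[start:start + batch_size]
            yield [features[i] for i in chosen], [labels[i] for i in chosen]


features = [[5.1, 3.5, 1.4, 0.2], [6.4, 3.2, 4.5, 1.5], [5.9, 3.0, 5.1, 1.8]]
labels = [0, 1, 2]

# One pass, no shuffling: the iterator is exhausted after two batches.
one_pass = list(batch_iterator(features, labels, 2, num_epochs=1, shuffle=False))
print(len(one_pass))

# Endless, shuffled stream: the caller decides how many batches to draw.
stream = batch_iterator(features, labels, 2, num_epochs=None, shuffle=True)
first_five = list(itertools.islice(stream, 5))
print(len(first_five))
```

With num_epochs=None the stream never runs dry, so something else has to bound training; in the tutorial that bound is the steps argument passed to train.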

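Fit the DNNClassifier to the Iris Training Data

Now that you've configured your DNNClassifier model, you can fit it to the Iris training data using the train method. Pass train_input_fn as the input_fn, and the number of steps to train (here, 2000):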

# Train model.
classifier.train(input_fn=train_input_fn, steps=2000)

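The state of the model is preserved in the classifier, which means you can train iteratively if you like. For example, the code above is equivalent to the following: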

classifier.train(input_fn=train_input_fn, steps=1000)
classifier.train(input_fn=train_input_fn, steps=1000)
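
The idea that training can be resumed because state is kept between calls can be illustrated with a toy, fully deterministic model. ToyModel is a hypothetical class written for this sketch; it is not an Estimator, and with a shuffled input pipeline two real training runs would generally not match bit for bit.

```python
class ToyModel(object):
    """Keeps its weight and step count between calls to train()."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.global_step = 0
        self.learning_rate = learning_rate

    def train(self, steps):
        # Gradient descent on the loss (weight - 3)^2.
        for _ in range(steps):
            gradient = 2.0 * (self.weight - 3.0)
            self.weight -= self.learning_rate * gradient
            self.global_step += 1
        return self


once = ToyModel().train(steps=2000)
twice = ToyModel().train(steps=1000).train(steps=1000)
print(once.global_step, twice.global_step)
print(once.weight == twice.weight)
```

Both objects end at the same step count and the same weight, because the second call to train picks up exactly where the first one stopped.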

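However, if you're looking to track the model while it trains, you'll likely want to use a TensorFlow SessionRunHook to perform logging operations instead.

Evaluate Model Accuracy

You've trained your DNNClassifier model on the Iris training data; now you can check its accuracy on the Iris test data using the evaluate method. Like train, evaluate takes an input function that builds its input pipeline. evaluate returns a dict with the evaluation results. The following code passes the Iris test data (test_set.data and test_set.target) to evaluate and prints the accuracy from the results: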

# Define the test inputs
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(test_set.data)},
    y=np.array(test_set.target),
    num_epochs=1,
    shuffle=False)

# Evaluate accuracy.
accuracy_score = classifier.evaluate(input_fn=test_input_fn)["accuracy"]

print("\nTest Accuracy: {0:f}\n".format(accuracy_score))

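Note: The num_epochs=1 argument to numpy_input_fn is important here. test_input_fn will iterate over the data once, and then raise OutOfRangeError. This error signals the classifier to stop evaluating, so it will evaluate over the input once.

When you run the full script, it will print something close to:

Test Accuracy: 0.966667

Your accuracy result may vary a bit, but should be higher than 90%. Not bad for a relatively small data set!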
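
For a sense of scale, a printed value of 0.966667 on a 30-sample test set corresponds to 29 correct predictions. The sketch below reproduces that arithmetic with made-up labels; the accuracy helper and both label lists are hypothetical and are not taken from the actual test data.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / float(len(expected))


# Hypothetical labels for a 30-sample test set with a single mistake.
expected = [0] * 10 + [1] * 10 + [2] * 10
predicted = list(expected)
predicted[15] = 2  # one versicolor sample misclassified as virginica

score = accuracy(predicted, expected)
print("Test Accuracy: {0:f}".format(score))  # Test Accuracy: 0.966667
```

Each additional mistake on a test set this small moves the score by about 3.3 percentage points, which is why results can differ noticeably between runs.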

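Classify New Samples

Use the estimator's predict() method to classify new samples. For example, say you have these two new flower samples:

Sepal Length	Sepal Width	Petal Length	Petal Width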
6.4	3.2	4.5	1.5
5.8	3.1	5	1.7

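You can predict their species using the predict() method. predict returns a generator of dicts, which can easily be converted to a list. The following code retrieves and prints the class predictions: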

# Classify two new flower samples.
new_samples = np.array(
    [[6.4, 3.2, 4.5, 1.5],
     [5.8, 3.1, 5.0, 1.7]], dtype=np.float32)
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": new_samples},
    num_epochs=1,
    shuffle=False)

predictions = list(classifier.predict(input_fn=predict_input_fn))
predicted_classes = [p["classes"] for p in predictions]

print(
    "New Samples, Class Predictions:    {}\n"
    .format(predicted_classes))

Your results should look as follows:

New Samples, Class Predictions:    [1 2]

Your model thus predicts that the first sample is Iris versicolor, and the second sample is Iris virginica.
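
Turning the numeric classes back into species names is a simple lookup against the 0/1/2 encoding given earlier. The sketch below is an illustration only: species_name is a hypothetical helper, and accepting integers, strings, or byte strings is an assumption, since the exact type of the returned labels may depend on the TensorFlow version.

```python
# Index in this list matches the integer encoding used in the CSV files.
SPECIES = ["Iris setosa", "Iris versicolor", "Iris virginica"]


def species_name(class_label):
    """Translate a class label (int, str, or bytes such as b"1") to a name."""
    if isinstance(class_label, bytes):
        class_label = class_label.decode("ascii")
    return SPECIES[int(class_label)]


# Hypothetical prediction labels in a few shapes they might arrive in.
print([species_name(label) for label in [1, 2]])
print([species_name(label) for label in [b"1", "2"]])
```

Both calls print ['Iris versicolor', 'Iris virginica'], matching the interpretation of the two predictions given above.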

Additional Resources

To learn more about using tf.estimator to create linear models, see Large-scale Linear Models with TensorFlow.
To build your own Estimator using tf.estimator APIs, check out Creating Estimators in tf.estimator.
To experiment with neural network modeling and visualization in the browser, check out Deep Playground.
For more advanced tutorials on neural networks, see Convolutional Neural Networks and Recurrent Neural Networks.

References

tf.estimator Quickstart
