本文整理汇总了Python中tests.mr_two_step_job.MRTwoStepJob类的典型用法代码示例。如果您正苦于以下问题:Python MRTwoStepJob类的具体用法?Python MRTwoStepJob怎么用?Python MRTwoStepJob使用的例子?那么恭喜您, 这里精选的类代码示例或许可以为您提供帮助。
在下文中一共展示了MRTwoStepJob类的20个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于我们的系统推荐出更棒的Python代码示例。
示例1: test_failed_job
def test_failed_job(self):
    # A job that reaches the ERROR state should raise StepFailedException,
    # log the state transition, and still tear the cluster down on cleanup.
    mr_job = MRTwoStepJob(['-r', 'dataproc', '-v'])
    mr_job.sandbox()

    with no_handlers_for_logger('mrjob.dataproc'):
        # capture mrjob.dataproc log output so we can assert on it below
        stderr = StringIO()
        log_to_stream('mrjob.dataproc', stderr)

        # drive the mock job through SETUP_DONE -> RUNNING -> ERROR
        self._dataproc_client.job_get_advances_states = (
            collections.deque(['SETUP_DONE', 'RUNNING', 'ERROR']))

        with mr_job.make_runner() as runner:
            self.assertIsInstance(runner, DataprocJobRunner)

            self.assertRaises(StepFailedException, runner.run)

            # the ERROR transition should have been logged
            self.assertIn(' => ERROR\n', stderr.getvalue())

            cluster_id = runner.get_cluster_id()

    # job should get terminated (cleanup ran when the runner exited)
    cluster = (
        self._dataproc_client._cache_clusters[_TEST_PROJECT][cluster_id])
    cluster_state = self._dataproc_client.get_state(cluster)
    self.assertEqual(cluster_state, 'DELETING')
开发者ID:okomestudio,项目名称:mrjob,代码行数:25,代码来源:test_dataproc.py
示例2: test_attach_to_existing_job_flow
def test_attach_to_existing_job_flow(self):
    # When --emr-job-flow-id is given, the runner should attach to that
    # job flow instead of creating one, and skip the bootstrap script.
    emr_conn = EMRJobRunner(conf_path=False).make_emr_conn()
    # set log_uri to None, so that when we describe the job flow, it
    # won't have the loguri attribute, to test Issue #112
    emr_job_flow_id = emr_conn.run_jobflow(
        name='Development Job Flow', log_uri=None)

    stdin = StringIO('foo\nbar\n')
    # canned output for step 1 of the existing job flow
    self.mock_emr_output = {(emr_job_flow_id, 1): [
        '1\t"bar"\n1\t"foo"\n2\tnull\n']}

    mr_job = MRTwoStepJob(['-r', 'emr', '-v',
                           '-c', self.mrjob_conf_path,
                           '--emr-job-flow-id', emr_job_flow_id])
    mr_job.sandbox(stdin=stdin)

    results = []
    with mr_job.make_runner() as runner:
        runner.run()

        # Issue 182: don't create the bootstrap script when
        # attaching to another job flow
        assert_equal(runner._master_bootstrap_script, None)

        for line in runner.stream_output():
            key, value = mr_job.parse_output_line(line)
            results.append((key, value))

    assert_equal(sorted(results),
                 [(1, 'bar'), (1, 'foo'), (2, None)])
开发者ID:hblanks,项目名称:mrjob,代码行数:30,代码来源:emr_test.py
示例3: test_owner_and_label_switches
def test_owner_and_label_switches(self):
    """The generated job name should embed --label and --owner values."""
    switches = ["--no-conf", "--owner=ads", "--label=ads_chain"]
    job_name = MRTwoStepJob(switches).make_runner().get_job_name()
    name_match = JOB_NAME_RE.match(job_name)
    self.assertEqual(name_match.group(1), "ads_chain")
    self.assertEqual(name_match.group(2), "ads")
开发者ID:pyzen,项目名称:mrjob,代码行数:7,代码来源:test_runner.py
示例4: test_bootstrap_python_comes_before_bootstrap
def test_bootstrap_python_comes_before_bootstrap(self):
    """Python bootstrap commands must precede user --bootstrap commands."""
    job = MRTwoStepJob(['-r', 'dataproc', '--bootstrap', 'true'])
    with job.make_runner() as runner:
        expected = self.EXPECTED_BOOTSTRAP + [['true']]
        self.assertEqual(runner._bootstrap, expected)
开发者ID:Jeremyfanfan,项目名称:mrjob,代码行数:7,代码来源:test_dataproc.py
示例5: test_python_dash_v_as_python_bin
def test_python_dash_v_as_python_bin(self):
    # --python-bin may carry extra switches; "python -v" should spew
    # verbose import noise to task stderr yet still produce correct output.
    python_cmd = cmd_line([sys.executable or 'python', '-v'])
    mr_job = MRTwoStepJob(['--python-bin', python_cmd, '--no-conf',
                           '-r', 'local'])
    mr_job.sandbox(stdin=[b'bar\n'])

    with mr_job.make_runner() as runner:
        runner.run()

        # expect python -v crud in stderr
        with open(runner._task_stderr_path('mapper', 0, 0)) as lines:
            self.assertTrue(any(
                'import mrjob' in line or  # Python 2
                "import 'mrjob'" in line
                for line in lines))

        # re-open: the previous iteration exhausted the file
        with open(runner._task_stderr_path('mapper', 0, 0)) as lines:
            self.assertTrue(any(
                '#' in line for line in lines))

        # should still get expected results
        self.assertEqual(
            sorted(to_lines(runner.cat_output())),
            sorted([b'1\tnull\n', b'1\t"bar"\n']))
开发者ID:Affirm,项目名称:mrjob,代码行数:25,代码来源:test_local.py
示例6: test_dont_take_down_cluster_on_failure
def test_dont_take_down_cluster_on_failure(self):
    # When attaching to an existing cluster via --cluster-id, a failed
    # job must NOT terminate the cluster during cleanup.
    runner = DataprocJobRunner(conf_paths=[])

    cluster_body = runner.api_client.cluster_create()
    cluster_id = cluster_body['clusterName']

    mr_job = MRTwoStepJob(['-r', 'dataproc', '-v',
                           '--cluster-id', cluster_id])
    mr_job.sandbox()

    # drive the mock job through SETUP_DONE -> RUNNING -> ERROR
    self._dataproc_client.job_get_advances_states = collections.deque(['SETUP_DONE', 'RUNNING', 'ERROR'])

    with mr_job.make_runner() as runner:
        self.assertIsInstance(runner, DataprocJobRunner)

        with logger_disabled('mrjob.dataproc'):
            self.assertRaises(StepFailedException, runner.run)

        # cluster should survive the failed job while runner is open
        cluster = self.get_cluster_from_runner(runner, cluster_id)
        cluster_state = self._dataproc_client.get_state(cluster)
        self.assertEqual(cluster_state, 'RUNNING')

    # job shouldn't get terminated by cleanup
    cluster = self._dataproc_client._cache_clusters[_TEST_PROJECT][cluster_id]
    cluster_state = self._dataproc_client.get_state(cluster)
    self.assertEqual(cluster_state, 'RUNNING')
开发者ID:Jeremyfanfan,项目名称:mrjob,代码行数:26,代码来源:test_dataproc.py
示例7: test_failed_job
def test_failed_job(self):
    # A failing EMR job should raise, show the flow as FAILED while the
    # runner is open, then terminate the flow on cleanup.
    mr_job = MRTwoStepJob(['-r', 'emr', '-v',
                           '-c', self.mrjob_conf_path])
    mr_job.sandbox()

    self.add_mock_s3_data({'walrus': {}})
    # make step 0 of the mock job flow fail
    self.mock_emr_failures = {('j-MOCKJOBFLOW0', 0): None}

    with mr_job.make_runner() as runner:
        assert isinstance(runner, EMRJobRunner)

        with logger_disabled('mrjob.emr'):
            assert_raises(Exception, runner.run)

        emr_conn = botoemr.EmrConnection()
        job_flow_id = runner.get_emr_job_flow_id()
        # advance the mock job flow until its state settles
        for i in range(10):
            emr_conn.simulate_progress(job_flow_id)

        job_flow = emr_conn.describe_jobflow(job_flow_id)
        assert_equal(job_flow.state, 'FAILED')

    # job should get terminated on cleanup
    emr_conn = runner.make_emr_conn()
    job_flow_id = runner.get_emr_job_flow_id()
    for i in range(10):
        emr_conn.simulate_progress(job_flow_id)

    job_flow = emr_conn.describe_jobflow(job_flow_id)
    assert_equal(job_flow.state, 'TERMINATED')
开发者ID:boursier,项目名称:mrjob,代码行数:30,代码来源:emr_test.py
示例8: test_attach_to_existing_cluster
def test_attach_to_existing_cluster(self):
    # When --cluster-id is given, the runner should attach to that
    # cluster instead of creating one, and skip the bootstrap script.
    runner = DataprocJobRunner(conf_paths=[])

    cluster_body = runner.api_client.cluster_create()
    cluster_id = cluster_body['clusterName']

    stdin = BytesIO(b'foo\nbar\n')

    mr_job = MRTwoStepJob(['-r', 'dataproc', '-v',
                           '--cluster-id', cluster_id])
    mr_job.sandbox(stdin=stdin)

    results = []
    with mr_job.make_runner() as runner:
        runner.run()

        # Generate fake output
        self.put_job_output_parts(runner, [
            b'1\t"bar"\n1\t"foo"\n2\tnull\n'
        ])

        # Issue 182: don't create the bootstrap script when
        # attaching to another cluster
        self.assertIsNone(runner._master_bootstrap_script_path)

        for line in runner.stream_output():
            key, value = mr_job.parse_output_line(line)
            results.append((key, value))

    self.assertEqual(sorted(results),
                     [(1, 'bar'), (1, 'foo'), (2, None)])
开发者ID:Jeremyfanfan,项目名称:mrjob,代码行数:32,代码来源:test_dataproc.py
示例9: test_owner_and_label_switches
def test_owner_and_label_switches(self):
    """The job key should embed the --label and --owner switch values."""
    job = MRTwoStepJob(['--no-conf', '--owner=ads', '--label=ads_chain'])
    key_match = _JOB_KEY_RE.match(job.make_runner().get_job_key())
    self.assertEqual(key_match.group(1), 'ads_chain')
    self.assertEqual(key_match.group(2), 'ads')
开发者ID:anirudhreddy92,项目名称:mrjob,代码行数:7,代码来源:test_runner.py
示例10: test_end_to_end
def test_end_to_end(self):
    # Full local-runner run: mixed input sources, output parsing, and
    # verification that the local tmp dir is cleaned up on exit.
    # read from STDIN, a regular file, and a .gz
    stdin = StringIO("foo\nbar\n")

    input_path = os.path.join(self.tmp_dir, "input")
    with open(input_path, "w") as input_file:
        input_file.write("bar\nqux\n")

    input_gz_path = os.path.join(self.tmp_dir, "input.gz")
    input_gz = gzip.GzipFile(input_gz_path, "w")
    input_gz.write("foo\n")
    input_gz.close()

    mr_job = MRTwoStepJob(["-c", self.mrjob_conf_path, "-", input_path, input_gz_path])
    mr_job.sandbox(stdin=stdin)

    local_tmp_dir = None
    results = []

    with mr_job.make_runner() as runner:
        assert isinstance(runner, LocalMRJobRunner)
        runner.run()

        for line in runner.stream_output():
            key, value = mr_job.parse_output_line(line)
            results.append((key, value))

        # tmp dir must exist while the runner is still open
        local_tmp_dir = runner._get_local_tmp_dir()
        assert os.path.exists(local_tmp_dir)

    # make sure cleanup happens
    assert not os.path.exists(local_tmp_dir)

    assert_equal(sorted(results), [(1, "qux"), (2, "bar"), (2, "foo"), (5, None)])
开发者ID:ksho,项目名称:mrjob,代码行数:34,代码来源:local_test.py
示例11: test_end_to_end
def test_end_to_end(self):
    # Full inline-runner run: mixed input sources, output parsing, and
    # verification that the local tmp dir is cleaned up on exit.
    # read from STDIN, a regular file, and a .gz
    stdin = BytesIO(b'foo\nbar\n')

    input_path = join(self.tmp_dir, 'input')
    with open(input_path, 'w') as input_file:
        input_file.write('bar\nqux\n')

    input_gz_path = join(self.tmp_dir, 'input.gz')
    input_gz = gzip.GzipFile(input_gz_path, 'wb')
    input_gz.write(b'foo\n')
    input_gz.close()

    mr_job = MRTwoStepJob(
        ['--runner', 'inline', '-', input_path, input_gz_path])
    mr_job.sandbox(stdin=stdin)

    local_tmp_dir = None
    results = []

    with mr_job.make_runner() as runner:
        assert isinstance(runner, InlineMRJobRunner)
        runner.run()

        results.extend(mr_job.parse_output(runner.cat_output()))

        # tmp dir must exist while the runner is still open
        local_tmp_dir = runner._get_local_tmp_dir()
        assert exists(local_tmp_dir)

    # make sure cleanup happens
    assert not exists(local_tmp_dir)

    self.assertEqual(sorted(results),
                     [(1, 'qux'), (2, 'bar'), (2, 'foo'), (5, None)])
开发者ID:Yelp,项目名称:mrjob,代码行数:34,代码来源:test_inline.py
示例12: test_two_step_job
def test_two_step_job(self):
    # good all-around test. MRTwoStepJob's first step logs counters, but
    # its second step does not
    job = MRTwoStepJob(['-r', 'spark'])
    job.sandbox(stdin=BytesIO(b'foo\nbar\n'))

    with job.make_runner() as runner:
        runner.run()

        counters = runner.counters()

        # should have two steps worth of counters, even though it runs as a
        # single Spark job
        self.assertEqual(len(counters), 2)

        # first step counters should be {'count': {'combiners': <int>}}
        self.assertEqual(sorted(counters[0]), ['count'])
        self.assertEqual(sorted(counters[0]['count']), ['combiners'])
        self.assertIsInstance(counters[0]['count']['combiners'], int)

        # second step counters should be empty
        self.assertEqual(counters[1], {})

    # self.log is a mock; join everything passed to log.info()
    log_output = '\n'.join(c[0][0] for c in self.log.info.call_args_list)
    log_lines = log_output.split('\n')

    # should log first step counters but not second step
    self.assertIn('Counters for step 1: 1', log_lines)
    self.assertIn('\tcount', log_output)
    self.assertNotIn('Counters for step 2', log_output)
开发者ID:Yelp,项目名称:mrjob,代码行数:30,代码来源:test_runner.py
示例13: test_bootstrap_python_switch
def test_bootstrap_python_switch(self):
    """--bootstrap-python should enable the default Python bootstrap."""
    job = MRTwoStepJob(["-r", "dataproc", "--bootstrap-python"])
    with job.make_runner() as runner:
        # the opt is on, and both bootstrap views match the expected commands
        self.assertEqual(runner._opts["bootstrap_python"], True)
        self.assertEqual(runner._bootstrap_python(), self.EXPECTED_BOOTSTRAP)
        self.assertEqual(runner._bootstrap, self.EXPECTED_BOOTSTRAP)
开发者ID:davidmarin,项目名称:mrjob,代码行数:7,代码来源:test_dataproc.py
示例14: test_owner_and_label_switches
def test_owner_and_label_switches(self):
    """The job name should pick up the --label and --owner switch values."""
    job = MRTwoStepJob(['--no-conf', '--owner=ads', '--label=ads_chain'])
    name_match = JOB_NAME_RE.match(job.make_runner().get_job_name())
    assert_equal(name_match.group(1), 'ads_chain')
    assert_equal(name_match.group(2), 'ads')
开发者ID:chomp,项目名称:mrjob,代码行数:7,代码来源:runner_test.py
示例15: test_missing_input
def test_missing_input(self):
    """Running against a nonexistent input path should raise IOError."""
    job = MRTwoStepJob(['-r', 'inline', '/some/bogus/file/path'])
    job.sandbox()
    with job.make_runner() as runner:
        assert isinstance(runner, InlineMRJobRunner)
        self.assertRaises(IOError, runner.run)
开发者ID:Yelp,项目名称:mrjob,代码行数:7,代码来源:test_inline.py
示例16: test_default
def test_default(self):
    """bootstrap_python should default to True with the expected commands."""
    job = MRTwoStepJob(['-r', 'dataproc'])
    with job.make_runner() as runner:
        self.assertEqual(runner._opts['bootstrap_python'], True)
        # both the computed and final bootstrap lists match the default
        self.assertEqual(runner._bootstrap_python(), self.EXPECTED_BOOTSTRAP)
        self.assertEqual(runner._bootstrap, self.EXPECTED_BOOTSTRAP)
开发者ID:Jeremyfanfan,项目名称:mrjob,代码行数:8,代码来源:test_dataproc.py
示例17: test_streaming_step_not_okay
def test_streaming_step_not_okay(self):
    """_spark_script_args() should reject a streaming (non-Spark) step."""
    job = MRTwoStepJob()
    job.sandbox()
    with job.make_runner() as runner:
        self.assertRaises(TypeError, runner._spark_script_args, 0)
开发者ID:Yelp,项目名称:mrjob,代码行数:8,代码来源:test_runner.py
示例18: test_end_to_end
def test_end_to_end(self):
    # Full hadoop-runner run against the mock hadoop binary: mixed input
    # sources, HDFS layout checks, extra-arg passthrough, and cleanup.
    # read from STDIN, a local file, and a remote file
    stdin = StringIO('foo\nbar\n')

    local_input_path = os.path.join(self.tmp_dir, 'input')
    with open(local_input_path, 'w') as local_input_file:
        local_input_file.write('bar\nqux\n')

    input_to_upload = os.path.join(self.tmp_dir, 'remote_input')
    with open(input_to_upload, 'w') as input_to_upload_file:
        input_to_upload_file.write('foo\n')
    remote_input_path = 'hdfs:///data/foo'
    check_call([self.hadoop_bin,
                'fs', '-put', input_to_upload, remote_input_path])

    # doesn't matter what the intermediate output is; just has to exist.
    add_mock_hadoop_output([''])
    add_mock_hadoop_output(['1\t"qux"\n2\t"bar"\n', '2\t"foo"\n5\tnull\n'])

    mr_job = MRTwoStepJob(['-r', 'hadoop', '-v',
                           '--no-conf', '--hadoop-arg', '-libjar',
                           '--hadoop-arg', 'containsJars.jar',
                           '-', local_input_path, remote_input_path])
    mr_job.sandbox(stdin=stdin)

    local_tmp_dir = None
    results = []

    with mr_job.make_runner() as runner:
        assert isinstance(runner, HadoopJobRunner)
        runner.run()

        for line in runner.stream_output():
            key, value = mr_job.parse_output_line(line)
            results.append((key, value))

        local_tmp_dir = runner._get_local_tmp_dir()
        # make sure cleanup hasn't happened yet
        assert os.path.exists(local_tmp_dir)
        assert any(runner.ls(runner.get_output_dir()))

        # make sure we're writing to the correct path in HDFS
        hdfs_root = os.environ['MOCK_HDFS_ROOT']
        assert_equal(sorted(os.listdir(hdfs_root)), ['data', 'user'])
        home_dir = os.path.join(hdfs_root, 'user', getpass.getuser())
        assert_equal(os.listdir(home_dir), ['tmp'])
        assert_equal(os.listdir(os.path.join(home_dir, 'tmp')), ['mrjob'])

        # --hadoop-arg values should land in hadoop_extra_args, in order
        assert_equal(runner._opts['hadoop_extra_args'],
                     ['-libjar', 'containsJars.jar'])

    assert_equal(sorted(results),
                 [(1, 'qux'), (2, 'bar'), (2, 'foo'), (5, None)])

    # make sure cleanup happens
    assert not os.path.exists(local_tmp_dir)
    assert not any(runner.ls(runner.get_output_dir()))
开发者ID:nedrocks,项目名称:mrjob,代码行数:56,代码来源:hadoop_test.py
示例19: test_debugging_works
def test_debugging_works(self):
    """--enable-emr-debugging should prepend the debugging setup step."""
    job = MRTwoStepJob(['-r', 'emr', '-v',
                        '-c', self.mrjob_conf_path,
                        '--enable-emr-debugging'])
    job.sandbox()
    with job.make_runner() as runner:
        runner.run()
        flow = runner.make_emr_conn().describe_jobflow(
            runner._emr_job_flow_id)
        assert_equal(flow.steps[0].name, 'Setup Hadoop Debugging')
开发者ID:hblanks,项目名称:mrjob,代码行数:10,代码来源:emr_test.py
示例20: test_echo_as_steps_python_bin
def test_echo_as_steps_python_bin(self):
    """A bogus --steps-python-bin should make _get_steps() raise ValueError."""
    job = MRTwoStepJob(
        ["--steps", "--steps-python-bin", "echo", "--no-conf",
         "-r", "local"])
    job.sandbox()
    with job.make_runner() as runner:
        assert isinstance(runner, LocalMRJobRunner)
        # MRTwoStepJob pre-populates _steps in the runner; clear it so the
        # runner actually shells out (to `echo`) to fetch the steps, which
        # then fails to parse.
        runner._steps = None
        self.assertRaises(ValueError, runner._get_steps)
开发者ID:alanhdu,项目名称:mrjob,代码行数:11,代码来源:test_local.py
注:本文中的tests.mr_two_step_job.MRTwoStepJob类示例由纯净天空整理自Github/MSDocs等源码及文档管理平台,相关代码片段筛选自各路编程大神贡献的开源项目,源码版权归原作者所有,传播和使用请参考对应项目的License;未经允许,请勿转载。
请发表评论