An Introduction to Gearman (a Distributed Job Scheduling Framework)

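I came across this framework on @hacker101's Weibo, took a look at the official site, and think it is a good fit for tasks such as web crawlers and multi-process parallel processing.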

Here are a few of the examples listed on the official site:

  • Mass PDF quote email
  • Synchronous Image Resize
  • Shard-Query: a PHP project which uses Net_Gearman to execute queries on horizontally partitioned databases and returns the results. Supports aggregation.
  • Distribute Nagios Checks/Eventhandler with Gearman
  • Feed fetching / parsing

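To sum up: if the logic of your single-machine program can be expressed as several clearly separated steps, and the work can be split horizontally (either by data, as with MySQL sharding, or by compute, as with several machines running in parallel), then consider introducing Gearman to turn the single-machine system into a distributed one quickly.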
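To make the "split horizontally" idea concrete, here is a minimal sketch of the data-side split. The helper `partition_work`, the function name `fetch_urls`, the `--submit` flag and the example.com URLs are all made up for illustration; the sketch also assumes the PECL gearman extension, a gearmand on 127.0.0.1 and a worker that understands a JSON list of URLs.

```php
<?php
// Hypothetical helper: cut a list of work items into fixed-size shards,
// so that each shard can be handed to a different worker.
function partition_work(array $items, int $shardSize): array
{
    if ($shardSize < 1) {
        throw new InvalidArgumentException('shardSize must be >= 1');
    }
    return array_chunk($items, $shardSize);
}

// Placeholder URLs, for illustration only.
$urls = [
    'http://example.com/a',
    'http://example.com/b',
    'http://example.com/c',
    'http://example.com/d',
    'http://example.com/e',
];
$shards = partition_work($urls, 2);

// Only talk to Gearman when asked to and when the extension is loaded.
// "fetch_urls" is an assumed function name that some worker has registered.
$mode = $_SERVER['argv'][1] ?? '';
if ($mode === '--submit' && extension_loaded('gearman')) {
    $client = new GearmanClient();
    $client->addServer(); // assumes gearmand on 127.0.0.1, default port
    foreach ($shards as $shard) {
        // One background job per shard; the workload is a JSON string.
        $client->doBackground('fetch_urls', json_encode($shard));
    }
}
```

Each shard becomes one job, so adding workers (on the same machine or on others) is what spreads the load; the client code does not change.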

Framework overview, copied from the official site:

A Gearman powered application consists of three parts: a client, a worker, and a job server. The client is responsible for creating a job to be run and sending it to a job server. The job server will find a suitable worker that can run the job and forwards the job on. The worker performs the work requested by the client and sends a response to the client through the job server. Gearman provides client and worker APIs that your applications call to talk with the Gearman job server (also known as gearmand) so you don't need to deal with networking or mapping of jobs. Internally, the gearman client and worker APIs communicate with the job server using TCP sockets. To explain how Gearman works in more detail, let's look at a simple application that will reverse the order of characters in a string. The example is given in PHP, although other APIs will look quite similar.

An example, also from the official site.

We start off by writing a client application that is responsible for sending off the job and waiting for the result so it can print it out. It does this by using the Gearman client API to send some data associated with a function name, in this case the function “reverse”. The code for this is (with error handling omitted for brevity):

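# Reverse Client Code
$client = new GearmanClient();
$client->addServer();  // no arguments: 127.0.0.1 with the default port
// doNormal() replaces do(), which is deprecated in newer PECL gearman releases
print $client->doNormal("reverse", "Hello World!");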

This code initializes a client class, configures it to use a job server with addServer (no arguments means use 127.0.0.1 with the default port), and then tells the client API to run the “reverse” function with the workload “Hello World!”. The function name and arguments are completely arbitrary as far as Gearman is concerned, so you could send any data structure that is appropriate for your application (text or binary). At this point the Gearman client API will package up the job into a Gearman protocol packet and send it to the job server to find an appropriate worker that can run the “reverse” function. Let's now look at the worker code:

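# Reverse Worker Code
$worker = new GearmanWorker();
$worker->addServer();
// Register the PHP callback under the Gearman function name "reverse".
$worker->addFunction("reverse", "my_reverse_function");
// Block waiting for jobs; each call to work() handles one job.
while ($worker->work());

// Receives a GearmanJob; the return value is sent back as the result.
function my_reverse_function($job)
{
  return strrev($job->workload());
}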

This code defines a function “my_reverse_function” that takes a string and returns the reverse of that string. It is used by a worker object to register a function named “reverse” after it is set up to connect to the same local job server as the client. When the job server receives the job to be run, it looks at the list of workers who have registered the function name “reverse” and forwards the job on to one of the free workers. The Gearman worker API then takes this request, runs the function “my_reverse_function”, and sends the result of that function back through the job server to the client.
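Since the workload is just an opaque string to Gearman, a structured request can be passed by serializing it. The sketch below is my own variation on the reverse example, not code from the official site: the function name `reverse_json`, the JSON shape with `text` and `upper` fields, the helper `reverse_payload` and the `--worker` / `--client` flags are all assumptions made for illustration.

```php
<?php
// Pure job logic, kept apart from Gearman so it can be checked on its own.
// The request shape {"text": ..., "upper": ...} is an assumption of this sketch.
function reverse_payload(string $workload): string
{
    $request = json_decode($workload, true);
    if (!is_array($request) || !isset($request['text'])) {
        return json_encode(['ok' => false, 'error' => 'missing "text"']);
    }
    // strrev() works on bytes, so this is only safe for single-byte text.
    $result = strrev((string) $request['text']);
    if (!empty($request['upper'])) {
        $result = strtoupper($result);
    }
    return json_encode(['ok' => true, 'result' => $result]);
}

$mode = $_SERVER['argv'][1] ?? '';

if ($mode === '--worker' && extension_loaded('gearman')) {
    // Worker role: register the callback and serve jobs until stopped.
    $worker = new GearmanWorker();
    $worker->addServer(); // assumes gearmand on 127.0.0.1, default port
    $worker->addFunction('reverse_json', function (GearmanJob $job) {
        return reverse_payload($job->workload());
    });
    while ($worker->work());
} elseif ($mode === '--client' && extension_loaded('gearman')) {
    // Client role: send one JSON request and print the JSON reply.
    $client = new GearmanClient();
    $client->addServer();
    $request = json_encode(['text' => 'Hello World!', 'upper' => true]);
    print $client->doNormal('reverse_json', $request) . "\n";
}
```

Saved as, say, reverse_json.php, one terminal would run `php reverse_json.php --worker` and another `php reverse_json.php --client`. Keeping the job logic in a plain function means it can be tested without a job server.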

 

As you can see, the client and worker APIs (along with the job server) deal with the job management and network communication so you can focus on the application parts. There are a few different ways you can run jobs in Gearman, including background for asynchronous processing and prioritized jobs. See the documentation available for the various APIs for details.
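As a rough illustration of the background and prioritized variants, the PECL gearman client exposes them as separate methods (doNormal, doHigh, doLow and their ...Background counterparts). The helper `pick_submit_method` and the `--submit` flag below are hypothetical names of mine; the sketch assumes a local gearmand and a worker registered for "reverse", as in the example above.

```php
<?php
// Hypothetical helper: map a priority and a background flag to the name of
// the matching GearmanClient method in the PECL gearman extension.
function pick_submit_method(string $priority, bool $background): string
{
    $methods = [
        'normal' => ['doNormal', 'doBackground'],
        'high'   => ['doHigh',   'doHighBackground'],
        'low'    => ['doLow',    'doLowBackground'],
    ];
    if (!isset($methods[$priority])) {
        throw new InvalidArgumentException("unknown priority: $priority");
    }
    return $methods[$priority][$background ? 1 : 0];
}

$mode = $_SERVER['argv'][1] ?? '';
if ($mode === '--submit' && extension_loaded('gearman')) {
    $client = new GearmanClient();
    $client->addServer(); // assumes gearmand on 127.0.0.1, default port

    // Foreground, high priority: blocks until a worker returns the result.
    $method = pick_submit_method('high', false);
    print $client->$method('reverse', 'urgent') . "\n";

    // Background: returns a job handle right away; no result comes back.
    $method = pick_submit_method('normal', true);
    $handle = $client->$method('reverse', 'can wait');

    // jobStatus() reports [known, running, numerator, denominator].
    list($known, $running) = $client->jobStatus($handle);
    print "handle=$handle known=" . var_export($known, true) . "\n";
}
```

A background job suits fire-and-forget work such as sending mail or fetching feeds; if the client needs the output, the worker has to store it somewhere the client can read later.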

 

Next time I have a crawler to write, I'll give it a try.

 
