
Lightweight and multiplatform PHP Multithreading Engine

Short Introduction

This article could have appeared about a year and a half ago, but at that time I had neither the free time nor the ability to publish anything connected with it, because parts of the source code described here were used in several commercial projects.

The main problem in implementing a PHP multithreading engine is the shared piece of memory, which usually varies from one OS to another. Threads use this shared memory to talk to each other, get tasks and commands, give back results and use other shared input/output resources if needed. PHP can be used on at least two platforms (Linux and Windows, or even more [1]), which differ in memory management, structure, etc. from the programmer’s point of view. This fact lowers the chances of making a multiplatform PHP multithreading engine.

Related Articles and Blog posts on PHP Multithreading

Several fairly strong attempts at PHP multithreading are: PHP Threader – Multithreading-like Functionality in PHP [12], Emulate threads using separate HTTP requests [13] and Improved Thread Simulation Class for PHP [14].
At first glance the first one [12] seems complete and fine, but a closer look (analysis of the code and examples) reveals some serious drawbacks, namely:

  1. There is no distribution of work: a single job is not divided between threads. Work is done by executing PHP files (threads) in parallel, each doing a separate job. Multithreading, however, implies that one or several jobs are divided equally between threads and computed by them concurrently.
  2. The shared memory is actually a shared file or a MySQL database used only for sending messages between the main thread and the other threads, not for distributing jobs/tasks, tracking state, etc.
  3. There is no proper management of the thread life-cycle.
  4. It seems that a huge workload could cause problems for computation and shared data with this approach.
  5. The approach does not provide low-level multithreading. Because job distribution is not managed automatically, implementing applications that use more than 2-3 threads at once with this package is hard and error-prone.
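
To make item 1 concrete, here is a minimal sketch of dividing a single job between workers. The helper splitJob() is hypothetical: it is not part of [12] or of the engine described in this article, it only illustrates the idea of equal distribution.

```php
<?php
/**
 * Split one job (a list of work items) into roughly equal parts,
 * at most one part per worker. Hypothetical helper for illustration only.
 */
function splitJob(array $items, int $workers): array
{
    if ($workers < 1) {
        throw new InvalidArgumentException('At least one worker is required');
    }
    if ($items === []) {
        return [];
    }
    // ceil() guarantees that no more than $workers chunks are produced
    $chunkSize = (int) ceil(count($items) / $workers);
    return array_chunk($items, $chunkSize);
}

$items = range(1, 10);          // stand-in for 10 work items
$parts = splitJob($items, 3);   // three chunks of 4, 4 and 2 items
```

Each chunk could then be handed to one thread as its task parameters.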

What about the second one [13]? The author claims: “This package provides an alternative solution that consists in sending multiple HTTP requests to the same Web server on which PHP is running”. The third one [14] is also based on HTTP requests. The approach is reasonable, but both solutions lack thread management, job distribution, etc. They do the first part (starting several threads and getting responses from them) very well, but a shared piece of memory, full thread control, distribution of jobs/tasks, etc. are missing. It is also questionable whether they would work with huge amounts of data.

Other earlier articles on PHP multithreading (see [2], [3], [4]) mostly describe using, for example, forking (see [8] for pcntl_fork) on Linux or curl (see [9]) on Windows. They still do not provide a full multithreading solution.
Additionally, there have been many other attempts to implement PHP multithreading [10], [11].
On reading these articles, several questions arise straight away. How would we track the life-cycle of threads? What happens if some of the threads hang or crash unexpectedly without any notice?

New Approach!

Inspired by:

  • These two articles [5], [6] (in my opinion, the best approach in this case)
  • The TTU (www.ttu.ee) university course Informatics II by PhD Innar Liiv, where a small Grid-like computation application in C++ was the topic of my project.

and

  • a strong need for a really working PHP multithreading engine, mostly for extracting information from the Web.

As a result, some ideas emerged on how to improve the code and concept provided in articles [5] and [6] above and make the algorithm more automated, universal, clean and less complicated.

What kind of improvements will be added?

Improvement I: Shared memory

Any database (MySQL, Postgres, Oracle, etc.) can be used as the shared piece of memory.
We need only two tables (see the picture below): one for messages/tasks/commands, called cmd, and another for tracking the life-cycle of threads, called threads.

[Figure: cmd and threads tables]

Table: Cmd
cmd_id – primary key
proc_id – ID of the thread
cmd – the command given (for example: calculate, exit, etc.)
param – additional parameters needed for the command, usually a serialized PHP object
result – stored after the command has been executed, usually a serialized PHP object
done – flag for the main thread showing whether the command/task has been completed; it helps to collect results and to reassign the task to another thread if the current one is not responding
datestamp – date and time
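
The param and result columns hold serialized PHP values. Below is a minimal sketch of that round trip. The task keys (url, width), the 0/1 meaning of done and the thumbnail naming are assumptions made for illustration and are not taken from the downloadable source; a plain array stands in for a row of the cmd table.

```php
<?php
// Hypothetical task parameters the main thread wants to hand to a worker.
$task = ['url' => 'http://example.com/picture.jpg', 'width' => 120];

// Main thread: build the row that would be inserted into the cmd table.
$row = [
    'proc_id'   => 7,                  // thread the task is assigned to
    'cmd'       => 'calculate',        // command name mentioned in the article
    'param'     => serialize($task),   // stored as text
    'result'    => null,               // filled in by the worker
    'done'      => 0,                  // assumed: 0 = pending, 1 = finished
    'datestamp' => date('Y-m-d H:i:s'),
];

// Worker: restore the parameters, do the work, store a serialized result.
$params = unserialize($row['param'], ['allowed_classes' => false]);
$row['result'] = serialize(['thumbnail' => basename($params['url']) . '.thumb']);
$row['done'] = 1;

// Main thread: read the result back.
$result = unserialize($row['result'], ['allowed_classes' => false]);
```

The allowed_classes option restricts unserialize() to plain arrays and scalars, which is enough here; if real objects are stored, the permitted class names would have to be listed instead.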

Table: Threads
thread_id – primary key
proc_id – ID of the thread
last_beat – timestamp of the last moment the thread was known to be alive
busy – flag showing whether the thread is busy
state – the state of the thread (for example exit, ready, etc.)
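
Together, the two tables allow the main thread to take unfinished tasks away from threads that have stopped responding. Here is a minimal sketch of such a reassignment, assuming the rows have already been read from the cmd table and that the lists of dead and idle thread IDs are known; the function reassignPending() and the round-robin choice are hypothetical, not the engine's actual logic.

```php
<?php
/**
 * Move unfinished commands away from threads that stopped responding.
 * $cmdRows mimics rows of the cmd table; the updated rows are returned.
 */
function reassignPending(array $cmdRows, array $deadProcIds, array $idleProcIds): array
{
    if ($idleProcIds === []) {
        return $cmdRows; // nobody can take over yet
    }
    $i = 0;
    foreach ($cmdRows as &$row) {
        if ((int) $row['done'] === 0 && in_array($row['proc_id'], $deadProcIds, true)) {
            // Round-robin over the idle threads
            $row['proc_id'] = $idleProcIds[$i % count($idleProcIds)];
            $i++;
        }
    }
    unset($row); // break the reference left by foreach
    return $cmdRows;
}

$cmds = [
    ['cmd_id' => 1, 'proc_id' => 2, 'done' => 0], // pending on dead thread 2
    ['cmd_id' => 2, 'proc_id' => 2, 'done' => 1], // already finished
    ['cmd_id' => 3, 'proc_id' => 5, 'done' => 0], // thread 5 is still alive
];
$cmds = reassignPending($cmds, [2], [7, 8]);
```

Only the first command changes owner; finished commands and commands of live threads are left alone.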

Personally I used MSSQL Server, but the tables and commands use only basic SQL, so there should be no problem using other databases such as MySQL, Postgres, Oracle etc.; only the DDL below, which is written in MSSQL syntax, would need adapting. Further in the article you will see that the EZsql DB abstraction class [7] is used for communication with the database, so it is easy to change the DB engine. EZsql is not the best abstraction class, and of course you can use your own connector to access the database.

MSSQL DDL of the cmd and threads tables:

CREATE TABLE [threads] (
[thread_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[proc_id] [int] NULL ,
[last_beat] [varchar] (50) NULL ,
[busy] [int] NULL ,
[state] [int] NULL
) ON [PRIMARY]

CREATE TABLE [cmd] (
[cmd_id] [bigint] IDENTITY (1, 1) NOT NULL ,
[proc_id] [int] NULL ,
[cmd] [varchar] (255) NULL ,
[param] [text] NULL ,
[result] [text] NULL ,
[done] [int] NULL ,
[datestamp] [datetime] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Example of data
The data shown is for an application with 30 threads.
[Screenshots: cmd table, threads table]

Improvement II: Message broker

Brokering of messages/tasks/commands is done through the database together with the following parts of the code:
The loop “THREADS main cycle”

do{

}while($q);

and

function tell($thought, $params = NULL) {}

and finally

### Get results block

Please see main.php and Thread.php for detailed information.
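
The pieces named in this section (the main cycle, tell() and the results block) can be pictured roughly as follows. This is only a sketch under assumptions: an in-memory array stands in for the cmd table, the class CmdQueue and its methods next(), finish() and results() are hypothetical, and the tell() here takes a thread ID in addition to the arguments of the signature quoted in this section. The real code is in main.php and Thread.php.

```php
<?php
/**
 * Minimal stand-in for the cmd table: rows are kept in memory.
 * In the engine itself these rows live in the database.
 */
final class CmdQueue
{
    /** @var array<int, array<string, mixed>> rows keyed by cmd_id */
    private array $rows = [];
    private int $nextId = 1;

    /** Main thread: give a command to one worker. */
    public function tell(int $procId, string $cmd, $params = null): int
    {
        $id = $this->nextId++;
        $this->rows[$id] = [
            'proc_id' => $procId,
            'cmd'     => $cmd,
            'param'   => serialize($params),
            'result'  => null,
            'done'    => 0,
        ];
        return $id;
    }

    /** Worker: fetch the oldest unfinished command addressed to it. */
    public function next(int $procId): ?array
    {
        foreach ($this->rows as $id => $row) {
            if ($row['proc_id'] === $procId && $row['done'] === 0) {
                return ['cmd_id' => $id] + $row;
            }
        }
        return null;
    }

    /** Worker: store the result and mark the command as done. */
    public function finish(int $cmdId, $result): void
    {
        $this->rows[$cmdId]['result'] = serialize($result);
        $this->rows[$cmdId]['done'] = 1;
    }

    /** Main thread: collect the results of all finished commands. */
    public function results(): array
    {
        $out = [];
        foreach ($this->rows as $id => $row) {
            if ($row['done'] === 1) {
                $out[$id] = unserialize($row['result']);
            }
        }
        return $out;
    }
}

$queue = new CmdQueue();
$queue->tell(1, 'calculate', [2, 3]);
$queue->tell(2, 'calculate', [4, 5]);

// In the engine each worker runs such a loop in its own process;
// here both loops run one after another for demonstration.
foreach ([1, 2] as $procId) {
    while (($job = $queue->next($procId)) !== null) {
        $numbers = unserialize($job['param']);
        $queue->finish($job['cmd_id'], array_sum($numbers));
    }
}

$results = $queue->results(); // sums keyed by cmd_id
```

Replacing the array with queries against the cmd table gives the same flow across separate processes, since the database then plays the role of the shared memory.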

Improvement III: thread life-cycle management

We need to know which threads are ready for processing, which are busy, which have finished processing and are asking for termination, etc. The message broker is used to give “vital” commands to threads.
This is the job of the following functions:

function isActive () {}
function isBusy() {}

Please see Thread.php.
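
As an illustration of what such checks might look like, here is a minimal sketch built on one row of the threads table. The class ThreadStatus, the numeric state codes, the 30-second timeout and the assumption that last_beat holds a Unix timestamp are all hypothetical; the actual implementation is in Thread.php.

```php
<?php
/**
 * Read-only view of one row of the threads table.
 * State codes and the default timeout are assumptions.
 */
final class ThreadStatus
{
    public const STATE_READY = 0;
    public const STATE_EXIT  = 1;

    private array $row;
    private int $timeout;

    public function __construct(array $row, int $timeout = 30)
    {
        $this->row = $row;
        $this->timeout = $timeout;
    }

    /** Active if the thread has not exited and its heartbeat is recent. */
    public function isActive(int $now): bool
    {
        return (int) $this->row['state'] !== self::STATE_EXIT
            && ($now - (int) $this->row['last_beat']) <= $this->timeout;
    }

    /** Busy if the busy flag is set. */
    public function isBusy(): bool
    {
        return (int) $this->row['busy'] === 1;
    }
}

$now = time();
$fresh = new ThreadStatus([
    'proc_id' => 3, 'last_beat' => $now - 5, 'busy' => 0,
    'state' => ThreadStatus::STATE_READY,
]);
$stale = new ThreadStatus([
    'proc_id' => 4, 'last_beat' => $now - 120, 'busy' => 1,
    'state' => ThreadStatus::STATE_READY,
]);

// A thread can take a new task only if it is alive and not busy.
$canTakeTask = $fresh->isActive($now) && !$fresh->isBusy();
```

A thread whose heartbeat is older than the timeout is treated as hung or crashed even if its busy flag is still set.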

The source code is well annotated and commented, so have a look inside.

Download: PHP_Multithreading_sourcecode_v1.0

Performance and Evaluation: 5 hours vs 3 minutes

One of the applications where the code was used was a mass download of pictures with thumbnail generation.
Work volume: ~3000 JPEG pictures, 0.5-1.5 MB each
Hardware used: 1.4 GHz Pentium 4 processor, 2 GB RAM, IIS 5.5 etc.
Internet connection: 8/8 Mbit symmetric connection
Sequential download and resizing would take 5 hours.
The multithreaded solution with 20 threads took less than 3 minutes.

Conclusion

What definitely distinguishes the PHP multithreading engine proposed in this article?

  1. Multiplatform or platform independent
  2. Highly customizable and lightweight
  3. There is no redundant code inside; only the technologies best suited to this case are used.
  4. Works both on the Web and from the CLI
  5. The code is OOP, well commented, easy to understand and does not depend on any other frameworks or drivers.
  6. Any database engine (MySQL, Postgres, Oracle, MSSQL etc.) can be used as the shared memory of the threads. This provides the ability to use the full power of an RDBMS.

Any comments, questions and suggestions about the article are highly appreciated.
The next step will be creating a Map/Reduce-like implementation in PHP and hosting it on http://sourceforge.net/projects/phpmapreduce/.


References

[1] Supported platforms by PHP http://wiki.php.net/platforms
[2] Sonic server http://dev.pedemont.com/sonic/
[3] Process Forking with PHP http://www.electrictoolbox.com/article/php/process-forking/
[4] Multithreading in PHP with CURL http://www.ibuildings.nl/blog/archives/811-Multithreading-in-PHP-with-CURL.html
[5] Multi-threading strategies in PHP http://www.alternateinterior.com/2007/05/multi-threading-strategies-in-php.html
[6] Communicating with threads in PHP http://www.alternateinterior.com/2007/05/communicating-with-threads-in-php.html
[7] EZsql DB abstraction class http://www.woyano.com/jv/ezsql
[8] PHP Function pcntl_fork http://www.php.net/manual/en/function.pcntl-fork.php
[9] PHP library Curl http://www.php.net/curl
[10] Attempt to make PHP Multithreading Tutorial http://phpmultithreaddaemon.blogspot.com/2007/09/introduction.html
[11] Discussions about PHP Multithreading http://webforumz.com/php/12595-multithreaded-php.htm
[12] PHP Threader – MultiThreading-like Functionality in PHP http://www.phpclasses.org/browse/package/4082.html
[13] Emulate threads using separate HTTP requests http://www.phpclasses.org/browse/package/3953.html
[14] Improved Thread Simulation Class for PHP http://w-shadow.com/blog/2008/05/24/improved-thread-simulation-class-for-php/

Author: Anton Vedeshin, Categories: Articles
  1. John
    July 3rd, 2009 at 16:27 | #1

    Sounds interesting, ill try it

    Reply

  2. July 3rd, 2009 at 17:56 | #2

    Is there no code for this yet?

    Reply

    Anton Vedeshin Reply:

    It is : ) Ctrl+F -> Download, it is inside the text
    anyway http://anton.vedeshin.com/wp-content/uploads/2009/07/PHP_Multithreading_sourcecode.zip

    Reply

  3. john
    November 10th, 2009 at 16:44 | #3

    Please show us an example on how to use the code.
    Thank you.

    Reply

    Anton Vedeshin Reply:

    Hi!
    The example is also in attachment, please provide more details, which kind of example you would like to have.

    Reply

    aaa Reply:

    ascascascasc

    Reply

  4. January 23rd, 2010 at 11:58 | #4

    hi Anton Vedeshin ,
    Your article about threading is very interesting . Could you show me an example on how to use your code to grab contents from other websites, and also if you could send the code of ” Performance and Evaluation: 5 hours vs 3 minutes” for further reference? I appreciate your work very much .

    Reply

    Anton Vedeshin Reply:

    The code in the example is complete in the sense that it is possible to execute it, etc. If you put some file_get_contents(…) into calculate.php it will start downloading contents from other web sites.

    Reply

  5. dasher
    January 29th, 2010 at 02:41 | #5

    Interesting but it’s not multi-threading – it’s spawning another process.
    http://es2.php.net/manual/en/function.popen.php – essentially forking not multi threading.

    Reply

    Anton Vedeshin Reply:

    Yes, of course it is not PHP core multithreading, but it is emulated multithreading which is platform independent and does its job.

    Reply

  6. February 28th, 2010 at 04:46 | #6

    Thx for your project. It looks really useful, but I have the same problem as someone before me. How could I start my functions? I have a function that I want to start in 30 threads. Overall this function will be called at least 300 times. So I start with 30 threads, and when one process is finished, how do the other processes get started? Please give us an example. I read something about objects and jobs. Could you also explain this a little bit more? Thx in advance.

    bronko

    Reply

    Anton Vedeshin Reply:

    Well, actually the code is complete enough to execute the example. Do you mean “serialized PHP object”? It is like in Java: you can “store objects” as text if you want to use them in another execution of the program. If you serialize an object into JSON in Ajax, it is almost the same thing.

    Reply

  7. April 23rd, 2010 at 15:46 | #7

    Sounds interesting, ill try it

    Reply

  8. May 21st, 2010 at 07:07 | #8

    Interesting but it’s not multi-threading – it’s spawning another process.
    http://es2.php.net/manual/en/function.popen.php – essentially forking not multi threading.

    Reply

    Anton Vedeshin Reply:

    Yes, of course it is not PHP core multithreading, but it is emulated multithreading which is platform independent.

    Reply


  11. June 29th, 2010 at 10:05 | #11

    Ehm… a download of ~3000 JPEGs on a 8/8 Mbit connection takes more than 3 minutes. So the comparison in the end is hard to believe.

    Say we have 3000 JPEGs of 1 MB each. That’s 3000 MB. To download this on a 8 Mbit line takes 3000M*(8 bits)/8Mbits = 3000 seconds. That’s 50 minutes for downloading only, and under ideal circumstances.

    This does not mean multithreading isn’t a speed booster, but not as much as I would be led to believe by the claim made in the last paragraph.

    Reply

    Anton Vedeshin Reply:

    Yes, you are right, something is wrong; I need to check the number of pictures and their size. What we had exactly is that before deployment of the multithreading solution it took approximately 1 hour, and after it only 2:50 minutes.
    I assume that data density was ~30-40%. Actually there were 4 vendors for the hardware e-shop, and each had ~700-1200 products after filtering to be listed in the shop. Although every product had a link to a picture, not all the products had a picture behind this link (~60-70% had not). The request was made anyway, and the multithreading solution also reduced the time needed to check the links. This is my assumption.

    Reply

  12. jason isherwood
    July 21st, 2010 at 17:38 | #12

    Hi Anton
    I too have the same question as others i.e. need a finished example.
    Say I have an array of 500 URLs that I want to retrieve: how do I use your function for that? If I could see your JPEG example, I’m sure it would become much clearer.
    Many thanks in advance.
    Jason

    Reply

    Anton Vedeshin Reply:

    Hi Jason!
    Thank you for your feedback, I will prepare an example soon.

    Reply

  13. November 25th, 2010 at 04:28 | #13

    Use Gearman (http://www.gearman.org) and the PECL project Net_Gearman to run jobs and get results back, either synchronously or asynchronously.

    With this type of system, the actual PHP process it not multi-threaded, but the job queue (gearman) takes care of that for you.

    For a complete example of this method:
    http://code.google.com/p/shard-query

    Reply

  14. April 28th, 2011 at 08:13 | #14

    I’ve bookmarked http://anton.vedeshin.com/articles/lightweight-and-multiplatform-php-multithreading-engine at Reddit.com so my friends can see it too. I used Lightweight and multiplatform PHP Multithreading Engine | Anton Vedeshin so it was a good title.

    Reply

  15. June 9th, 2011 at 05:45 | #15

    @Justin Swanhart
    ey! that’s a lot better than the code here! thanks for sharing!

    Reply

  16. June 10th, 2011 at 15:53 | #16

    I saw the title of this blog post – Lightweight and multiplatform PHP Multithreading Engine | Anton Vedeshin – while I was browsing on the internet a minute ago. Can if I put a link back to http://anton.vedeshin.com/articles/lightweight-and-multiplatform-php-multithreading-engine on my website?

    Reply

  17. Veyron
    July 25th, 2011 at 16:19 | #17

    Hello. I’m trying to get this to work but I’m stuck on line 56 (main.php) “//### GET PARAMETERS AND INPUT DATA FOR WORK INSIDE THREADS
    $result=$db->get_results(”SELECT field1, field2, field3, … FROM input_table”);”. Is it necessary to create a new table called input_table? If yes, which fields are needed?

    Sry for my english :S

    greetings

    Reply

  18. September 22nd, 2011 at 14:32 | #18

    excellent blog. thank you this excellent article. i regards a lot.

    Reply

  19. Ornythorink
    November 19th, 2011 at 14:28 | #19

    Too bad the file is corrupt when i try to unzip with 7-Zip (CRC trouble) :/

    Reply

    Anton Vedeshin Reply:

    It should be fine, corrupted files are not needed.

    Reply


