<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Anton Vedeshin &#187; PHP</title>
	<atom:link href="http://anton.vedeshin.com/tag/php/feed" rel="self" type="application/rss+xml" />
	<link>http://anton.vedeshin.com</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Fri, 03 Sep 2010 09:46:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Lightweight and multiplatform PHP Multithreading Engine</title>
		<link>http://anton.vedeshin.com/articles/lightweight-and-multiplatform-php-multithreading-engine</link>
		<comments>http://anton.vedeshin.com/articles/lightweight-and-multiplatform-php-multithreading-engine#comments</comments>
		<pubDate>Fri, 03 Jul 2009 11:19:50 +0000</pubDate>
		<dc:creator>Anton Vedeshin</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Distributed Computing]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://anton.vedeshin.com/?p=72</guid>
		<description><![CDATA[Short Introduction
This article could have appeared about a year and a half ago, but at that time I had neither the free time nor the freedom to publish anything related to it, because parts of the source code described here were used in several commercial projects.
The main problem in implementing a PHP multithreading engine is the shared memory, which usually [...]]]></description>
			<content:encoded><![CDATA[<h3>Short Introduction</h3>
<p>This article could have appeared about a year and a half ago, but at that time I had neither the free time nor the freedom to publish anything related to it, because parts of the source code described here were used in several commercial projects.</p>
<p>The main problem in implementing a PHP multithreading engine is the shared memory, whose implementation usually varies from one OS to another. Threads use this shared memory to talk to each other, receive tasks and commands, return results and use other shared input/output resources if needed. PHP runs on at least two platforms (Linux and Windows, and in fact more [1]), which differ in memory management, structure, etc. from the programmer&#8217;s point of view. This lowers the chances of building a multiplatform PHP multithreading engine.</p>
<h3>Related Articles and Blog posts on PHP Multithreading</h3>
<p>Several quite strong attempts at PHP multithreading are: PHP Threader &#8211; Multithreading-like Functionality in PHP [12], Emulate threads using separate HTTP requests [13] and Improved Thread Simulation Class for PHP [14].<br />
At first glance the first one [12] seems complete and sound, but a closer look (analysis of the code and examples) reveals some serious drawbacks, namely:</p>
<ol>
<li>There is no distribution of work: a single job is not divided between threads. Work is done by executing PHP files (threads) in parallel, each doing a separate job. Multithreading, however, implies that one or several jobs are divided equally between threads and computed by them concurrently.</li>
<li>The shared memory is actually a shared file or MySQL database used only for sending messages between the main thread and the other threads, not for distributing jobs/tasks, tracking state, etc.</li>
<li>There is no proper management of the thread life-cycle.</li>
<li>A heavy workload seems likely to cause problems for computation and shared data with this approach.</li>
<li>The approach does not provide low-level multithreading. Because job distribution is not managed automatically, implementing applications that use more than 2-3 threads at once with this package is both hard and bug-prone.</li>
</ol>
<p>What about the second one [13]? The author claims <em>&#8220;This package provides an alternative solution that consists in sending multiple HTTP requests to the same Web server on which PHP is running&#8221;</em>. The third one [14] is also based on HTTP requests. The approach is reasonable, but both solutions lack thread management, job distribution, etc. They do the first part (starting several threads and getting responses from them) very well, but shared memory, full thread control, distribution of jobs/tasks, etc. are missing. It is also questionable whether they would work with huge amounts of data.</p>
<p>Other earlier articles on PHP multithreading (see [2], [3], [4]) mostly describe using, for example, forking (see [8] for pcntl_fork) on Linux or cURL (see [9]) on Windows. They still do not provide a full multithreading solution.<br />
There have also been many other attempts to implement PHP multithreading [10], [11].<br />
Reading these articles, several questions arise straight away. How would we track the life-cycle of threads? What happens if some of the threads hang or crash unexpectedly without any notice?</p>
<h3>New Approach!</h3>
<p>Inspired by:</p>
<ul>
<li>These two articles [5], [6] (in my opinion theirs is the best approach in this case)</li>
</ul>
<ul>
<li>The Informatics II course at TTU (www.ttu.ee), taught by Innar Liiv, PhD, where a small Grid-like computation application in C++ was the topic of my project.</li>
</ul>
<p>and</p>
<ul>
<li>a strong need for a PHP multithreading engine that really works, mostly for extracting information from the Web.</li>
</ul>
<p>As a result, some ideas arose on how to improve the code and concept provided in articles [5] and [6] above, and to make the algorithm more automated, universal, clean and less complicated.</p>
<p><strong>What improvements are added?</strong></p>
<p><strong><span style="text-decoration: underline;">Improvement I: Shared memory</span></strong></p>
<p>Any database (MySQL, Postgres, Oracle, etc.) can be used as the shared memory.<br />
We need only two tables (see the picture below): one for messages/tasks/commands, called cmd, and another for tracking the life-cycle of threads, called threads.</p>
<div id="attachment_80" class="wp-caption aligncenter" style="width: 410px"><a href="http://anton.vedeshin.com/wp-content/uploads/2009/07/cmd_and_thread_tables.PNG"><img class="size-full wp-image-80" title="cmd_and_thread_tables" src="http://anton.vedeshin.com/wp-content/uploads/2009/07/cmd_and_thread_tables.PNG" alt="cmd and threads tables" width="400" height="171" /></a><p class="wp-caption-text">cmd and threads tables</p></div>
<p><strong>Table: Cmd</strong><br />
<strong>cmd_id </strong>– just a primary key<br />
<strong>proc_id</strong> – ID of a thread<br />
<strong>cmd</strong> – command given (for example: calculate, exit, etc.)<br />
<strong>param</strong> – additional parameters needed for the command, usually a serialized PHP object<br />
<strong>result</strong> – stored after the command was done, usually a serialized PHP object.<br />
<strong>done</strong> – flag for the main thread indicating whether the command/task has been done; it helps to collect results and to reassign the task to another thread if the current one is not responding.<br />
<strong>datestamp</strong> – just a date and time</p>
<p><strong>Table: Threads</strong><br />
<strong>thread_id</strong> – just a primary key<br />
<strong>proc_id</strong> – ID of a thread<br />
<strong>last_beat</strong> – timestamp of the last moment the thread was known to be alive<br />
<strong>busy</strong> – flag indicating whether the thread is busy<br />
<strong>state</strong> – parameter that represents the thread state (for example exit, ready, etc.)</p>
<p>Personally I used MS SQL Server, but the tables and commands are ANSI SQL compatible, which means other databases such as MySQL, Postgres or Oracle can be used without problems. (Later in the article you will see that the EZsql DB abstraction class [7] is used for communication with the database, so it is easy to change the DB engine. EZsql is not the best abstraction class, and of course you can use your own connector to access the database.)</p>
<p><strong>MSSQL DDL of the cmd and threads tables:</strong></p>
<div class="codecolorer-container sql " style="overflow:auto;white-space:nowrap;width:435px"><div class="sql codecolorer" style="font-family:Monaco,Lucida Console,monospace"><span class="kw1">CREATE</span> <span class="kw1">TABLE</span> <span class="br0">&#91;</span>threads<span class="br0">&#93;</span> <span class="br0">&#40;</span><br />
<span class="br0">&#91;</span>thread_id<span class="br0">&#93;</span> <span class="br0">&#91;</span>bigint<span class="br0">&#93;</span> IDENTITY <span class="br0">&#40;</span><span class="nu0">1</span><span class="sy0">,</span> <span class="nu0">1</span><span class="br0">&#41;</span> <span class="kw1">NOT</span> <span class="kw1">NULL</span> <span class="sy0">,</span><br />
<span class="br0">&#91;</span>proc_id<span class="br0">&#93;</span> <span class="br0">&#91;</span>int<span class="br0">&#93;</span> <span class="kw1">NULL</span> <span class="sy0">,</span><br />
<span class="br0">&#91;</span>last_beat<span class="br0">&#93;</span> <span class="br0">&#91;</span>varchar<span class="br0">&#93;</span> <span class="br0">&#40;</span><span class="nu0">50</span><span class="br0">&#41;</span> <span class="kw1">NULL</span> <span class="sy0">,</span><br />
<span class="br0">&#91;</span>busy<span class="br0">&#93;</span> <span class="br0">&#91;</span>int<span class="br0">&#93;</span> <span class="kw1">NULL</span> <span class="sy0">,</span><br />
<span class="br0">&#91;</span>state<span class="br0">&#93;</span> <span class="br0">&#91;</span>int<span class="br0">&#93;</span> <span class="kw1">NULL</span><br />
<span class="br0">&#41;</span> <span class="kw1">ON</span> <span class="br0">&#91;</span><span class="kw1">PRIMARY</span><span class="br0">&#93;</span><br />
<br />
<span class="kw1">CREATE</span> <span class="kw1">TABLE</span> <span class="br0">&#91;</span>cmd<span class="br0">&#93;</span> <span class="br0">&#40;</span><br />
<span class="br0">&#91;</span>cmd_id<span class="br0">&#93;</span> <span class="br0">&#91;</span>bigint<span class="br0">&#93;</span> IDENTITY <span class="br0">&#40;</span><span class="nu0">1</span><span class="sy0">,</span> <span class="nu0">1</span><span class="br0">&#41;</span> <span class="kw1">NOT</span> <span class="kw1">NULL</span> <span class="sy0">,</span><br />
<span class="br0">&#91;</span>proc_id<span class="br0">&#93;</span> <span class="br0">&#91;</span>int<span class="br0">&#93;</span> <span class="kw1">NULL</span> <span class="sy0">,</span><br />
<span class="br0">&#91;</span>cmd<span class="br0">&#93;</span> <span class="br0">&#91;</span>varchar<span class="br0">&#93;</span> <span class="br0">&#40;</span><span class="nu0">255</span><span class="br0">&#41;</span> <span class="kw1">NULL</span> <span class="sy0">,</span><br />
<span class="br0">&#91;</span>param<span class="br0">&#93;</span> <span class="br0">&#91;</span>text<span class="br0">&#93;</span> <span class="kw1">NULL</span> <span class="sy0">,</span><br />
<span class="br0">&#91;</span>result<span class="br0">&#93;</span> <span class="br0">&#91;</span>text<span class="br0">&#93;</span> <span class="kw1">NULL</span> <span class="sy0">,</span><br />
<span class="br0">&#91;</span>done<span class="br0">&#93;</span> <span class="br0">&#91;</span>int<span class="br0">&#93;</span> <span class="kw1">NULL</span> <span class="sy0">,</span><br />
<span class="br0">&#91;</span>datestamp<span class="br0">&#93;</span> <span class="br0">&#91;</span>datetime<span class="br0">&#93;</span> <span class="kw1">NULL</span><br />
<span class="br0">&#41;</span> <span class="kw1">ON</span> <span class="br0">&#91;</span><span class="kw1">PRIMARY</span><span class="br0">&#93;</span> TEXTIMAGE_ON <span class="br0">&#91;</span><span class="kw1">PRIMARY</span><span class="br0">&#93;</span></div></div>
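<p>For readers on other databases, the same two tables can be sketched without the MSSQL-specific bracket quoting and <code>IDENTITY</code> clause. The following is only a minimal sketch, assuming PDO with the SQLite driver and an in-memory database chosen purely for illustration. The helper name <code>createEngineTables</code> and the SQLite column types are assumptions and are not taken from the downloadable source.</p>

```php
<?php
// Hypothetical helper: creates the two tables described above in SQLite.
// Column names follow the MSSQL DDL; the types are assumed SQLite equivalents.
function createEngineTables(PDO $db): void
{
    $db->exec(
        'CREATE TABLE threads (
            thread_id INTEGER PRIMARY KEY AUTOINCREMENT,
            proc_id   INTEGER,
            last_beat VARCHAR(50),
            busy      INTEGER,
            state     INTEGER
        )'
    );
    $db->exec(
        'CREATE TABLE cmd (
            cmd_id    INTEGER PRIMARY KEY AUTOINCREMENT,
            proc_id   INTEGER,
            cmd       VARCHAR(255),
            param     TEXT,
            result    TEXT,
            done      INTEGER,
            datestamp DATETIME
        )'
    );
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createEngineTables($db);

// Register one thread row to confirm that the schema accepts data.
$insert = $db->prepare(
    'INSERT INTO threads (proc_id, last_beat, busy, state) VALUES (?, ?, ?, ?)'
);
$insert->execute([1, (string) time(), 0, 0]);
echo $db->query('SELECT COUNT(*) FROM threads')->fetchColumn(), "\n";
```

<p>On another engine only the auto-increment syntax of the primary key should need adjusting; the remaining columns use plain integer and text types.</p>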
<p><strong>Example of data</strong><br />
The data shown is for an application with 30 threads.<br />
<a class="wpGallery" href="http://anton.vedeshin.com/wp-content/uploads/2009/07/cmd_data.PNG" target="_blank">Cmd table</a>, <a href="http://anton.vedeshin.com/wp-content/uploads/2009/07/threads_data.PNG" target="_blank">Threads table</a>.</p>
<p><span style="text-decoration: underline;"><strong>Improvement II: Message broker</strong></span></p>
<p>Brokering of messages/tasks/commands is done through the database together with the following pieces of code:<br />
the loop “THREADS main cycle”</p>
<div class="codecolorer-container php " style="overflow:auto;white-space:nowrap;width:435px"><div class="php codecolorer" style="font-family:Monaco,Lucida Console,monospace"><span class="kw1">do</span><span class="br0">&#123;</span><br />
…<br />
<span class="br0">&#125;</span><span class="kw1">while</span><span class="br0">&#40;</span><span class="re0">$q</span><span class="br0">&#41;</span><span class="sy0">;</span></div></div>
<p>and</p>
<div class="codecolorer-container php " style="overflow:auto;white-space:nowrap;width:435px"><div class="php codecolorer" style="font-family:Monaco,Lucida Console,monospace"><span class="kw2">function</span> tell<span class="br0">&#40;</span><span class="re0">$thought</span><span class="sy0">,</span> <span class="re0">$params</span> <span class="sy0">=</span> <span class="kw2">NULL</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>…<span class="br0">&#125;</span></div></div>
<p>and finally</p>
<div class="codecolorer-container php " style="overflow:auto;white-space:nowrap;width:435px"><div class="php codecolorer" style="font-family:Monaco,Lucida Console,monospace"><span class="co2">### Get results block</span></div></div>
<p>Please see main.php and Thread.php for details.</p>
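<p>To make the brokering idea concrete, here is a rough sketch of one round trip through the cmd table: the main thread queues a command, a worker picks it up, and the result is stored and flagged as done. It assumes PDO with an in-memory SQLite database instead of EZsql, and the function names <code>queueCommand</code>, <code>nextCommand</code> and <code>reportResult</code> are hypothetical. The real <code>tell()</code> and main loop in main.php and Thread.php may work differently.</p>

```php
<?php
// Hypothetical broker helpers, not the code from main.php or Thread.php.

// Main thread: queue a command for one worker (plays the role of tell()).
function queueCommand(PDO $db, int $procId, string $cmd, $params = null): int
{
    $stmt = $db->prepare(
        'INSERT INTO cmd (proc_id, cmd, param, done, datestamp) VALUES (?, ?, ?, 0, ?)'
    );
    $stmt->execute([$procId, $cmd, serialize($params), date('Y-m-d H:i:s')]);
    return (int) $db->lastInsertId();
}

// Worker: fetch its oldest unfinished command, or null when there is none.
function nextCommand(PDO $db, int $procId): ?array
{
    $stmt = $db->prepare(
        'SELECT cmd_id, cmd, param FROM cmd
         WHERE proc_id = ? AND done = 0 ORDER BY cmd_id LIMIT 1'
    );
    $stmt->execute([$procId]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// Worker: store the serialized result and flag the command as done.
function reportResult(PDO $db, int $cmdId, $result): void
{
    $stmt = $db->prepare('UPDATE cmd SET result = ?, done = 1 WHERE cmd_id = ?');
    $stmt->execute([serialize($result), $cmdId]);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE cmd (cmd_id INTEGER PRIMARY KEY AUTOINCREMENT, proc_id INTEGER,
    cmd VARCHAR(255), param TEXT, result TEXT, done INTEGER, datestamp DATETIME)');

$cmdId = queueCommand($db, 7, 'calculate', [2, 3]); // main thread side
$task  = nextCommand($db, 7);                       // worker side
reportResult($db, (int) $task['cmd_id'], array_sum(unserialize($task['param'])));

// "Get results" step on the main thread.
$select = $db->prepare('SELECT result FROM cmd WHERE cmd_id = ? AND done = 1');
$select->execute([$cmdId]);
$stored = $select->fetchColumn();
echo unserialize($stored), "\n"; // prints 5
```

<p>Parameters and results travel as serialized PHP values, as described for the param and result columns. The <code>LIMIT</code> clause is SQLite syntax and would need the equivalent construct on other engines.</p>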
<p><span style="text-decoration: underline;"><strong>Improvement III: thread life-cycle management</strong></span></p>
<p>We need to know which threads are ready for processing, which are busy, which have finished processing and are asking for termination, etc. The message broker is used to give “vital” commands to threads.<br />
This is the job of the following functions:</p>
<div class="codecolorer-container php " style="overflow:auto;white-space:nowrap;width:435px"><div class="php codecolorer" style="font-family:Monaco,Lucida Console,monospace"><span class="kw2">function</span> isActive <span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><span class="br0">&#125;</span><br />
<span class="kw2">function</span> isBusy<span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span><span class="br0">&#125;</span></div></div>
<p>Please see Thread.php.</p>
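<p>As an illustration of how such checks could work on top of the threads table, the sketch below treats a thread as alive while its <code>last_beat</code> is recent enough and lists the threads whose unfinished commands would have to be reassigned. The function names, the array-based rows and the 30-second timeout are assumptions made for this example. The actual <code>isActive()</code> and <code>isBusy()</code> methods are in Thread.php.</p>

```php
<?php
// Hypothetical sketch, not the methods from Thread.php.
// A thread row is modelled as an array with the columns of the threads table.

// Seconds without a heartbeat before a thread counts as dead (assumed value).
const HEARTBEAT_TIMEOUT = 30;

function threadIsActive(array $thread, int $now): bool
{
    // last_beat is stored as text in the DDL, so cast it before comparing.
    return ($now - (int) $thread['last_beat']) <= HEARTBEAT_TIMEOUT;
}

function threadIsBusy(array $thread): bool
{
    return (int) $thread['busy'] === 1;
}

// Returns the proc_ids of threads that stopped beating; their unfinished
// commands are candidates for reassignment to another thread.
function findDeadThreads(array $threads, int $now): array
{
    $dead = [];
    foreach ($threads as $thread) {
        if (!threadIsActive($thread, $now)) {
            $dead[] = (int) $thread['proc_id'];
        }
    }
    return $dead;
}

$now = 1000; // fixed clock value so the example is reproducible
$threads = [
    ['proc_id' => 1, 'last_beat' => '995', 'busy' => 1],
    ['proc_id' => 2, 'last_beat' => '900', 'busy' => 1], // stopped beating
    ['proc_id' => 3, 'last_beat' => '999', 'busy' => 0],
];

print_r(findDeadThreads($threads, $now));
```

<p>Here only thread 2 is reported, because its last heartbeat is 100 seconds old. A worker would refresh <code>last_beat</code> on every pass of its main loop.</p>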
<p>The source code is well annotated and commented, so have a look inside.</p>
<p>Download: <a href="http://anton.vedeshin.com/wp-content/uploads/2009/07/PHP_Multithreading_sourcecode.zip">PHP_Multithreading_sourcecode_v1.0</a></p>
<h3>Performance and Evaluation: 5 hours vs 3 minutes</h3>
<p>One of the applications where the code was used was a mass download of pictures with thumbnail generation.<br />
<span style="text-decoration: underline;">Work volume</span>: ~3000 JPEG pictures, 0.5-1.5 Mb each<br />
<span style="text-decoration: underline;">Hardware used</span>: 1.4 GHz Pentium 4 processor, 2 GB RAM, IIS 5.5 etc.<br />
<span style="text-decoration: underline;">Internet connection</span>: 8/8 Mbit symmetric connection<br />
Sequential download and resizing would take <strong>5 hours</strong>.<br />
Multithreaded solution with 20 threads took <strong>less than 3 minutes</strong>.</p>
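<p>A run like this depends on dividing one job among the threads. One plausible way to do that, shown here only as a sketch, is to cut the list of picture URLs into near-equal batches and serialize each batch for the param column of one command. The function name <code>splitJob</code> and the example.com URLs are made up for illustration and do not come from the source archive.</p>

```php
<?php
// Hypothetical sketch of dividing one job among workers; names are illustrative.

// Splits $items into at most $workers batches of near-equal size and returns
// them keyed by a 1-based proc_id, each serialized for the param column.
function splitJob(array $items, int $workers): array
{
    if ($items === [] || $workers < 1) {
        return [];
    }
    $batchSize = (int) ceil(count($items) / $workers);
    $batches = [];
    foreach (array_chunk($items, $batchSize) as $index => $batch) {
        $batches[$index + 1] = serialize($batch);
    }
    return $batches;
}

// 3000 placeholder picture URLs spread over 20 workers.
$urls = [];
for ($i = 1; $i <= 3000; $i++) {
    $urls[] = "http://example.com/pictures/$i.jpg";
}

$batches = splitJob($urls, 20);
echo count($batches), " batches, ", count(unserialize($batches[1])), " URLs each\n";
// prints: 20 batches, 150 URLs each
```

<p>Each serialized batch could then be queued as the parameter of one command per thread, so that every thread downloads and resizes its own share of the pictures.</p>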
<h3>Conclusion</h3>
<p><strong>What definitely distinguishes the PHP multithreading engine proposed in this article?</strong></p>
<ol>
<li>Multiplatform, i.e. platform independent</li>
<li>Highly customizable and lightweight</li>
<li>There is no redundant code inside; only the technologies best suited to the task are used.</li>
<li>Works both on the Web and from the CLI</li>
<li>The code is object-oriented, well commented and easy to understand, and does not depend on any other frameworks or drivers.</li>
<li>Any database engine (MySQL, Postgres, Oracle, MSSQL, etc.) can be used as the threads&#8217; shared memory, which makes the full power of an RDBMS available.</li>
</ol>
<p>Any comments, questions and suggestions about the article are highly appreciated.<br />
The next step will be creating a Map/Reduce-like implementation in PHP and hosting it on <a href="http://sourceforge.net/projects/phpmapreduce/" target="_blank">http://sourceforge.net/projects/phpmapreduce/</a>.</p>
<h1 style="text-align: center;">Download Source</h1>
<p style="text-align: center;"><a href="http://anton.vedeshin.com/wp-content/uploads/2009/07/PHP_Multithreading_sourcecode.zip"><img class="aligncenter size-full wp-image-109" title="download-button" src="http://anton.vedeshin.com/wp-content/uploads/2009/07/download-button.PNG" alt="download-button" width="103" height="102" /></a></p>
<h3>References</h3>
<p>[1] Supported platforms by PHP <a href="http://wiki.php.net/platforms" target="_blank">http://wiki.php.net/platforms</a><br />
[2] Sonic server <a href="http://dev.pedemont.com/sonic/" target="_blank">http://dev.pedemont.com/sonic/</a><br />
[3] Process Forking with PHP <a href="http://www.electrictoolbox.com/article/php/process-forking/" target="_blank">http://www.electrictoolbox.com/article/php/process-forking/</a><br />
[4] Multithreading in PHP with CURL <a href="http://www.ibuildings.nl/blog/archives/811-Multithreading-in-PHP-with-CURL.html" target="_blank">http://www.ibuildings.nl/blog/archives/811-Multithreading-in-PHP-with-CURL.html</a><br />
[5] Multi-threading strategies in PHP <a href="http://www.alternateinterior.com/2007/05/multi-threading-strategies-in-php.html" target="_blank">http://www.alternateinterior.com/2007/05/multi-threading-strategies-in-php.html</a><br />
[6] Communicating with threads in PHP <a href="http://www.alternateinterior.com/2007/05/communicating-with-threads-in-php.html" target="_blank">http://www.alternateinterior.com/2007/05/communicating-with-threads-in-php.html</a><br />
[7] EZsql DB abstraction class <a href="http://www.woyano.com/jv/ezsql" target="_blank">http://www.woyano.com/jv/ezsql</a><br />
[8] PHP Function pcntl_fork <a href="http://www.php.net/manual/en/function.pcntl-fork.php" target="_blank">http://www.php.net/manual/en/function.pcntl-fork.php</a><br />
[9] PHP library Curl <a href="http://www.php.net/curl" target="_blank">http://www.php.net/curl</a><br />
[10] Attempt to make PHP Multithreading Tutorial <a href="http://phpmultithreaddaemon.blogspot.com/2007/09/introduction.html" target="_blank">http://phpmultithreaddaemon.blogspot.com/2007/09/introduction.html</a><br />
[11] Discussions about PHP Multithreading <a href="http://webforumz.com/php/12595-multithreaded-php.htm" target="_blank">http://webforumz.com/php/12595-multithreaded-php.htm</a><br />
[12] PHP Threader &#8211; Multithreading-like Functionality in PHP <a href="http://www.phpclasses.org/browse/package/4082.html" target="_blank">http://www.phpclasses.org/browse/package/4082.html</a><br />
[13] Emulate threads using separate HTTP requests <a href="http://www.phpclasses.org/browse/package/3953.html" target="_blank">http://www.phpclasses.org/browse/package/3953.html</a><br />
[14] Improved Thread Simulation Class for PHP <a href="http://w-shadow.com/blog/2008/05/24/improved-thread-simulation-class-for-php/" target="_blank">http://w-shadow.com/blog/2008/05/24/improved-thread-simulation-class-for-php/</a></p>
]]></content:encoded>
			<wfw:commentRss>http://anton.vedeshin.com/articles/lightweight-and-multiplatform-php-multithreading-engine/feed</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
	</channel>
</rss>
