<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: WURFL Lightweight for PHP</title>
	<atom:link href="http://www.thisismobility.com/blog/2006/06/21/wurfl-lightweight-for-php/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.thisismobility.com/blog/2006/06/21/wurfl-lightweight-for-php/</link>
	<description>Ripping mobility from the clutches of telecom</description>
	<pubDate>Fri, 21 Nov 2008 13:30:58 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
		<item>
		<title>By: Chris Abbott</title>
		<link>http://www.thisismobility.com/blog/2006/06/21/wurfl-lightweight-for-php/#comment-24874</link>
		<dc:creator>Chris Abbott</dc:creator>
		<pubDate>Sun, 30 Jul 2006 09:51:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.thisismobility.com/blog/?p=141#comment-24874</guid>
		<description>Apart from all the UAProfiles, there's a new approach to the device detection problem at the "DetectRight" link for you to play with. You might also want to check out what happened at the W3C Device Description Workshop in Madrid in July to see how the industry proposes to address the problem of everyone writing their own device repositories :)

Chris</description>
		<content:encoded><![CDATA[<p>Apart from all the UAProfiles, there&#8217;s a new approach to the device detection problem at the &#8220;DetectRight&#8221; link for you to play with. You might also want to check out what happened at the W3C Device Description Workshop in Madrid in July to see how the industry proposes to address the problem of everyone writing their own device repositories :)</p>
<p>Chris</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: miker</title>
		<link>http://www.thisismobility.com/blog/2006/06/21/wurfl-lightweight-for-php/#comment-17482</link>
		<dc:creator>miker</dc:creator>
		<pubDate>Mon, 10 Jul 2006 23:17:35 +0000</pubDate>
		<guid isPermaLink="false">http://www.thisismobility.com/blog/?p=141#comment-17482</guid>
		<description>Ooo, I didn't know about the UAProf to WURFL script, I'll have to look that up. Thanks!</description>
		<content:encoded><![CDATA[<p>Ooo, I didn&#8217;t know about the UAProf to WURFL script, I&#8217;ll have to look that up. Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Strangiato</title>
		<link>http://www.thisismobility.com/blog/2006/06/21/wurfl-lightweight-for-php/#comment-17278</link>
		<dc:creator>Strangiato</dc:creator>
		<pubDate>Mon, 10 Jul 2006 16:27:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.thisismobility.com/blog/?p=141#comment-17278</guid>
		<description>Don't have speed figures for you atm. I don't actually use WURFL at my current position, but I am thinking of using it for a small personal project and was debating between the Java and PHP versions for it. I noticed the warnings about the speed of the PHP version and decided to just take a look and hack away for the heck of it.
  As for an adaptive WURFL, what has worked for me in the past is to record the full set of HTTP headers once for each unique user agent seen during the day. A nightly process then checks for unknown user agents. If a UAProf is given, use it to generate device information if possible; as a last resort, use the HTTP Accept headers to update device information. A UAProf-to-WURFL entry utility is available on the WURFL site (I think in Ruby), but it's simple enough to do in PHP/Perl/Python. Then you'd run the WURFL update script if needed. Not real time... but then again, for the small project I'm working on, real-time device updating isn't critical.</description>
		<content:encoded><![CDATA[<p>Don&#8217;t have speed figures for you atm. I don&#8217;t actually use WURFL at my current position, but I am thinking of using it for a small personal project and was debating between the Java and PHP versions for it. I noticed the warnings about the speed of the PHP version and decided to just take a look and hack away for the heck of it.<br />
  As for an adaptive WURFL, what has worked for me in the past is to record the full set of HTTP headers once for each unique user agent seen during the day. A nightly process then checks for unknown user agents. If a UAProf is given, use it to generate device information if possible; as a last resort, use the HTTP Accept headers to update device information. A UAProf-to-WURFL entry utility is available on the WURFL site (I think in Ruby), but it&#8217;s simple enough to do in PHP/Perl/Python. Then you&#8217;d run the WURFL update script if needed. Not real time&#8230; but then again, for the small project I&#8217;m working on, real-time device updating isn&#8217;t critical.</p>
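<p>A minimal Python sketch of the nightly process described above, under stated assumptions: header names are lower-cased, a UAProf URL (if any) arrives in the <code>x-wap-profile</code> header, and <code>known_agents</code> is the set of user agents already in the WURFL data. <code>HeaderLog</code> and <code>nightly_candidates</code> are hypothetical names, not part of WURFL.</p>
<pre>
```python
# Hypothetical sketch of the nightly "adaptive WURFL" job.
# Assumptions: header names are lower-cased, the UAProf URL (if any) is in the
# "x-wap-profile" header, and known_agents holds user agents already in WURFL.


class HeaderLog:
    """Records the full header set once per unique user agent per day."""

    def __init__(self):
        self.seen = {}  # user agent -> headers from its first request that day

    def record(self, headers):
        ua = headers.get("user-agent", "")
        if ua and ua not in self.seen:
            self.seen[ua] = dict(headers)

    def reset(self):
        # Call after the nightly run to start a fresh day.
        self.seen.clear()


def nightly_candidates(log, known_agents):
    """Return (user_agent, source, value) tuples for agents not yet in WURFL.

    Prefers the UAProf URL; falls back to the Accept header as a last resort.
    """
    results = []
    for ua, headers in sorted(log.seen.items()):
        if ua in known_agents:
            continue
        profile = headers.get("x-wap-profile")
        if profile:
            # UAProf URLs are often sent wrapped in double quotes.
            results.append((ua, "uaprof", profile.strip('"')))
        else:
            results.append((ua, "accept", headers.get("accept", "")))
    return results
```
</pre>
<p>The output is only a work list: converting a UAProf or Accept header into a WURFL entry and running the update script remain separate steps.</p>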
]]></content:encoded>
	</item>
	<item>
		<title>By: miker</title>
		<link>http://www.thisismobility.com/blog/2006/06/21/wurfl-lightweight-for-php/#comment-15781</link>
		<dc:creator>miker</dc:creator>
		<pubDate>Sat, 08 Jul 2006 15:51:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.thisismobility.com/blog/?p=141#comment-15781</guid>
		<description>I actually had the hash as a two level structure first, and then moved it off to individual files. Even with an accelerator in place to cache a compiled version of the agents files, reading in the entire array structure bloated the runtime some. For the test set I had, keeping the array parts distinct helped out.

How much of a difference did #2 make for you? I'm considering doing that, but I'm not sure whether the result will just be that the OS-level block cache has to cache more files. With the existing system there is the benefit of a smaller on-disk working set to cache.

I actually ended up using a direct check before scanning, but I didn't bother posting it. It turns out it only hits an exact match 30% of the time and didn't make much of a difference. I'm considering trying to make WURFL adaptive, so that it learns new user agents as it sees them. That was one of the benefits of agent2id, even though the cache of capabilities detracted from the lookup benefits for me. I'm thinking of putting something of the sort back in.</description>
		<content:encoded><![CDATA[<p>I actually had the hash as a two level structure first, and then moved it off to individual files. Even with an accelerator in place to cache a compiled version of the agents files, reading in the entire array structure bloated the runtime some. For the test set I had, keeping the array parts distinct helped out.</p>
<p>How much of a difference did #2 make for you? I&#8217;m considering doing that, but I&#8217;m not sure whether the result will just be that the OS-level block cache has to cache more files. With the existing system there is the benefit of a smaller on-disk working set to cache.</p>
<p>I actually ended up using a direct check before scanning, but I didn&#8217;t bother posting it. It turns out it only hits an exact match 30% of the time and didn&#8217;t make much of a difference. I&#8217;m considering trying to make WURFL adaptive, so that it learns new user agents as it sees them. That was one of the benefits of agent2id, even though the cache of capabilities detracted from the lookup benefits for me. I&#8217;m thinking of putting something of the sort back in.</p>
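<p>A minimal sketch of a direct check before scanning, combined with the adaptive idea of remembering user agents the scan has already resolved. It assumes the agent table is an in-memory dict of user agent to WURFL ID; <code>AdaptiveMatcher</code> and <code>longest_prefix_match</code> are hypothetical names, and the learned mappings are kept only in memory here (persisting them is left out). Python is used for brevity; in PHP the direct check would be <code>isset()</code>.</p>
<pre>
```python
# Hypothetical sketch: exact match first, expensive scan only on a miss, and
# scan results are remembered so a repeated unknown agent becomes an exact hit.


def longest_prefix_match(ua, agents):
    """Linear scan: id of the known agent sharing the longest prefix with ua."""
    best_id, best_len = None, 0
    for known, device_id in agents.items():
        n = 0
        for a, b in zip(ua, known):
            if a != b:
                break
            n += 1
        if n > best_len:
            best_id, best_len = device_id, n
    return best_id  # None if no agent shares even one leading character


class AdaptiveMatcher:
    def __init__(self, agents):
        self.agents = dict(agents)  # exact user agent -> WURFL id
        self.learned = {}           # unknown user agents resolved by the scan

    def lookup(self, ua):
        if ua in self.agents:       # direct check, like isset() in PHP
            return self.agents[ua]
        if ua in self.learned:      # seen before: no scan needed
            return self.learned[ua]
        device_id = longest_prefix_match(ua, self.agents)
        self.learned[ua] = device_id
        return device_id
```
</pre>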
]]></content:encoded>
	</item>
	<item>
		<title>By: Strangiato</title>
		<link>http://www.thisismobility.com/blog/2006/06/21/wurfl-lightweight-for-php/#comment-15528</link>
		<dc:creator>Strangiato</dc:creator>
		<pubDate>Fri, 07 Jul 2006 23:01:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.thisismobility.com/blog/?p=141#comment-15528</guid>
		<description>I like your ideas for speeding up WURFL for PHP.
Using only the multicache layout for the files and getting rid of the agent2id file really cleans up the code quite a bit.

I used your approach with a couple of tweaks.  

1.  Instead of breaking the user-agent-to-WURFL-ID mapping into multiple little cache files based on the first two characters of the user agent, I simply made that hash structure two levels deep.
  The first-level keys are the user agent up to the first slash. Each key points to a hash that maps full user agents to WURFL IDs for agents that share that prefix.
  This is much simpler, as there is only one file to worry about and you don't have to scan the ENTIRE key set of user agents. You can quickly tell whether a similar agent exists, and if it does, you only have to scan a handful looking for the best one. (Which brings up another point: first check for an exact match using isset() before trying the linear scan for the longest match.)

2.  I went ahead and stored the entire capability set in each device cache file. It's trivial to look these up during parsing, and when loading a device's capabilities at runtime you only have to load a single file. The resulting files are all around 16k. (An extra benefit is that this is easily adapted to storing the files as blobs in a database.)

3.  Minor tweaks that probably aren't worth the speedup.  
3a - Modified the matchlen() function to scan each user_agent character by character rather than repeatedly calling substr(). In theory, scanning a string once is faster than scanning it over and over (1+2+3+4+5...etc).
3b - getDeviceCapability() seemed to be doing something odd: it was linearly scanning all the capabilities looking for a key match. The structure is a hash, so why not use isset() and directly return the requested value if it exists? Much faster.</description>
		<content:encoded><![CDATA[<p>I like your ideas for speeding up WURFL for PHP.<br />
Using only the multicache layout for the files and getting rid of the agent2id file really cleans up the code quite a bit.</p>
<p>I used your approach with a couple of tweaks.  </p>
<p>1.  Instead of breaking the user-agent-to-WURFL-ID mapping into multiple little cache files based on the first two characters of the user agent, I simply made that hash structure two levels deep.<br />
  The first-level keys are the user agent up to the first slash. Each key points to a hash that maps full user agents to WURFL IDs for agents that share that prefix.<br />
  This is much simpler, as there is only one file to worry about and you don&#8217;t have to scan the ENTIRE key set of user agents. You can quickly tell whether a similar agent exists, and if it does, you only have to scan a handful looking for the best one. (Which brings up another point: first check for an exact match using isset() before trying the linear scan for the longest match.)</p>
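<p>A minimal Python sketch of the two-level index from point 1, assuming the user-agent table is already loaded as a dict of user agent to WURFL ID. <code>build_index</code>, <code>bucket_key</code> and <code>lookup</code> are hypothetical helper names; the <code>in</code> test plays the role of PHP&#8217;s <code>isset()</code>.</p>
<pre>
```python
# Hypothetical sketch of a two-level user-agent index: level one is keyed by
# the user agent up to the first slash, level two maps full user agents to ids.


def bucket_key(ua):
    # A user agent with no slash becomes its own bucket key.
    return ua.split("/", 1)[0]


def build_index(agents):
    index = {}
    for ua, device_id in agents.items():
        index.setdefault(bucket_key(ua), {})[ua] = device_id
    return index


def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def lookup(index, ua):
    bucket = index.get(bucket_key(ua))
    if not bucket:
        return None                  # no similar agent exists at all
    if ua in bucket:                 # exact match first (isset() in PHP)
        return bucket[ua]
    # Linear scan, but only over the handful of agents in this bucket.
    best = max(bucket, key=lambda known: common_prefix_len(ua, known))
    return bucket[best]
```
</pre>
<p>Unlike a scan over the entire key set, a bucket miss here returns nothing at once, which matches the point that you can quickly tell whether a similar agent exists.</p>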
<p>2.  I went ahead and stored the entire capability set in each device cache file. It&#8217;s trivial to look these up during parsing, and when loading a device&#8217;s capabilities at runtime you only have to load a single file. The resulting files are all around 16k. (An extra benefit is that this is easily adapted to storing the files as blobs in a database.)</p>
<p>3.  Minor tweaks that probably aren&#8217;t worth the speedup.<br />
3a - Modified the matchlen() function to scan each user_agent character by character rather than repeatedly calling substr(). In theory, scanning a string once is faster than scanning it over and over (1+2+3+4+5&#8230;etc).<br />
3b - getDeviceCapability() seemed to be doing something odd: it was linearly scanning all the capabilities looking for a key match. The structure is a hash, so why not use isset() and directly return the requested value if it exists? Much faster.</p>
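<p>A Python illustration of tweaks 3a and 3b. <code>match_len_substr</code> only mimics the repeated-substring pattern described (it is not the original PHP <code>matchlen()</code>); <code>match_len</code> is the single-pass version, and <code>get_device_capability</code> is a hypothetical stand-in showing a direct hash lookup in place of a linear scan over keys.</p>
<pre>
```python
# Hypothetical sketches of tweaks 3a and 3b.


def match_len_substr(a, b):
    """Repeated-prefix style: compares prefixes of length 1, 2, 3, ...

    Each comparison re-reads the prefix, so the total work is 1+2+3+...
    """
    n = 0
    while n < min(len(a), len(b)) and a[:n + 1] == b[:n + 1]:
        n += 1
    return n


def match_len(a, b):
    """Single pass: stop at the first differing character."""
    limit = min(len(a), len(b))
    i = 0
    while i < limit and a[i] == b[i]:
        i += 1
    return i


def get_device_capability(capabilities, name, default=None):
    """Direct hash lookup (isset() in PHP) instead of scanning every key."""
    return capabilities.get(name, default)
```
</pre>
<p>Both match functions return the same length; only the amount of character comparison differs.</p>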
]]></content:encoded>
	</item>
</channel>
</rss>
