<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Blogatron</title>
	<atom:link href="http://www.vargatron.com/blogatron/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.vargatron.com/blogatron</link>
	<description>TCB All Semester Long.</description>
	<lastBuildDate>Tue, 09 Mar 2010 19:26:10 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Workin on some Vis!</title>
		<link>http://www.vargatron.com/blogatron/2010/02/28/workin-on-some-vis/</link>
		<comments>http://www.vargatron.com/blogatron/2010/02/28/workin-on-some-vis/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 04:53:20 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Prototypes]]></category>
		<category><![CDATA[Thesis]]></category>

		<guid isPermaLink="false">http://www.vargatron.com/blogatron/?p=425</guid>
		<description><![CDATA[I&#8217;ve been working all weekend on putting down a solid visualization of baseball games that gives an informative overview of exactly what happened in a game in the least amount of space available. I want to give a variety of ways to view a game in order to get the most out of it, but [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working all weekend on putting down a solid visualization of baseball games that gives an informative overview of exactly what happened in a game in the least amount of space available. I want to give a variety of ways to view a game in order to get the most out of it, but first and foremost I want to provide a simplified overview of all game events that shows the flow of the game and allows people to see exactly what happened.</p>
<p>I started by trying to classify events and work out how to rank them so that I can show them linearly, in a way that both unifies them per team and shows the magnitude of each event. I took the major possible events and ranked them according to the impact they have on the game, from the perspective of the batting team.</p>
<p>Tier 5: Home runs<br />
Tier 4: Triples<br />
Tier 3: Doubles<br />
Tier 2: Singles<br />
Tier 1: Walks/Stolen Bases<br />
Tier 0: Outs</p>
<p>I had to revise this several times after experimenting with it, mostly due to issues with classifying walks/stolen bases and singles. At first I thought that I could get away with putting them in the same tier, since they all result in the player moving up one base. I then realized that there is a clear difference between the two groups. While walks and stolen bases both advance the runner towards scoring position on the diamond, singles put the ball in play, which can result in RBIs. Walks can sometimes result in an RBI if the bases are loaded but even this is restricted to one run, while a single can drive in up to three runs.</p>
<p>After coming up with this classification, I began to work on a circular visualization of events that classified them according to a number of criteria. While in concept this made sense to me, in reality it turned out to look somewhat interesting but show very little at a glance. I also attempted to create an organic shape from this visualization just to see what it looked like, and while this too proved interesting I don&#8217;t think it was very successful in showing anything.</p>
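<p>As a rough sketch of how this ranking could be written down in code (not the code behind the images in this post), the tiers can live in a simple lookup table. The event names used as keys are hypothetical labels; only the tier numbers come from the list above.</p>

```python
# Tier values follow the list in this post; the event names are hypothetical labels.
EVENT_TIERS = {
    "home_run": 5,
    "triple": 4,
    "double": 3,
    "single": 2,
    "walk": 1,
    "stolen_base": 1,
    "out": 0,
}


def tier_for(event):
    """Return the tier for an event name, or None for events not ranked yet."""
    return EVENT_TIERS.get(event)


def rank_half_inning(events):
    """Map a sequence of event names to (event, tier) pairs in game order."""
    return [(event, tier_for(event)) for event in events]


if __name__ == "__main__":
    for event, tier in rank_half_inning(["single", "walk", "out", "home_run"]):
        print(event, tier)
```

<p>Keeping walks and stolen bases on the same number while singles sit one tier higher mirrors the revision described above.</p>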
<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-1.png"><img class="alignnone size-medium wp-image-435" title="PastedGraphic-1" src="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-1-300x269.png" alt="PastedGraphic-1" width="300" height="269" /></a></p>
<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-2.png"><img class="alignnone size-medium wp-image-436" title="PastedGraphic-2" src="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-2-300x271.png" alt="PastedGraphic-2" width="300" height="271" /></a></p>
<p>I then moved on to the next logical step, visualizing things linearly. I continued with my key of dashes and colors, and while linearly things made a bit more sense it was still not working very well IMO. I experimented with representing outs as negative events at this point as well but I don&#8217;t think it read the way I intended.</p>
<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-3.png"><img class="alignnone size-medium wp-image-437" title="PastedGraphic-3" src="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-3-300x52.png" alt="PastedGraphic-3" width="300" height="52" /></a></p>
<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-4.png"><img class="alignnone size-medium wp-image-438" title="PastedGraphic-4" src="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-4-300x52.png" alt="PastedGraphic-4" width="300" height="52" /></a></p>
<p>At this point I also created a more organic flow diagram from these charts. I think this actually worked out really well and is a great overall representation of the flow of the game. While it doesn&#8217;t really show exactly what I wanted from the first overall view of the game a user is presented with, I think it is a great sub-view and something that I&#8217;m going to continue to investigate.</p>
<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-6.png"><img class="alignnone size-medium wp-image-439" title="PastedGraphic-6" src="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-6-300x57.png" alt="PastedGraphic-6" width="300" height="57" /></a></p>
<p>This is where I realized I was sorta getting caught up in doing the same thing over and over, so I decided to give it a rest and move on the next day.</p>
<p>Today I started working again and had some major breakthroughs. I spent a lot of time working on various visualizations, trying to bring down the amount of data ink used (as per reading Edward Tufte) and began to really simplify everything. I did a lot of different experimenting, and came up with a visualization I think finally works pretty well. I did some quick testing with a few casual baseball fans I know (obviously need to do more user testing but something is better than nothing and I just finished it!) and I was pleased that they could easily recreate the narrative of a game themselves after a very brief explanation which wasn&#8217;t repeated.</p>
<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-8.png"><img class="alignnone size-medium wp-image-440" title="PastedGraphic-8" src="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-8-300x83.png" alt="PastedGraphic-8" width="300" height="83" /></a></p>
<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-9.png"><img class="alignnone size-medium wp-image-441" title="PastedGraphic-9" src="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-9-300x156.png" alt="PastedGraphic-9" width="300" height="156" /></a></p>
<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-10.png"><img class="alignnone size-medium wp-image-442" title="PastedGraphic-10" src="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-10-300x116.png" alt="PastedGraphic-10" width="300" height="116" /></a></p>
<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-11.png"><img class="alignnone size-medium wp-image-443" title="PastedGraphic-11" src="http://www.vargatron.com/blogatron/wp-content/uploads/2010/02/PastedGraphic-11-300x129.png" alt="PastedGraphic-11" width="300" height="129" /></a></p>
<p>All of this needs more work I know, but I put in a crapload of time this weekend and I think that I am definitely making progress. I also worked with a real data set vs just making things up, which was a bit of a pain in the ass at first but really helped me a lot in the long run. I should have been working like this before and I&#8217;m glad I am doing it now.</p>
<p>That&#8217;s it for now. I also did a good amount of work on the other sections of the application but I don&#8217;t feel like they are ready to show yet, so that will be another post.</p>
<p>If you read this please check everything out and give me some feedback!</p>
<p>Thanks,</p>
<p>-Steve</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vargatron.com/blogatron/2010/02/28/workin-on-some-vis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting MySQLdb up and running on Snow Leopard with MAMP</title>
		<link>http://www.vargatron.com/blogatron/2010/02/15/getting-mysqldb-up-and-running-on-snow-leopard-with-mamp/</link>
		<comments>http://www.vargatron.com/blogatron/2010/02/15/getting-mysqldb-up-and-running-on-snow-leopard-with-mamp/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 00:57:13 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Prototypes]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Thesis]]></category>

		<guid isPermaLink="false">http://www.vargatron.com/blogatron/?p=420</guid>
		<description><![CDATA[OK this is another post that is more for my own future sanity than anything else. This has been a huge pain in the ass for me many times and I want to document the steps I took to get this working.
First, this is a good general overview
http://cd34.com/blog/programming/python/mysql-python-and-snow-leopard/
So I downloaded/installed MySQL, then I had to [...]]]></description>
			<content:encoded><![CDATA[<p>OK this is another post that is more for my own future sanity than anything else. This has been a huge pain in the ass for me many times and I want to document the steps I took to get this working.</p>
<p>First, this is a good general overview<br />
http://cd34.com/blog/programming/python/mysql-python-and-snow-leopard/</p>
<p>So I downloaded/installed MySQL, then I had to add it to my PATH var to get the MySQLdb install to work.</p>
<p>//Open With textmate<br />
mate ~/.profile</p>
<p>export PATH="/usr/local/bin:/usr/local/sbin:/usr/local/mysql/bin:$PATH"</p>
<p>then make sure you update your profile or close and open a new terminal window<br />
source ~/.profile</p>
<p>That will let the MySQLdb install work. Then when you run Python and try to use MySQLdb you&#8217;ll get errors. What you need to do is the following:</p>
<p>change your PATH to export PATH=/Applications/MAMP/Library/bin:$PATH</p>
<p>which links to MAMP mysql, then run this command </p>
<p>sudo ln -s /Applications/MAMP/tmp/mysql/mysql.sock /tmp/mysql.sock</p>
<p>Then you should be good to go. You might be able to skip the first step, but I don&#8217;t know if the MySQL that comes with MAMP is 64-bit, and you need 64-bit for Snow Leopard&#8217;s install of Python.</p>
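<p>A small sanity check for the steps above, as a sketch rather than part of the original setup: it reports which PATH directory supplies mysql_config (the helper the MySQLdb build looks for) and whether the /tmp/mysql.sock link exists. The function name and its parameters are made up for this example.</p>

```python
import os


def first_dir_with(filename, path_value, exists=os.path.exists):
    """Return the first directory in a PATH-style string that contains filename."""
    for directory in path_value.split(os.pathsep):
        if directory and exists(os.path.join(directory, filename)):
            return directory
    return None


if __name__ == "__main__":
    # mysql_config is the helper that the MySQLdb build looks for on the PATH.
    print("mysql_config found in:", first_dir_with("mysql_config", os.environ.get("PATH", "")))
    # /tmp/mysql.sock is the symlink created by the ln -s command above.
    print("socket link present:", os.path.lexists("/tmp/mysql.sock"))
```

<p>If the directory printed is not the one you expect, the order of entries in PATH is the first thing to look at.</p>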
<p>Why is python such a pain in the ass?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vargatron.com/blogatron/2010/02/15/getting-mysqldb-up-and-running-on-snow-leopard-with-mamp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Considering baseball data API</title>
		<link>http://www.vargatron.com/blogatron/2010/02/13/considering-baseball-data-api/</link>
		<comments>http://www.vargatron.com/blogatron/2010/02/13/considering-baseball-data-api/#comments</comments>
		<pubDate>Sat, 13 Feb 2010 22:47:32 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Context and Domains (Research)]]></category>
		<category><![CDATA[Thesis]]></category>

		<guid isPermaLink="false">http://www.vargatron.com/blogatron/?p=418</guid>
		<description><![CDATA[So I have been doing a lot of research about the data I&#8217;m going to need for thesis, and where I should get this. The two main sources of data I am looking at are Retrosheet, which I&#8217;ve previously mentioned extensively, and the unofficial MLB xml sources which are used by the official MLB gameday [...]]]></description>
			<content:encoded><![CDATA[<p>So I have been doing a lot of research about the data I&#8217;m going to need for thesis, and where I should get this. The two main sources of data I am looking at are Retrosheet, which I&#8217;ve previously mentioned extensively, and the unofficial MLB xml sources which are used by the official MLB gameday app.</p>
<p>Both of these resources have their advantages/disadvantages, and neither completely suits my needs. I am starting to think that in order to get what I want, I will have to create a hybrid of the two, but the big question is how exactly will I do this? Here are my thoughts on the good and bad about each source:</p>
<p>Retrosheet:<br />
Great resource, a pain to get into a database but once it&#8217;s there it&#8217;s very complete<br />
Data going back to 1951<br />
Self contained, doesn&#8217;t rely on resources that may be unavailable in the future<br />
Doesn&#8217;t contain any current season data<br />
No great way to get data, have to write custom XML to get data in a usable form</p>
<p>MLB Data:<br />
Completely thorough<br />
Provided in XML format that is really well executed<br />
Updated daily for current games<br />
Not official, no promises that it will be there in the future<br />
Questionable as far as a data source/retrieving data repeatedly.</p>
<p>I have been working on creating an API for Retrosheet games that returns complete data, and I&#8217;ve made some headway but it&#8217;s largely a work in progress. After finding the XML-based MLB data I can see that they have already tackled this and have the data in a beautiful format that is pretty much exactly what I wanted to make. They do not, however, have this data stored in a database or offer any easy way to request a game in a traditional way.</p>
<p>I think that my strategy for this problem will be to combine both resources into a best-of-both-worlds situation that I can make freely available to anyone. I will adapt the MLB XML format and parse all Retrosheet games to this, taking advantage of the format already established by MLB.com as a standard and saving myself some work. I will then write code that does the opposite with the MLB.com data, checking every day of a season and breaking down their XML into data that can be put into the Retrosheet database. This will likely be the hardest part of the process but I think I can figure it out. This will provide me with a complete Retrosheet database with current game data and an XML format that will provide a full summary of a game in an easy to read way.</p>
<p>Hopefully this works out for me. If I can do this and provide access to it through my web server, I think it will be exciting and a great contribution to the baseball data community.</p>
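<p>To make the &#8220;opposite&#8221; direction a bit more concrete, here is a minimal sketch of flattening a nested game document into database-style rows. Every tag and attribute name in it (game, inning, half, event, num, side, type) is invented for illustration and is not the real MLB.com or Retrosheet schema.</p>

```python
import xml.etree.ElementTree as ET


def flatten_game(game):
    """Turn a nested game element into flat, database-style event rows."""
    rows = []
    for inning in game.findall("inning"):
        for half in inning.findall("half"):
            for event in half.findall("event"):
                rows.append({
                    "game_id": game.get("id"),
                    "inning": int(inning.get("num")),
                    "half": half.get("side"),
                    "event": event.get("type"),
                })
    return rows


def sample_game():
    """Build a tiny nested game; every tag and attribute name here is made up."""
    game = ET.Element("game", id="example-game")
    inning = ET.SubElement(game, "inning", num="1")
    top = ET.SubElement(inning, "half", side="top")
    ET.SubElement(top, "event", type="single")
    ET.SubElement(top, "event", type="out")
    bottom = ET.SubElement(inning, "half", side="bottom")
    ET.SubElement(bottom, "event", type="home_run")
    return game


if __name__ == "__main__":
    for row in flatten_game(sample_game()):
        print(row)
```

<p>Each row carries its own inning and half, which is the kind of flat shape that fits a single events table.</p>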
]]></content:encoded>
			<wfw:commentRss>http://www.vargatron.com/blogatron/2010/02/13/considering-baseball-data-api/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Simple Thesis Prototype</title>
		<link>http://www.vargatron.com/blogatron/2009/12/30/simple-thesis-prototype/</link>
		<comments>http://www.vargatron.com/blogatron/2009/12/30/simple-thesis-prototype/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 22:17:55 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.vargatron.com/blogatron/?p=415</guid>
		<description><![CDATA[Super Simple Prototype for Thesis.
Game data for MLB seasons 1952-2008 acquired from the site retrosheet.com, parsed into CSV then MySQL formats, read by PHP and converted to XML, then read into openFrameworks.
The size of the ring denotes the length of an inning, the blue represents the home team and the green the away team. Dark [...]]]></description>
			<content:encoded><![CDATA[<p>Super Simple Prototype for Thesis.</p>
<p>Game data for MLB seasons 1952-2008 acquired from the site retrosheet.com, parsed into CSV then MySQL formats, read by PHP and converted to XML, then read into openFrameworks.</p>
<p>The size of the ring denotes the length of an inning, the blue represents the home team and the green the away team. Dark colors represent outs, bright colors represent runs. Grey pieces are events I haven&#8217;t checked for yet (caught stealing, wild pitch, etc.).</p>
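<p>The color rules above can be sketched as a small lookup. This is Python for readability rather than the openFrameworks code in the prototype, and the RGB values and function names are placeholders.</p>

```python
# Base hues follow the post: blue for the home team, green for the away team.
# The exact RGB values are placeholders, not the ones used in the prototype.
BASE_COLORS = {"home": (40, 90, 220), "away": (40, 200, 90)}
GREY = (128, 128, 128)


def darken(color):
    """Halve each RGB channel to get the dark variant of a color."""
    return tuple(channel // 2 for channel in color)


def segment_color(team, event):
    """Pick a ring-segment color: dark for outs, bright for runs, grey otherwise."""
    base = BASE_COLORS[team]
    if event == "out":
        return darken(base)
    if event == "run":
        return base
    return GREY


if __name__ == "__main__":
    print(segment_color("home", "out"))
    print(segment_color("away", "run"))
    print(segment_color("away", "wild_pitch"))
```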
<p>Super simple audio events are assigned to each plate appearance as well.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="360" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=8025906&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=dd4499&amp;fullscreen=1" /><embed type="application/x-shockwave-flash" width="480" height="360" src="http://vimeo.com/moogaloop.swf?clip_id=8025906&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=dd4499&amp;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.vargatron.com/blogatron/2009/12/30/simple-thesis-prototype/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some recent work</title>
		<link>http://www.vargatron.com/blogatron/2009/12/30/some-recent-work/</link>
		<comments>http://www.vargatron.com/blogatron/2009/12/30/some-recent-work/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 22:16:39 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.vargatron.com/blogatron/?p=412</guid>
		<description><![CDATA[Here are a couple of CV Prototypes I&#8217;ve done in school. This semester has been a really big challenge for me. I feel like I learned a lot but at the same time wasn&#8217;t able to put the time and effort I wanted into any of my projects, so most of them exist in prototype [...]]]></description>
			<content:encoded><![CDATA[<p>Here are a couple of CV Prototypes I&#8217;ve done in school. This semester has been a really big challenge for me. I feel like I learned a lot but at the same time wasn&#8217;t able to put the time and effort I wanted into any of my projects, so most of them exist in prototype form only. Hoping that one day I can expand upon these and create something great if I ever have time&#8230;</p>
<p><strong>Beehive</strong><br />
Working with Lee Meredith and Justin Blinder we created this experiment that allows users to interact with virtual beehives. When a user knocks into the beehive they are swarmed by bees which continue to attack until they leave the video frame. The bees then return to their hives. Created using CV tracking and particle forces.</p>
<p><object width="480" height="360"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=8341484&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=dd4499&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=8341484&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=dd4499&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="480" height="360"></embed></object></p>
<p><strong>Puking</strong><br />
We also created this simple app that is a lot of fun. Basically bend over and you puke. It uses CV tracking and detects which direction the person is bending in with the countNonZeroInRegion() function in ofxCvImage.</p>
<p><object width="480" height="360"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=8341631&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=dd4499&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=8341631&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=dd4499&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="480" height="360"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.vargatron.com/blogatron/2009/12/30/some-recent-work/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Prototype/User Test for Thesis</title>
		<link>http://www.vargatron.com/blogatron/2009/11/29/prototypeuser-test-for-thesis/</link>
		<comments>http://www.vargatron.com/blogatron/2009/11/29/prototypeuser-test-for-thesis/#comments</comments>
		<pubDate>Sun, 29 Nov 2009 21:06:20 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Prototypes]]></category>
		<category><![CDATA[Thesis]]></category>

		<guid isPermaLink="false">http://www.vargatron.com/blogatron/?p=387</guid>
		<description><![CDATA[After a LOT of work/heavy lifting with C++, audio timing and beat spacing, I finally am relieved to have a sizeable prototype for thesis that is a big step towards my intended project. It may not seem like a lot in the form I&#8217;m presenting it, but it involved a lot of planning, coding, and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2009/11/timedSpacing.png"></a>After a LOT of work/heavy lifting with C++, audio timing and beat spacing, I finally am relieved to have a sizeable prototype for thesis that is a big step towards my intended project. It may not seem like a lot in the form I&#8217;m presenting it, but it involved a lot of planning, coding, and adjustments to my XML in order to create something that I can test.</p>
<p>This test consists of two audio files, both generated live from baseball game information read from a database, parsed to XML, imported into openFrameworks, then associated with certain beats in two different patterns. I associated the same beats with events in each pattern so that they can hopefully be compared in a similar fashion. The beats are as follows:</p>
<ul>
<li>Inning Change: Snare Drum</li>
<li>Top/Bottom of inning: Bass Drum</li>
<li>Out: Kick Drum</li>
<li>Hit: Clapping</li>
</ul>
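<p>The mapping above is small enough to express as a lookup table. This is only an illustrative sketch in Python, not the openFrameworks code used for the test; the event keys and the choice of None for unmapped events are assumptions.</p>

```python
# Event-to-sample mapping copied from the list above; the keys and the
# fallback of None (no sound) are assumptions for illustration.
BEATS = {
    "inning_change": "snare drum",
    "half_inning_change": "bass drum",
    "out": "kick drum",
    "hit": "clapping",
}


def beats_for(events):
    """Return the sample name for each event, or None when nothing is mapped."""
    return [BEATS.get(event) for event in events]


if __name__ == "__main__":
    print(beats_for(["half_inning_change", "hit", "out", "walk"]))
```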
<p>The first pattern is linear, which means that every plate appearance by a player is mapped to an equally timed beat. This diagram attempts to illustrate what I mean&#8230;</p>
<p><a href="http://www.vargatron.com/blogatron/wp-content/uploads/2009/11/linearBeats.png"><img class="size-full wp-image-396 alignnone" title="linearBeats" src="http://www.vargatron.com/blogatron/wp-content/uploads/2009/11/linearBeats.png" alt="linearBeats" width="484" height="121" /></a></p>
<p>This approach is true to the actual rhythm of the game, and spaces the beats as they would occur in actual time. The audio effect that this gives can be quite random at times, even though it represents the game very well.</p>
<p>Here is what this sounds like:</p>
<p>The second approach I took was to use the constants in the game (9 innings/Top and Bottom to each inning) as steady beats, and then use the variable events within these constants (hits, runs, outs, etc) as sections of variable timing within the constants. The diagram below attempts to illustrate this:</p>
<p><img class="size-full wp-image-389 alignnone" title="timedSpacing" src="http://www.vargatron.com/blogatron/wp-content/uploads/2009/11/timedSpacing1.png" alt="timedSpacing" width="492" height="170" /></p>
<p>This approach doesn&#8217;t hold as true to the actual rhythm of the game. However, I think that the contrast between constant, steady beat patterns and variable events gives both a feeling of the game and a better-sounding, more rhythmic result.</p>
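<p>Here is one way the two timing schemes could be computed, as a sketch of my reading of the two diagrams rather than the actual C++ used for the tracks. It assumes the second pattern gives every half inning a frame of equal length and spreads that half&#8217;s events evenly inside it; the function and parameter names are hypothetical.</p>

```python
def linear_times(event_count, beat_seconds):
    """First pattern: every plate appearance lands on an equally spaced beat."""
    return [index * beat_seconds for index in range(event_count)]


def framed_times(events_per_half, frame_seconds):
    """Second pattern: each half inning gets a fixed-length frame, and the
    events inside it are spread evenly across that frame."""
    times = []
    for frame_index, count in enumerate(events_per_half):
        start = frame_index * frame_seconds
        for event_index in range(count):
            times.append(start + event_index * frame_seconds / count)
    return times


if __name__ == "__main__":
    # Six plate appearances in total: two in the top half, four in the bottom.
    print(linear_times(6, 0.5))
    print(framed_times([2, 4], 2.0))
```

<p>With the framed version a busy half inning plays faster and a quiet one plays slower, while the half-inning boundaries stay on a steady pulse.</p>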
<p>Here is what this sounds like:</p>
<p>I&#8217;d appreciate any feedback you have as well as suggestions, so please review these two tracks and let me know what you think in the comments!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vargatron.com/blogatron/2009/11/29/prototypeuser-test-for-thesis/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Digging into the Data</title>
		<link>http://www.vargatron.com/blogatron/2009/11/25/digging-into-the-data/</link>
		<comments>http://www.vargatron.com/blogatron/2009/11/25/digging-into-the-data/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 20:25:41 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Prototypes]]></category>
		<category><![CDATA[Thesis]]></category>

		<guid isPermaLink="false">http://www.vargatron.com/blogatron/?p=379</guid>
		<description><![CDATA[I&#8217;ve begun to dig into the Retrosheet database that I recently acquired (see previous post) and have started to plan out a basic format for parsing the immense amount of data into a readable, usable XML format that will let me retrieve game data in a simple, direct way.
In order to do this, I had [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve begun to dig into the Retrosheet database that I recently acquired (see previous post) and have started to plan out a basic format for parsing the immense amount of data into a readable, usable XML format that will let me retrieve game data in a simple, direct way.</p>
<p>In order to do this, I had to first think about how a baseball game is structured. What are the constants and what are the variables? How can I structure my XML around this?</p>
<p>Speaking in terms of a baseball game and not the way that Retrosheet is formatted, the constants in a game are as follows:</p>
<ul>
<li>9 Innings</li>
<li>2 halves of an inning</li>
<li>1 set of at bats and 1 set of fielding (defense) for each team</li>
<li>Away team always bats first in an inning</li>
<li>Each half of an inning consists of 3 outs</li>
<li>The third out always ends a half inning</li>
</ul>
<p>After writing this down, I was starting to see a structure.</p>
<p>I then started to investigate how the game data was stored in the Retrosheet database. The database is structured in a somewhat simplistic way, with only two levels of hierarchy: games and at bats. Within the at bats lies all the other information about where the at bat took place in the game&#8230;which inning, which team, how many outs, etc.</p>
<p>This format didn&#8217;t quite mix with the way that I was thinking a game should be represented, but for a database structure it made perfect sense. I then began to think how I could merge these two formats, and what information I could gain from a game.</p>
<p>After several attempts, this is the XML format that I have come up with which I think will work quite well in organizing the information that I need.</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family:monospace;color: #FCFFBA;">&nbsp;</pre></div></div>

<p>This format organizes the game around the structure I mentioned previously, then adds in the game events from retrosheet as Plate Appearances within outs. This approach creates an XML document that is very easy to read, which I can use to reconstruct a game very quickly and pull connected data out based on related fields. At this point I am not pulling all the data but a smaller set that will be easy for me to get started with.</p>
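<p>To illustrate the idea of nesting flat at-bat rows under innings, halves and outs, here is a minimal sketch. It is not the XML format from this post: the element names (game, inning, half, out, pa), the row fields, and the choice to group plate appearances by the number of outs when they began are all assumptions.</p>

```python
import xml.etree.ElementTree as ET


def build_game(game_id, rows):
    """Nest flat at-bat rows as game > inning > half > out > pa."""
    game = ET.Element("game", id=game_id)
    nodes = {}  # cache of elements already created, keyed by their position
    for row in rows:
        inning_key = (row["inning"],)
        half_key = inning_key + (row["half"],)
        out_key = half_key + (row["outs"],)
        if inning_key not in nodes:
            nodes[inning_key] = ET.SubElement(game, "inning", num=str(row["inning"]))
        if half_key not in nodes:
            nodes[half_key] = ET.SubElement(nodes[inning_key], "half", side=row["half"])
        if out_key not in nodes:
            nodes[out_key] = ET.SubElement(nodes[half_key], "out", num=str(row["outs"]))
        ET.SubElement(nodes[out_key], "pa", event=row["event"])
    return game


if __name__ == "__main__":
    rows = [
        {"inning": 1, "half": "top", "outs": 0, "event": "single"},
        {"inning": 1, "half": "top", "outs": 0, "event": "strikeout"},
        {"inning": 1, "half": "top", "outs": 1, "event": "walk"},
    ]
    print(ET.tostring(build_game("example-game", rows), encoding="unicode"))
```

<p>Because the rows arrive in game order, the nested document keeps that order, which is what makes it easy to replay a game from top to bottom.</p>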
<p>Here is a diagram of the overall structure.<br />
<a href="http://www.vargatron.com/blogatron/wp-content/uploads/2009/11/gameDiagram.png"><img class="alignnone size-medium wp-image-384" title="gameDiagram" src="http://www.vargatron.com/blogatron/wp-content/uploads/2009/11/gameDiagram-300x104.png" alt="gameDiagram" width="300" height="104" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.vargatron.com/blogatron/2009/11/25/digging-into-the-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting the baseball data.</title>
		<link>http://www.vargatron.com/blogatron/2009/11/21/getting-the-baseball-data/</link>
		<comments>http://www.vargatron.com/blogatron/2009/11/21/getting-the-baseball-data/#comments</comments>
		<pubDate>Sat, 21 Nov 2009 21:47:39 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Prototypes]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Thesis]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://www.vargatron.com/blogatron/?p=377</guid>
		<description><![CDATA[The first major challenge of my thesis project has been acquiring historical data, parsing it into a usable format (database), and then going through it and trying to make some sense of it. I&#8217;ll try to go over some of the steps I took to get where I&#8217;m at (as usual both for anyone trying [...]]]></description>
			<content:encoded><![CDATA[<p>The first major challenge of my thesis project has been acquiring historical data, parsing it into a usable format (database), and then going through it and trying to make some sense of it. I&#8217;ll try to go over some of the steps I took to get where I&#8217;m at (as usual both for anyone trying to do the same thing and for my own sanity the next time I need to do this).</p>
<p>First of all, thanks to the extremely hard work of a few super dedicated individuals, historical data for every baseball game is available online at the site http://www.retrosheet.com</p>
<p>These files have an insane amount of data for almost every game played for at least the last 50 years (there is more but before then the complete info is a bit hit or miss). Unfortunately, due to the way that they are recording the data (I&#8217;m not really sure how, but I assume it involves some sort of spreadsheet), the files are not really in an easily interpreted format. Luckily, again due to the work of a few dedicated individuals, the open source community has a solution for this as well. The tool &#8220;Chadwick&#8221; (named after the man who is historically credited with inventing the first widely accepted form of baseball scorekeeping) is a command-line Unix tool which can take these files and parse them to Comma Separated Value (CSV) files. This does entail learning the tool&#8217;s own command language (it looks sorta like regular expressions but I don&#8217;t think it&#8217;s related), but the end result is worth it.</p>
<p>From this point (once you have your CSV files) you can create a database in the SQL of your choice (MySQL, SQLite, etc) and import the data.</p>
<p>Fortunately there are a few people out there who have already done a great bulk of this work, and have written some scripts to take care of a lot of this process for you. Unfortunately, they are using Python with the MySQLdb extension, which under Snow Leopard is near impossible to compile at this time.</p>
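<p>As a minimal sketch of that import step, here is what loading a CSV file into a table can look like. It uses SQLite from the Python standard library so that it runs without a server; the table layout and the four columns are invented for the example and are far smaller than the real Chadwick output.</p>

```python
import csv
import io
import sqlite3


def load_events(connection, csv_file):
    """Create an events table and bulk-insert rows from a headerless CSV file."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(game_id TEXT, inning INTEGER, batting_team INTEGER, event_text TEXT)"
    )
    rows = list(csv.reader(csv_file))
    connection.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    connection.commit()
    return len(rows)


if __name__ == "__main__":
    # Two made-up rows standing in for Chadwick output.
    sample = io.StringIO("GAME001,1,0,single\nGAME001,1,0,strikeout\n")
    connection = sqlite3.connect(":memory:")
    print(load_events(connection, sample), "rows loaded")
```

<p>For a real import you would open the CSV file on disk instead of the in-memory sample and point the connection at a database file.</p>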
<p>The main site I have been working from to get this data to work can be found here:</p>
<p>http://blog.wellsoliver.com/2009/06/retrosheet/</p>
<p>This guy is the perfect combination for my thesis: a data geek and a huge baseball fan. He has made several amazing tools (a few of which I look forward to getting into in the future) that allow people to get baseball data in a friendlier form, and he shares his experiences obtaining the data.</p>
<p>I spent the better part of a weekend attempting to get his code to work on my computer, through no fault of his but solely because Snow Leopard refuses to compile and install a workable version of MySQLdb for Python.</p>
<p>I finally gave up, and decided that I would go to plan B and run the script on my server (Dreamhost). This too hit a few roadblocks (the version of Python installed on my server wasn&#8217;t new enough, so I had to install a newer Python and then compile and install MySQLdb a couple of times <img src='http://www.vargatron.com/blogatron/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' />  ).</p>
<p>Then I had to figure out how to compile the Chadwick source code under Unix, which seemed very unclear until I actually opened the readme file and realized all I had to do was run ./configure in the Chadwick directory (followed by make, as is usual for configure-based builds) and it would compile no sweat.</p>
<p>I finally got the code to run and (seemingly) work! Then I realized I had another error&#8230;my server was killing my requests and I was only getting part of the data. After emailing back and forth with Dreamhost for an hour or two, I finally realized what the issue was. The Python code that I was working with ran in 20 separate threads, which would have been fine had I been able to run it on my own computer. However, my Dreamhost account had a limit on the number of concurrent threads allowed to run, and I was greatly exceeding it. After changing the thread count to 5, I was finally able to execute the script and build my database. It took about 15 minutes (!) to run the script through, and I was left with a complete, 4+GB database of baseball information.</p>
<p>This has been a huge headache (to say the damn least!) but now that I have gone through this process once it should be a lot easier from here on out if I need to grab new data (hopefully&#8230;).</p>
<p>I&#8217;m now working on understanding and parsing this enormous amount of data (the events table alone has over 8 million rows&#8230;) and making some sense of it. I&#8217;m close to having a workable XML format and I&#8217;ll post more about this in the future.</p>
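<p>For anyone curious what capping the thread count looks like, here is a minimal sketch using Python&#8217;s standard concurrent.futures module. This is not how the original script is written (I am not reproducing its threading code here); load_season and the per-season split are placeholders I made up to stand in for the real work.</p>

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 5  # the cap that worked on the shared host; the script originally used 20


def load_season(year):
    """Placeholder for the real work (parse one season and insert it into the database)."""
    return f"{year}: loaded"


def load_all(years, max_threads=MAX_THREADS):
    # The pool never runs more than max_threads workers at once;
    # map() returns results in the same order as the input years.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(load_season, years))


results = load_all(range(2000, 2010))
```

<p>The jobs still all get done; they just queue up behind the five workers instead of all starting at once.</p>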
<p>One more thing&#8230;.</p>
<p>I found this link: http://www.wantlinux.net/2009/04/retrosheet-baseball-mysql-database-download/</p>
<p>which is basically the entire dataset already built into a database and downloadable <img src='http://www.vargatron.com/blogatron/wp-includes/images/smilies/icon_sad.gif' alt=':(' class='wp-smiley' />  On one hand this totally bums me out, but on the other hand it was a great experience having terminal/Unix try to beat me down and coming out victorious! I am considering doing what this person has done and hosting the 2009 database on my site once it becomes available; I&#8217;ll make a post if I decide to do so.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vargatron.com/blogatron/2009/11/21/getting-the-baseball-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My thesis project.</title>
		<link>http://www.vargatron.com/blogatron/2009/11/21/my-thesis-project/</link>
		<comments>http://www.vargatron.com/blogatron/2009/11/21/my-thesis-project/#comments</comments>
		<pubDate>Sat, 21 Nov 2009 21:12:48 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Thesis]]></category>
		<category><![CDATA[baseball]]></category>
		<category><![CDATA[music]]></category>

		<guid isPermaLink="false">http://www.vargatron.com/blogatron/?p=375</guid>
		<description><![CDATA[I&#8217;ve been doing a lot of thesis research for what seems like forever at this point. I&#8217;ve known from the start that I wanted to do something that I was really interested in both technically and conceptually (seems like an obvious thing but it&#8217;s very easy to get sidetracked by the thesis process&#8230;.). The easy [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been doing a lot of thesis research for what seems like forever at this point. I&#8217;ve known from the start that I wanted to do something that I was really interested in both technically and conceptually (seems like an obvious thing but it&#8217;s very easy to get sidetracked by the thesis process&#8230;.). The easy part of the process was identifying exactly what I wanted to explore, which is the subject of baseball. I have been interested in baseball since I was young, and to this day I am borderline obsessed with the sport (and the Phillies in particular <img src='http://www.vargatron.com/blogatron/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  ). I also knew I wanted to explore the emerging technology surrounding multitouch and tangible interfaces, so I thought that I could combine these two ideas and a concept would emerge. After a month or two of going back and forth on exactly what I wanted to do with baseball and multitouch (data visualizations, augmented baseball cards, media center, etc.) I finally had a revelation that I was trying to fit a concept into a box that I had voluntarily made. After freeing myself from the necessity of designing exclusively for a multitouch installation, which was severely limiting my thought process, I finally feel like I have come up with a concept that:</p>
<ol>
<li>Is totally unique</li>
<li>Is interesting to me</li>
<li>Can incorporate multitouch but does not rely on it</li>
<li>Can be appreciated by fans of baseball</li>
<li>Allows people who hate baseball to get an abstract form of enjoyment out of the game</li>
</ol>
<p>So what is my concept? My concept is to use existing historical (1950-present) baseball game data to investigate the rhythmic, cyclical nature of the game. Furthermore, I want to take the factual, hard evidence that remains after a game has been played (box scores, statistics, etc.) and from that attempt to recreate the emotional experience of a fan who was there at that moment in time.</p>
<p>My investigation will be materialized as an audio/visual system that will allow users to select a historical game and play it back in a condensed form. The events of the game will serve as individual beat tracks upon which audio samples can be assigned, either by individual event or entire tracks of similar events. The user will then be able to play this game back and see the action of the game in abstract form, both audibly and visually.</p>
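<p>To make the &#8220;beat track&#8221; idea a bit more concrete, here is a rough sketch of one possible data layout, written in Python only because it is quick to read (the project itself is planned in C++/openFrameworks). The event names, the step numbers and the sample file names are all invented for the example.</p>

```python
from collections import defaultdict

# Hypothetical (step, event_type) pairs: one step per plate appearance, in game order.
GAME_EVENTS = [(0, "strikeout"), (1, "single"), (2, "strikeout"), (3, "home_run")]


def build_tracks(events, length):
    """Return one on/off step pattern per event type, like rows of a drum machine."""
    tracks = defaultdict(lambda: [0] * length)
    for step, event_type in events:
        tracks[event_type][step] = 1
    return dict(tracks)


tracks = build_tracks(GAME_EVENTS, length=4)

# Assigning a sample to a whole track; the file names are placeholders.
samples = {"strikeout": "snare.wav", "single": "hihat.wav", "home_run": "crash.wav"}
```

<p>Each event type becomes its own row of steps, so a sample can be attached to the whole row or, with a little more bookkeeping, to a single step.</p>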
<p>Some further ideas I have for this are splitting the game based on fan allegiance (home team, away team, neutral) and allowing the user to switch between these views during playback. I&#8217;d also like the user to be able to physically &#8220;scratch&#8221; or scrub the playback, simulating a DJ scratching a record.</p>
<p>As far as the how/what of the actual project, I intend to develop the project in C++/openFrameworks which will allow me to develop across multiple platforms (Desktop, iPhone, Touch Table) while keeping the core code the same. I&#8217;m really excited to attempt to develop for multiple platforms, but at the same time this will add a large amount of complexity (and work!) to the project, so I need to get moving fast.</p>
<p>I still have a LOT of work to do, but I&#8217;m happy that I finally feel like I at least have a direction and can start to develop prototypes and code non-stop. Hopefully I&#8217;ll have a first basic prototype up soon (I&#8217;ve started already, but I&#8217;ll put that in a separate post).</p>
<p>If anyone actually reads this and has any feedback on the concept, I&#8217;d love to hear it. Please feel free to leave comments!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vargatron.com/blogatron/2009/11/21/my-thesis-project/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using PHP 5 from Dreamhost through Command Line</title>
		<link>http://www.vargatron.com/blogatron/2009/11/14/using-php-5-from-dreamhost-through-command-line/</link>
		<comments>http://www.vargatron.com/blogatron/2009/11/14/using-php-5-from-dreamhost-through-command-line/#comments</comments>
		<pubDate>Sat, 14 Nov 2009 18:45:32 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://www.vargatron.com/blogatron/?p=361</guid>
		<description><![CDATA[I recently ran into an issue that was a bit of a pain in the ass and I wanted to put it up here both to help anyone else out who runs into it and for my own reference.
I have been doing a lot of data mining/visualization recently which requires a large number of calls [...]]]></description>
			<content:encoded><![CDATA[<p>I recently ran into an issue that was a bit of a pain in the ass and I wanted to put it up here both to help anyone else out who runs into it and for my own reference.</p>
<p>I have been doing a lot of data mining/visualization recently which requires a large number of calls to several data APIs. I am using PHP because I am familiar with it, I like it, and I can&#8217;t stand the structure of Python. One advantage of PHP is that you can execute it in the browser, which I normally take advantage of when I am working on API requests or data scraping. As my scripts have gotten more and more complex lately, I&#8217;ve quickly run into 500 Internal Server Error messages which can only be fixed by limiting the number of calls made at a time. At first I thought this was just a limit of my server/PHP, but then I remembered how I recently was able to import a 4GB database into MySQL through the terminal with no problems after phpMyAdmin was unable to handle anything near this. So I decided to try to run my script through the command line using SSH with Dreamhost.</p>
<p>I attempted to execute the script, and then I got a bizarre error: &#8220;Fatal error: Call to undefined function:  mysqli_connect()&#8221; This error never came up in the browser, yet it was coming up in terminal. After some quick searching I found out this error comes from not having PHP 5 installed on the server. This is super strange because PHP 5 is on my server according to Dreamhost. I then did a version check on PHP using the -v parameter. Sure enough, version 4.4.9.</p>
<p>So what was the solution? I found a couple of resources after playing around with phrases to search on Google, and figured out that while PHP 5 WAS installed on my server, the command-line environment is separate from the web environment, so I had to explicitly point to PHP 5 in order to use it.</p>
<p>Running &#8220;whereis php&#8221; led me to the path of PHP 5: &#8220;/usr/local/php5/bin/php&#8221;.</p>
<p>So, a super short finish to a long explanation: use &#8220;/usr/local/php5/bin/php&#8221; instead of &#8220;php&#8221; to execute PHP scripts using version 5 on Dreamhost.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.vargatron.com/blogatron/2009/11/14/using-php-5-from-dreamhost-through-command-line/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
