Getting started with HTML5 and iOS

Category: Code, General, Javascript

I have been doing a lot of work for the past 8 months or so creating content, applications, etc. for the iPad using HTML5, and I have run into a lot of frustrating and annoying things along the way. I started out not knowing JavaScript, so there was that. Beyond that, it was really hard to find iOS-specific (or even WebKit-specific) information when I started, so I began to compile a list of useful links as I found them. Here is the list so far, so that anyone who is starting out won’t have to make brutal mistakes or search for hours like I did :) This is mostly a WebKit-centric list, but the Mozilla Developer Center is also a great place to check out. If anyone else knows good links, feel free to suggest them and I’ll add them.


Dive into HTML5
Free web book with a great overview of all HTML5 features.

Apple HTML5 Showcase
Great showcase of HTML5 capabilities from Apple. The source code is a bit proprietary and hard to understand.

W3Schools
De facto resource for all things web, including documentation of the HTML5 standards. Usually useful, although sometimes incorrect.

Surfin’ Safari

The blog for the WebKit project (the engine behind the desktop and mobile versions of Chrome, Safari, etc.). Great examples of CSS3 features.
HTML5 expert Thomas Fuchs’s blog. An amazing amount of info, updated frequently; he also offers a number of HTML5 classes.

Font Squirrel @font-face Generator

Awesome @font-face generator that gives you the font files, CSS and examples for (almost) any font you throw at it.


Making an iPad HTML App Really Fast

A post that walks through the basics of creating an HTML5 iPad app and optimizing it. Great overview.

Touching and Gesturing on the iPhone

Great overview of touch events.

Offline HTML5

Overview of creating offline apps, the basis for HTML5 web apps that work without an internet connection.

Creating an HTML5 Web App

Shows how to create a web app and covers a lot of great things to consider. Aimed at the iPhone, but translates almost exactly to the iPad.

iPad + HTML5 + Javascript Memory Management

Article I wrote as an overview of how memory management works on the iPad and how to deal with it.


Framework for dealing with SVG-based graphics.


HTML5 canvas-based port of the extremely popular Processing language. Great for creating interactive canvas-based applications.


The leading general-purpose JavaScript library out there. Good for basic effects in websites, but not so much for web-based apps.


jQuery-based mobile framework, great for creating simple UI-based apps but still a bit under development. Not so great for custom apps.


Probably the best all-inclusive solution for creating cross-platform mobile apps. Based on the Ext JS framework, Sencha uses an extremely different system for creating apps. Similar to jQTouch (although more complete and powerful) in that it is easy to make simple UI-based applications, but not great for custom apps.

MooTools (My Object Oriented Tools)

My favorite (and in my opinion the best) general framework for creating custom HTML5 apps. Supports full object-oriented programming (classes, constructors, extending and implementing classes, etc.). It doesn’t have as large a user base as jQuery, but it has a lot more serious/experienced users. There’s a great comparison between MooTools and jQuery (by the creator of MooTools) at


Pennant HATED on :)

Category: General

I just came across this extremely detailed post about why Pennant is terrible.

Pennant App: Disappointing Data Design

First of all I’d like to say that while I am proud of Pennant, it is in no way complete and I’ve had an enormous amount of extremely useful constructive criticism from users. I have responded to each user thanking them and have a long list of improvements planned to make the app even better. This kind of back and forth between developers and users that the app store provides is amazing and I can’t wait to start improving things.

That being said, the post above is just ridiculous to me. I have tried posting a response, but the comment form does not seem to work. I thought I would respond here since I have my own forum and can post the full argument. My comments work, so feel free to respond. I am completely open to a civil discussion/debate.

My Response:
I think the entire context of this article is unfair and incorrect. You are not considering the audience and purpose of this app, but are looking at it from an ivory tower full of Tufte books. This application is for entertainment. It is not made for baseball statisticians; if it were, it would look extremely different. The entire point of this application is to provide an entertaining look at baseball in a way that a normal person would enjoy navigating.

That being said I also don’t agree with many of the points you made about the visualization.
Several points:

Perhaps you should mention that the second button allows you to view all teams in a grid, completely avoiding the coverflow.

Subsequent views provide information about the season (wins/losses and percentages for seasons, scores for games).

Also, perhaps you aren’t a baseball fan? The point you make about the colors being arbitrary is beyond absurd; these are the colors of each team, which mean a lot to fans. A jersey? That would be great if MLB didn’t have complete licensing control over that. I am an independent developer trying to get my project off the ground; if I could afford things like that, I would be happy to add them to the app.

Map View:
Again, how is this useless given the audience? Did you know that most non-baseball fans assume that there is a team for every state in the continental US? Is it not interesting/informative to see the distribution of teams throughout the US? Your point about the Angels/multiple teams is valid and something I’m looking to improve upon/fix.

Radar View:
Again, you fail to mention that if you hit the third button on the bottom you can SEE IT IN BAR CHARTS with the totals clearly printed. It is extremely unfair to exclude this view. The center visualizations are experimental, and the last views are more traditional and statistical. You are showing what you don’t like and acting like it is the only option. Maybe you haven’t pressed the third button? Otherwise this is a pretty ridiculous thing to exclude.

Bubble View:
I agree with you that this is somewhat pointless now, it was an experiment and I think it needs more information. Also, again, if you press the third button you can see all of the rankings in a normal format. Several users have indicated that they would love to see the actual records of each club here and I plan on adding this information as well.

Bottom Lines:
Your review of this section (except for the game wins/losses) is pretty ridiculous. The entire design means that you don’t have to look at the bar graph as you move your finger (which, I’m sorry, isn’t hitting 3 things at once, because that isn’t possible the way the code works): you move your finger along the bottom and watch the top indicate what season/game you are on. Totals for the game view are provided in the previous view of seasons, although I will say that it might be nice to have them here too.

Game Vis:
The view is circular because it allows a full view of the game as a whole, regardless of inning, but also allows the user to visually go around the circle in a time-based manner. What is the difference between a home run in the 9th that wins 1-0 and a home run in the 1st that wins 1-0? If you just want to see which team performed better, you can look overall and get a general sense of who has longer lines. If you want to follow play-by-play, you can go around the circle with your eyes and see how it unfolded. Putting the game in a linear, horizontal layout (which I am doing in the third view, although you again ignore this…) limits the viewer to a left-to-right, time-based view due to the length. It is much harder to view the game as a whole when it goes across the screen.

How is it weirder when you play the game back? If anything you should like this view since the game is laid out horizontally. If you think this view is weird you are clearly not a baseball fan or the audience for this app. IMO this is the most exciting part of the app, being able to replay a game from history and relive it.

Lastly, your Tufte example drives my point home. This is great. It is very informative. If I need data, I’m going straight for this style of graph. Does the average fan want to look at this? Is it a pleasure to use? Do I feel like I’m enjoying myself like I do when looking at baseball cards or watching a game? Nope. Honestly, it feels like work. There are a ton of great, usable stat-based resources for baseball. This is meant to entertain the average fan and maybe get them into stats so that they start looking for stuff like the Tufte chart.

If you want Tufte-esque, down-to-the-details tools for baseball stats, check out everything they are doing at Bloomberg Sports. They are making amazing things that are meant as tools for hardcore stats/fantasy people.


Pennant Released!

Category: Uncategorized

I’m EXTREMELY excited to announce that my iPad application, “Pennant”, is now available!

It’s taken a long time to make this happen (almost a year), but I couldn’t be happier with how it turned out.

Check it out if you get a chance at


What the hell have I been up to?

Category: Uncategorized

So as usual, this post is another in a long line of “haven’t been updating” posts, but hey, at least I’m trying!

Getting Vargatron Up and Running

Since I graduated from Parsons I’ve been hustling non-stop to get myself up and going as a successful independent designer + developer, and things are slowly but surely coming along. Being independent has opened up a whole line of pain-in-the-ass problems/tasks/etc., but now that everything is going and I have a handle on things, it really has been a great and rewarding way to work. Here are a few of the things I had to do and a few I continuously worry about:

  • Setting up the business as an LLC for taxes, legal reasons and legitness. No one is gonna take you seriously if you give them your SSN for payment; having a business tax number just makes things way more legit. Plus it makes it a lot easier to write off the pain-in-the-ass expenses that come with running a business.
  • Office Space: Working from home was great when I started. For about 3 weeks. Then it got really brutal really fast. Waking up and walking into the area you have to work in is no fun when you live in a smaller apartment (anywhere in the NYC area unless you’re rich). I found a few interesting co-working spaces in Manhattan, but they were all a bit pricey ($500 a month). Then I hit the jackpot with a REALLY nice small shared office space in Dumbo, right under the Brooklyn Bridge. It’s great to have such a comfortable place to go and work every day, and I’m right around a large number of creative businesses and people that I know.
  • Billing: My previous billing situation was OK. I had some basic software I’d use, but for the most part my jobs were not really hourly and were very informal. Now that I have proper clients and I’m billing hourly, I’ve stepped up my game significantly using the software “Billings”, which is awesome. You can google it and check it out for yourself, but it’s been extremely helpful as far as custom invoices, payment tracking, receipts, etc.
  • Project Management: This is an area that I’m still a bit behind on (really gotta get my Git game up and running), but just keeping track of clients, projects and versions requires a really large amount of organization. To handle this I have both a local and a remote server that are mirrored at all times to serve the latest versions to clients and keep them up to date.
  • Back That HD Up! My biggest fear since I started working on deadlines and projects has been losing work. To remedy this and let me sleep at night, I purchased a 1TB internal HD for my laptop and a 1TB external for Time Machine. Every night I plug in the HD and I’m instantly backed up. I also have a remote backup that I periodically copy more valuable files to, using a custom rsync script I wrote.

What I’m Working On

Since starting Vargatron and working full time independently, I’ve been lucky enough to land a number of great jobs while continuing to work on my own projects as well. In addition, I’ve been teaching at Parsons The New School for Design in the MFA DT program, which has been amazingly fun and rewarding.

  • VTRON X ESPN: The first, biggest, and most involved client that I have is ESPN. I have been working on mobile web applications for the ESPN The Magazine iPad application since September, and it’s been awesome. The first couple of apps were more for testing out the possibilities of the platform and getting integrated into the magazine’s digital workflow, but now that things have been sorted out, some great projects are beginning to emerge. I will be posting a few of the best things in my portfolio in the near future, so stay tuned for continuous cool work coming out of this partnership.
  • SYPartners: I have also had the honor of working with SYPartners, an extremely unique company that is using emerging technology to push Fortune 500 companies into the 21st century and beyond. The great thing about my relationship with SYPartners is that they are not only willing to experiment with new technology, they encourage it. Our projects thus far have involved significant research and experimentation, and it has been a blast. I have been able to learn new platforms and languages (and get compensated for it) and then apply this to real-world projects. Our biggest project to date is an HTML5 application that runs as a native iPad app, a web-based iPad app and a desktop app, all with the same custom code base. This project allowed me to get up to speed with HTML5 and push the iPad platform to see just what could be done with it. This will be shown in my portfolio shortly as well (gotta cut some videos… interaction isn’t the same without video!)
  • Pennant: My senior thesis project, “Pennant”, has been an ongoing learning experience as I have tried to bring it to market. After looking into several partnerships for its release, I have come back to releasing it on my own, which I now realize is what I should have done from the beginning. I have rewritten the entire app to perform better and also invested in an inexpensive content delivery network for data delivery that should ensure that the app performs how it should. It is currently under review for release by Apple, and I have a full web site showcasing the app set up at Hopefully more good news coming from this soon.
  • Other Random Projects: In addition to these ongoing projects, I’ve also taken on various other short-term projects:
    • Parsons OSI Website: I FINALLY completed the Parsons OSI website, in collaboration with the amazing Bruce Drummond as well as Aaron Druck and Andrea Bradshaw. The site looks really great even though it took a lifetime to get done. Check it out at
    • Jigazo iPhone App: I’ve been working in collaboration with Zach Lieberman (google him for a brief list of his awesomeness) on the interface development for the iPhone version of Jigazo, a digital puzzle toy for the Japanese and US markets. My role has been pretty limited creatively, but I’ve learned a good amount about structuring projects from Zach and overall it’s been a pretty cool experience. Not sure when this is releasing or if it will be released in the US, but I will probably make a note of it on the site when it happens.
    • Hip Hop Word Count: I’ve been talking off and on with Tahir Hemphill about collaborating on some data vis using his Hip Hop Word Count database that he’s been creating during his residency at Eyebeam. It’s been a bit off and on with our schedules not matching up, mostly due to the pain-in-the-ass business stuff I listed above, but now that I’m settled down we’ve started to get to work and I hope some awesome stuff will come out of this soon!
  • Teaching
    • I recently finished teaching the introductory course “Creativity and Computation” at Parsons, and it was awesome watching people go from knowing very little to being well on their way to becoming awesome coders. You can check out the work made at the class’s website:
    • I am now teaching a course that I am super excited about, Data Visualization. You can follow along with the class at this site:

So I think that is everything I’ve been up to thus far. It’s been a wild ride and it’s crazy to think it’s really only been about 6 months of independent work so far. I was a bit nervous at first to go in this direction, but now I’m getting really excited and can’t wait to see what the future brings.

Full disclosure: The main reason I am writing this is because people I know whom I see at parties or dinners or whatever are always asking what I’m up to, and I basically try to list all this stuff, and I can tell about 3 sentences in that they don’t really care, and it is awkward and I just sorta stop talking. It’s a total Larry David sort of situation. So from now on I’m just gonna say “Computers and Shit”, refer them to this post, and grab another beer :) If they really are interested they can check this post out; otherwise I won’t be filling the world with any more boring conversation and we’ll all be happy.

Till next time,



iPad + HTML5 + Javascript Memory Management

Category: Uncategorized

I am currently working on a very large HTML5 project on the iPad, and so far I’ve really enjoyed the experience. I plan on doing a longer blog post containing all the things that I have learned (and maybe some code to share :) ), but for now I would like to share with everyone a BIG problem I have encountered and the solution that I have found.

The application I am working on contains a large number of graphical elements, at one point displaying upwards of 24 1024×768 images that can be manipulated via JavaScript touch events. At first this didn’t seem possible, but after discovering and using the -webkit-transform properties (which enable hardware acceleration) it has been pretty smooth sailing. Recently the application has grown to the point that I have been hitting the upper limit of the iPad’s available memory for Mobile Safari. This results in the not-so-graceful error of a hard browser crash.
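To make the transform part concrete, here is a minimal sketch of dragging an element with one finger by updating a 3D transform instead of left/top. The function name `makeDraggable` is hypothetical, and the assumption that `translate3d` gets GPU compositing in Mobile Safari reflects commonly observed WebKit behavior rather than a documented guarantee.

```javascript
// Hypothetical sketch: drag an element with one finger, positioning it via
// translate3d (which Mobile Safari tends to composite on the GPU, an
// assumption based on observed behavior) instead of changing left/top.
function makeDraggable(element) {
  var startX = 0, startY = 0; // finger position at touchstart
  var baseX = 0, baseY = 0;   // element offset when the drag began
  var x = 0, y = 0;           // current offset

  function apply() {
    var value = 'translate3d(' + x + 'px, ' + y + 'px, 0)';
    element.style.webkitTransform = value; // prefixed property for older WebKit
    element.style.transform = value;       // unprefixed standard property
  }

  element.addEventListener('touchstart', function (event) {
    var touch = event.touches[0];
    startX = touch.pageX;
    startY = touch.pageY;
    baseX = x;
    baseY = y;
  }, false);

  element.addEventListener('touchmove', function (event) {
    event.preventDefault(); // keep the page from scrolling while dragging
    var touch = event.touches[0];
    x = baseX + (touch.pageX - startX);
    y = baseY + (touch.pageY - startY);
    apply();
  }, false);
}
```

Each drag starts from wherever the previous one left off, because the offset at `touchstart` is remembered in `baseX`/`baseY`.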

If you want to learn more about the issue and my research, read below, if not just skip to the solution section and ignore the rest.

Coming from a background of C++/Objective-C development and having done some native apps in the past, I knew that this usually results from an application running out of memory. Safari may be a browser, but when it comes down to it there is little difference between Safari and any other application in the app store. Normally I would be very conscious of my memory usage and make sure to free any assets that weren’t currently being used.

Memory management of this type is not possible in Safari. The browser caches everything that is loaded into memory and attempts to maintain that cache while the app is open. It handles this in a very smart way (only loading things as they enter the display space), so when the app loads, as long as all divs that aren’t being displayed are set to display:none, the browser ignores their assets. As the user starts to interact with the application, they move through the interface and reveal images. These images are loaded into memory, and remain there. This results in a pile of trash (garbage :) ) that keeps growing. In the case of my application, once a user has gotten through the majority of the app I am pushing upwards of 50 1024×768 images loaded in memory! If a user goes back into a section there is no load time; the images are already there. This is great for a normal website, where the user leaves pages constantly and the memory is freed. A web app, on the other hand, is essentially one web page. In the case of my application, this one page consists of thousands of lines of code and can potentially load up to 5-6 MB of data. Worse, it isn’t even done yet and will probably bloat more very soon.

After a good amount of research, I found out that the memory limit for an iPad web app is about 5 MB. After discovering this early on, I applied very aggressive compression to all images and managed to get the app stable again. Now that I have started adding more assets, this no longer does the trick.

There is no way in JavaScript to “free” an image per se, regardless of whether it is in the HTML (lazy-loaded) or created dynamically through the DOM. You CAN, however, trigger events that will cause the browser to unload an image from memory. The easiest (and I think best) way to do this is to replace the src of every image you want to hide with one tiny image. In my case I am using a 1px solid white PNG that is minuscule in size. This causes the browser to reload the image, at which point it boots the cached, “real” image out of memory and replaces it with the tiny one. If you are hiding and showing your divs (which you should be if you have something this big), the only repercussion is that you will see your images load every time you show a div. In most cases this probably isn’t an issue, but if it is, you can selectively choose not to use this method in certain areas. The main point is that you have the CHOICE to do this, which is great.

One immediate way that I can see this being extremely useful is for any application that loads an endless stream of images (an infinite carousel or something), for example a movie library browsing app. Instead of deleting old images (which keeps them cached) and adding new ones, simply changing the src attribute will ensure that you never have more images loaded than are being displayed.
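Here is a rough sketch of that carousel idea, assuming a fixed set of `<img>` elements that stay on the page and get recycled as the user scrolls. `ImagePool`, `render` and `scrollBy` are hypothetical names, and whether the previous image data is actually released is still up to the browser, as described above.

```javascript
// Hypothetical sketch: a carousel that owns a fixed pool of <img> elements and
// recycles them by reassigning src, instead of removing old images and
// creating new ones. Each slot only ever points at one URL at a time.
function ImagePool(slots, urls, placeholderURL) {
  this.slots = slots;                   // the <img> elements on screen
  this.urls = urls;                     // full (possibly very long) URL list
  this.placeholderURL = placeholderURL; // tiny image for empty slots
  this.offset = 0;                      // index of the URL shown in slot 0
  this.render();
}

// Point each slot at the URL for its current position.
ImagePool.prototype.render = function () {
  for (var i = 0; i < this.slots.length; i++) {
    var index = this.offset + i;
    this.slots[i].src = index < this.urls.length ? this.urls[index] : this.placeholderURL;
  }
};

// Move the window by delta items, clamped to the ends of the list.
ImagePool.prototype.scrollBy = function (delta) {
  var maxOffset = Math.max(0, this.urls.length - this.slots.length);
  this.offset = Math.min(maxOffset, Math.max(0, this.offset + delta));
  this.render();
};
```

Slots with no URL to show get the placeholder rather than an empty src, which keeps them consistent with the placeholder trick above.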

This may be a big “no shit” thing to some javascript devs out there, but I had a tough time finding any real documentation on it, so hopefully it will be useful to others trying to do mobile web dev.

Here is a very simple class I wrote that shows what I am talking about. You create a new instance and feed it the element you want to show/hide and the URL of a placeholder image, and it does the simple work for you.

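/*Vtron Image Manager*/
/*Copyright 2010 Stephen Varga, Vargatron*/

// Swaps the src of every <img> inside an element between its real URL and a
// tiny placeholder, so Mobile Safari can drop the full-size images from memory.
function VtronImageManager(element, placeholderURL) {
	this.element = element;
	this.placeholderURL = placeholderURL;
	// Live collection of all <img> elements inside the managed element
	this.images = this.element.getElementsByTagName('img');
	// Remember each image's original URL so show() can restore it
	this.imageURLs = this.getImageURLs();
}

VtronImageManager.prototype = {
	// Restore the original image URLs (the browser loads them again)
	show: function() {
		for (var i = 0; i < this.images.length; i++) {
			this.images[i].src = this.imageURLs[i];
		}
	},
	// Point every image at the placeholder so the real images can be released
	hide: function() {
		for (var i = 0; i < this.images.length; i++) {
			this.images[i].src = this.placeholderURL;
		}
	},
	// Collect the current src of every managed image
	getImageURLs: function() {
		var imageURLs = [];
		for (var i = 0; i < this.images.length; i++) {
			imageURLs.push(this.images[i].src);
		}
		return imageURLs;
	}
};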
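If the app is split into sections that each get their own VtronImageManager, a small helper could keep only the active section’s images loaded. The helper `showOnly` is hypothetical and only assumes that each manager has `show()` and `hide()` methods; toggling the divs themselves (display:none) would still be handled separately.

```javascript
// Hypothetical helper: given one manager per section (anything with show()
// and hide(), such as VtronImageManager instances), load the active
// section's images and swap every other section to its placeholder.
function showOnly(managers, activeIndex) {
  for (var i = 0; i < managers.length; i++) {
    if (i === activeIndex) {
      managers[i].show();
    } else {
      managers[i].hide();
    }
  }
}
```

Calling something like `showOnly(sectionManagers, 2)` whenever the user navigates would keep at most one section’s full-size images referenced at a time.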


Processing.js is unreal!

Category: Uncategorized

Recently I’ve come across some work that requires me to explore the HTML5 canvas across both mobile devices (namely the iPad) and the desktop. I have looked into HTML5 before, and while it is interesting, I’m really not crazy about web development, so I never really got too far into it. Basic HTML5/WebKit/etc. combined with CSS3 seems to be a huge step in the right direction as far as simple, good-looking basic interactions and animations go, but for full interactive apps or data visualizations it’s really not that great of an environment. Since these are the areas where my interests lie, I have been keeping an eye on things but haven’t really dived in yet.

Enter the HTML5 canvas element and the Processing.js project, which is basically a full port of the Processing environment to JavaScript/HTML5 canvas. I noticed this project about 6 months ago, and I didn’t really get the point of it. It seemed interesting to me but was very limited in use, and I had doubts that it would ever be legit enough to use for a full project. I was totally wrong. The developers working on this project have been killing it, and as of today the newest release supports a crazy amount of Processing code that just works!

I have never been a giant fan of Processing, although I think it’s a great learning environment. Processing has always been too slow/restrictive on the web and not as fast as C++ on the desktop for me to really consider it for any major projects. Now that Processing.js has been created, the web performance/integration issues are rapidly fading away. I now consider Processing to be an extremely useful tool for creating web-based interactivity, and I’m really excited to dig in further. Expect some updated posts with some experiments from me in the near future, but for now, check out these links that I have found and feel free to ask any questions.

Main Site:

Creating iPad/iPhone apps that Cache (offline web apps):

Awesome iPad based code editor (code applications on the iPad…so great for messing around!):

Controlling Processing.js/HTML5 Canvas using HTML Elements:

Basic Data Vis/Presentation Layer:


New Site is UP!

Category: General


That took way too long. Finally got a few spare moments to get my new website up and running. Hopefully will be pushing out updates and blog posts regularly now since I am no longer ashamed of my own site. Funny how hard it is to do something for yourself, I guess that is a good sign though since I have been so busy.

Many updates coming soon, including a full post (and accompanying website) for Pennant, my MFA thesis project and the current main focus of my efforts.

Several other fun projects coming soon as well!

For now, check this out:

I was invited to work with Zach Lieberman, Takayuki Ito and Lucas Werthein as part of a team in the TechCrunch hackathon competition. It was an AWESOME event. Great food, great people and a lot of fun. Crazy nerd projects all over the place; it was really inspiring to see some of the projects that came out of it.

Our hack “Future Mario” won an award and we will be presenting it at the TechCrunch Disrupt event this coming week. Basically we are controlling Super Mario Bros. using vocal pitch detection, eye tracking, head tracking and blink detection. It’s a bit of a rough hack, but it’s tons of fun and has a lot of potential.

That’s it for now; it’s late and I’m spent, but I can finally rest knowing I can tell people my URL again :)


Workin on some Vis!

Category: Prototypes, Thesis

I’ve been working all weekend on putting together a solid visualization of baseball games that gives an informative overview of exactly what happened in a game in the least amount of space possible. I want to give a variety of ways to view a game in order to get the most out of it, but first and foremost I want to provide a simplified overview of all game events that shows the flow of the game and allows people to see exactly what happened.

I started by trying to classify events and think about how I could rank each of them in a way that would let me show them linearly, both grouping them per team and showing the magnitude of each event. With this in mind, I thought about the major possible events and ranked them according to the impact they have on the game, from the perspective of the batting team.

Tier 5: Home runs
Tier 4: Triples
Tier 3: Doubles
Tier 2: Singles
Tier 1: Walks/Stolen Bases
Tier 0: Outs

I had to revise this several times after experimenting with it, mostly due to issues with classifying walks/stolen bases and singles. At first I thought that I could get away with putting them in the same tier, since they all result in the player moving up one base. I then realized that there is a clear difference between the two. While walks and stolen bases both advance the runner towards scoring position on the diamond, singles put the ball in play, which can result in RBIs. Walks can sometimes result in an RBI if the bases are loaded, but even this is restricted to one run, while a single can drive in up to three runs.
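To show how the final tiers could feed a linear, per-team view, here is a small sketch. The event-type strings used as keys (`homerun`, `stolenBase`, etc.) and the `{ team, type }` event shape are hypothetical, not the format of any particular data source.

```javascript
// Event tiers from the list above (batting team's perspective). The key
// names are hypothetical labels chosen for this sketch.
var EVENT_TIERS = {
  homerun: 5,
  triple: 4,
  double: 3,
  single: 2,
  walk: 1,
  stolenBase: 1,
  out: 0
};

// Map a game's ordered event list to tier values for one team,
// preserving the order in which the events happened.
function tiersForTeam(events, team) {
  return events
    .filter(function (e) { return e.team === team; })
    .map(function (e) {
      var tier = EVENT_TIERS[e.type];
      if (tier === undefined) {
        throw new Error('Unknown event type: ' + e.type);
      }
      return tier;
    });
}
```

Keeping walks and stolen bases in their own tier below singles mirrors the revision described above: both move a runner, but only the single puts the ball in play.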

After coming up with this classification, I began to work on a circular visualization that classified events according to a number of criteria. While this made sense to me in concept, in practice it looked somewhat interesting but showed very little at a glance. I also attempted to create an organic shape from this visualization just to see what it looked like, and while this too proved interesting, I don’t think it was very successful in showing anything.



I then moved on to the next logical step, visualizing things linearly. I continued with my key of dashes and colors, and while things made a bit more sense linearly, it was still not working very well IMO. I experimented with representing outs as negative events at this point as well, but I don’t think it read the way I intended.



At this point I also created a more organic flow diagram from these charts. I think this actually worked out really well and is a great overall representation of the flow of the game. While it doesn’t really show exactly what I wanted from the first overall view of the game a user is presented with, I think it is a great sub-view and something that I’m going to continue to investigate.


This is where I realized I was sorta getting caught up in doing the same thing over and over, so I decided to give it a rest and move on the next day.

Today I started working again and had some major breakthroughs. I spent a lot of time working on various visualizations, trying to reduce the amount of data-ink used (per Edward Tufte), and began to really simplify everything. I did a lot of experimenting and came up with a visualization I think finally works pretty well. I did some quick testing with a few casual baseball fans I know (I obviously need to do more user testing, but something is better than nothing and I just finished it!) and I was pleased that they could easily recreate the narrative of a game themselves after a single, very brief explanation.





All of this needs more work, I know, but I put in a crapload of time this weekend and I think that I am definitely making progress. I also worked with a real data set vs. just making things up, which was a bit of a pain in the ass at first but really helped me a lot in the long run. I should have been working like this before and I’m glad I am doing it now.

That’s it for now. I also did a good amount of work on the other sections of the application, but I don’t feel like they are ready to show yet, so that will be another post.

If you read this please check everything out and give me some feedback!




Getting MySQLdb up and running on Snow Leopard with MAMP

Category: Code, Prototypes, Python, Thesis

OK this is another post that is more for my own future sanity than anything else. This has been a huge pain in the ass for me many times and I want to document the steps I took to get this working.

First, this is a good general overview

So I downloaded/installed MySQL, then I had to add it to my PATH var to get the MySQLdb install to work.

# Open ~/.profile in TextMate
mate ~/.profile

# Add this line to ~/.profile
export PATH="/usr/local/bin:/usr/local/sbin:/usr/local/mysql/bin:$PATH"

Then reload your profile (or close the Terminal window and open a new one):
source ~/.profile

That will let the MySQLdb install work. But when you then run Python and try to use MySQLdb, you’ll get errors. What you need to do is the following:

Change the PATH line in your profile to:

export PATH=/Applications/MAMP/Library/bin:$PATH

which points to MAMP’s MySQL, then run this command:

sudo ln -s /Applications/MAMP/tmp/mysql/mysql.sock /tmp/mysql.sock

Then you should be good to go. You might be able to skip the first step, but I don’t know whether the MySQL that comes with MAMP is 64-bit, and you need 64-bit for Snow Leopard’s install of Python.

Why is Python such a pain in the ass?


Considering baseball data API

Category: Context and Domains (Research), Thesis

So I have been doing a lot of research about the data I’m going to need for my thesis, and where I should get it. The two main sources of data I am looking at are Retrosheet, which I’ve previously mentioned extensively, and the unofficial MLB XML sources, which are used by the official MLB Gameday app.

Both of these resources have their advantages/disadvantages, and neither completely suits my needs. I am starting to think that in order to get what I want, I will have to create a hybrid of the two, but the big question is how exactly will I do this? Here are my thoughts on the good and bad about each source:

Retrosheet:
Great resource; a pain to get into a database, but once it’s there it’s very complete
Data going back to 1951
Self-contained, doesn’t rely on resources that may be unavailable in the future
Doesn’t contain any current season data
No great way to get the data; have to write custom XML to get it into a usable form

MLB Data:
Completely thorough
Provided in XML format that is really well executed
Updated daily for current games
Not official, no promises that it will be there in the future
Questionable as far as a data source/retrieving data repeatedly.

I have been working on creating an API for Retrosheet games that returns complete data, and I’ve made some headway, but it’s largely a work in progress. After finding the XML-based MLB data, I can see that they have already tackled this and have the data in a beautiful format that is pretty much exactly what I wanted to make. They do not, however, have this data stored in a database or any easy way to request a game in a traditional way.

I think that my strategy for this problem will be to combine both resources into a best-of-both-worlds situation that I can make freely available to anyone. I will adopt the MLB XML format and parse all Retrosheet games into it, taking advantage of the format already established as a standard and saving myself some work. I will then write code that does the opposite with the data, checking every day of a season and breaking down their XML into data that can be put into the Retrosheet database. This will likely be the hardest part of the process, but I think I can figure it out. This will provide me with a complete Retrosheet database with current game data, and an XML format that provides a full summary of a game in an easy-to-read way.
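The “checking every day of a season” step could start with something as simple as listing each date in the season and handing it to whatever fetch/parse code runs per day. This is a hypothetical sketch: the function name and the ISO `YYYY-MM-DD` date strings are assumptions, not part of the MLB or Retrosheet formats, and the start/end dates would come from the actual schedule.

```javascript
// Hypothetical sketch: enumerate every calendar day between two dates
// (inclusive) so each day's game data can be fetched and converted.
// Dates are handled in UTC to avoid daylight-saving surprises.
function seasonDays(startISO, endISO) {
  var days = [];
  var current = new Date(startISO + 'T00:00:00Z');
  var end = new Date(endISO + 'T00:00:00Z');
  while (current <= end) {
    days.push(current.toISOString().slice(0, 10)); // "YYYY-MM-DD"
    current.setUTCDate(current.getUTCDate() + 1);  // rolls over months/years
  }
  return days;
}
```

Each returned string could then be turned into whatever per-day request the converter needs, keeping the date logic separate from the XML parsing.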

Hopefully this works out for me, if I can do this and provide access to it through my web server I think it will be exciting and a great contribution to the baseball data community.
