Data Driven Vase

After trying to find a way to capture Twitter data by both location and keyword, here’s what I’ve discovered:

1) Twitter’s advanced search lets you search for tweets by location, by keyword, and between certain past dates. However, it only returns the “top” tweets, which is a very small subset of everything.

2) The ScraperWiki website can no longer scrape Twitter data.

3) The twitter4j library for Processing cannot filter live streaming tweets by both location and keyword at the same time. Additionally, if you search by keyword, you get just a small sample of all tweets around the world with that keyword, and very rarely will a tweet from your desired geolocation be among them.

So, the best way to get the information I need is to capture all the tweets I can for my location, and then process the captured tweets and search them for my keywords.

Here’s my Processing code, which runs non-stop collecting tweets. Each hour the script closes the current file and starts a new text file, so I don’t end up with one massive text document.

import twitter4j.conf.*;
import twitter4j.internal.async.*;
import twitter4j.internal.logging.*;
import twitter4j.json.*;
import twitter4j.internal.util.*;
import twitter4j.auth.*;
import twitter4j.api.*;
import twitter4j.util.*;
import twitter4j.internal.http.*;
import twitter4j.*;
import twitter4j.internal.json.*;
import java.util.List;
import java.util.Map;
import java.util.*;

/*
 Developed by: Michael Zick Doherty
 Adapted to capture Twitter data, for a
 specific geographical location, by Alex Jacque.
 */

PrintWriter output; // var to hold reference to text file
int textUpdateDelay = 60*60*1000; // every hour, (minutes * seconds * milliseconds)
int lastUpdate; // when we last updated
int hour = 0; // starting hour number

///////////////////////////// Config your setup here! ////////////////////////////

// This is where you enter your OAuth info
static String OAuthConsumerKey = "YOUR_CONSUMER_KEY";
static String OAuthConsumerSecret = "YOUR_CONSUMER_SECRET";
// This is where you enter your Access Token info
static String AccessToken = "YOUR_ACCESS_TOKEN";
static String AccessTokenSecret = "YOUR_ACCESS_TOKEN_SECRET";

// keywords are not used in this location-only version; the stream is
// filtered by bounding box and keyword matching happens after capture
String keywords[] = {""};

///////////////////////////// End Variable Config ////////////////////////////

TwitterStream twitter = new TwitterStreamFactory().getInstance();
PImage img;
boolean imageLoaded;

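void setup() {
  double lat = 39.2905807; // Baltimore, MD
  double longitude = -76.6092606; // Baltimore, MD
  // box corners: 0.08 degrees of latitude and 0.1 degrees of longitude either side
  double lat1 = lat - .08;
  double longitude1 = longitude - .1;
  double lat2 = lat + .08;
  double longitude2 = longitude + .1;

  // each corner is {longitude, latitude}, southwest corner first
  double[][] geoloc = {{longitude1, lat1}, {longitude2, lat2}}; // bounding box

  output = createWriter("hour"+hour+".txt");

  connectTwitter(); // authenticate before opening the stream
  twitter.addListener(listener); // register the tweet handler defined below

  //if (keywords.length==0) twitter.sample();
  twitter.filter(new FilterQuery().locations(geoloc));
}

void draw() {
  if (imageLoaded) image(img, width/2, height/2);
}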

// Initial connection
void connectTwitter() {
  twitter.setOAuthConsumer(OAuthConsumerKey, OAuthConsumerSecret);
  AccessToken accessToken = loadAccessToken();
  twitter.setOAuthAccessToken(accessToken); // attach the token to the stream
}

// Loading up the access token
private static AccessToken loadAccessToken() {
  return new AccessToken(AccessToken, AccessTokenSecret);
}
// This listens for new tweets
StatusListener listener = new StatusListener() {
  public void onStatus(Status status) {
    // once an hour, close the current file and start a new one
    if (millis()-lastUpdate>=textUpdateDelay) {
      output.flush(); // flush anything that's still not written
      output.close(); // close the output
      println("new file");
      hour = hour+1; // increment our hour integer
      output = createWriter("hour"+hour+".txt");
      lastUpdate = millis(); // reset last update
    }

    String screenName;
    Date createdAt;
    String text;
    GeoLocation coords;
    screenName = status.getUser().getScreenName(); // tweet owner's screen name
    createdAt = status.getCreatedAt();
    text = status.getText();
    coords = status.getGeoLocation();

    // assumed layout: separator, then one field per line,
    // matching the PHP script that splits on "**********"
    output.println("**********");
    output.println(screenName);
    output.println(createdAt);
    output.println(text);
    output.println(coords);
    output.flush(); // Writes the remaining data to the file

    // Checks for images posted using twitter API
    // println(status);
    // output.println(status);
  }

  public void onDeletionNotice(StatusDeletionNotice statusDeletionNotice) {
    // System.out.println("Got a status deletion notice id:" + statusDeletionNotice.getStatusId());
  }

  public void onTrackLimitationNotice(int numberOfLimitedStatuses) {
    // System.out.println("Got track limitation notice:" + numberOfLimitedStatuses);
  }

  public void onScrubGeo(long userId, long upToStatusId) {
    // System.out.println("Got scrub_geo event userId:" + userId + " upToStatusId:" + upToStatusId);
  }

  public void onException(Exception ex) {
    ex.printStackTrace(); // report stream errors instead of dropping them silently
  }
};
void keyPressed() {
  if (key == 'k' || key == 'K') {
    // output.flush(); // Writes the remaining data to the file
    output.close(); // Finishes the file (closing also flushes)
    exit(); // Stops the program
  }
}
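
As an aside, the bounding-box arithmetic in setup() can be pulled out into a small helper. This is only a minimal sketch in plain Java: `BoundingBox`, `around`, and `contains` are hypothetical names that do not appear in my sketch, and it assumes the same corner layout as the `geoloc` array ({longitude, latitude}, southwest corner first).

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original sketch.
class BoundingBox {
  // Returns {{west, south}, {east, north}} around a center point.
  static double[][] around(double lat, double lon, double halfLat, double halfLon) {
    return new double[][] {
      {lon - halfLon, lat - halfLat}, // southwest corner: {longitude, latitude}
      {lon + halfLon, lat + halfLat}  // northeast corner: {longitude, latitude}
    };
  }

  // True if the point lies inside the box (edges included).
  static boolean contains(double[][] box, double lat, double lon) {
    return lon >= box[0][0] && lon <= box[1][0]
        && lat >= box[0][1] && lat <= box[1][1];
  }

  public static void main(String[] args) {
    // Same center and offsets as in setup(): Baltimore, MD
    double[][] box = around(39.2905807, -76.6092606, 0.08, 0.1);
    System.out.println(Arrays.deepToString(box));
    System.out.println(contains(box, 39.30, -76.61)); // true: a point near the center
    System.out.println(contains(box, 40.71, -74.01)); // false: far outside the box
  }
}
```

A `contains` check like this could also be used after capture to confirm that a saved tweet's coordinates really fall inside the box.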

The output of the Processing sketch looks like:

Sat Nov 22 16:00:07 EST 2014
If only you could wear your pjs to work LIFE WOULD BE AMAZING
GeoLocation{latitude=XX.XXXXXX, longitude=-YY.YYYYYY}
Sat Nov 22 16:00:08 EST 2014
getting my new phone on tuesday .
GeoLocation{latitude=XX.XXXXXX, longitude=-YY.YYYYYY}
Sat Nov 22 16:00:09 EST 2014
Brewer isn't a fit for tech make a change.
GeoLocation{latitude=XX.XXXXXX, longitude=-YY.YYYYYY}

Then, I run this through a PHP script that breaks the text file into an array of tweets, and then breaks each tweet into an array with indexes for screenName, date, text, and coordinates. From there, I can search for a specific term and get back the number of matching tweets. But I need to be careful in how I search: if I just search for, say, “sad”, it will also return a match when the actual word used in the tweet is “sadistic.”

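 $result = array();
 // split the capture file into one chunk per tweet
 $file = explode("**********", file_get_contents("hour0.txt"));
 foreach ( $file as $content ) {
   $result[] = array_filter(array_map("trim", explode("\n", $content)));
 }
 $termCount = 0;
 $searchTerm = " love "; // surrounding spaces avoid matches inside longer words
 foreach ( $result as $tweet ) {
   // index 3 is the field searched here (the tweet text); skip chunks without it
   if (isset($tweet[3]) && strpos($tweet[3], $searchTerm) !== FALSE) {
     $termCount++;
   }
 }
 print $termCount;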
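
Padding the search term with spaces avoids the “sadistic” problem, but it also misses the word when it sits at the very start or end of a tweet or next to punctuation. One alternative is a regular expression with word boundaries. Here is a minimal sketch of that idea, written in Java (like the Processing sketch) rather than PHP; `WordCount` and `countTweetsWithWord` are hypothetical names, and the sample tweets are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original scripts.
class WordCount {
  // Counts the tweets that contain `word` as a whole word, ignoring case.
  static int countTweetsWithWord(List<String> tweets, String word) {
    // \b matches a word boundary, so "sad" will not match inside "sadistic"
    Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                Pattern.CASE_INSENSITIVE);
    int count = 0;
    for (String tweet : tweets) {
      if (p.matcher(tweet).find()) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // Made-up sample tweets, for illustration only
    List<String> tweets = Arrays.asList(
      "So sad today.",
      "That movie was sadistic",
      "Sad!");
    System.out.println(countTweetsWithWord(tweets, "sad")); // prints 2
  }
}
```

The same approach should carry over to PHP with `preg_match` and a `\b` pattern, if I decide to keep the counting in the PHP script.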

Next step, run the script for 24 hours on my computer in my studio. Then, parametric modeling in Rhino/Grasshopper.

