[ProgSoc] Predict Sydney traffic?

John Elliot jj5 at jj5.net
Wed Dec 15 23:32:32 EST 2010


On 15/12/2010 4:59 PM, traltixx at progsoc.org wrote:
>> On 11/12/2010 6:30 PM, John Elliot wrote:
>>
>> The best genetic algorithm is now placed 12th [1] out of 115 teams.
>> Slowly getting there...
>>
>> [1] http://kaggle.com/RTA?viewtype=leaderboard
>>
> I wouldn't mind donating CPU time but I don't seem to have visual studio
> installed.

Cool. It's amazing how CPU intensive it is. I completely rewrote the 
genetic algorithm platform to use performant data structures (arrays 
instead of linked lists, and sealed classes rather than interfaces) and 
it still takes ages. I've been running it for a few days and have only 
done 38 generations (of ~150 strategies per generation).

> On the other hand, I have python and I was trying to convert it
> to python (while downloading C# and seeing which one would be done first).
> Anyway, good luck and I'll post the python code once I'm done.

Good luck with the python port; I'll be interested to check it out.

For now there is one particular problem I have that has me stumped, 
which is really annoying because I'm sure it's a trivial problem really. 
What I'm trying to do is build a model where, if there is data within say 
the last n minutes (where n might be 60 or 120, or whatever), the 
most recent reading will factor significantly in the results, whereas if 
the most recent reading is too far away (i.e. more than n minutes away) 
it won't factor so significantly. So if you're trying 
to predict 15 minutes into the future then the present reading is highly 
relevant, whereas if you're trying to predict 24 hours into the future 
then the current reading likely isn't as relevant as, say, a weekly 
average. Now the simple way to do this would be to do something like,

   if ( n > 120 minutes ) {

     last_reading_weight = 0;

   }
   else {

     last_reading_weight = 10000;

   }

But what I'd like instead (or "as well" I should say, because it's no 
trouble to trial each model) is a function (or maybe several functions 
could be trialed) that takes n and turns it into a weight in a more 
continuous fashion, where maybe I'd get readings like,

   f( 0 ) = 10000
   f( 15 ) = 9000
   ...
   f( 60 ) = 1000
   f( 120 ) = 500
   ...
   f( 1440 ) = 0.001

It would be ideal if the function f also took a random floating point 
value that modified the distribution while keeping the lower and upper 
bounds relatively intact. Can anyone think of such a function?
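One candidate, sketched in Python since that port is in progress: interpolate 
between the endpoints on a log scale, with an extra shape parameter p that 
bends the curve without moving the endpoints. The names (last_reading_weight, 
hi, lo, n_max, p) and the endpoint values are just illustrative, taken from 
the f(0) = 10000 and f(1440) = 0.001 examples above; the exact intermediate 
values (f(120) = 500 etc.) would need p tuned, or a different curve family.

```python
def last_reading_weight(n, hi=10000.0, lo=0.001, n_max=1440.0, p=1.0):
    """Weight for the most recent reading, n minutes out.

    Interpolates between hi (at n = 0) and lo (at n = n_max) on a log
    scale: f(n) = hi * (lo/hi) ** ((n/n_max) ** p).

    p is the shape parameter: p < 1 drops the weight off faster for
    small n, p > 1 keeps it high for longer, but f(0) and f(n_max)
    stay fixed either way -- so p can be randomised per strategy
    while the bounds stay intact.
    """
    n = min(max(n, 0.0), n_max)  # clamp n into [0, n_max]
    return hi * (lo / hi) ** ((n / n_max) ** p)
```

Since p only appears in the exponent of (n/n_max), which is 0 at n = 0 and 1 
at n = n_max, the genetic algorithm could mutate p freely (any p > 0) and 
every resulting curve would still run from 10000 down to 0.001.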

More information about the Progsoc mailing list