
2012.03.15 Weekly Check In

demory edited this page Mar 29, 2012 · 1 revision
  • 13:32 <demory> Hey folks, ready to get started w/ the check-in?
  • 13:32 <mattwigway> Yep.
  • 13:32 <novalis_dt> Sure.
  • 13:32 <kpw> it might be nice to have both eventually (I can see apps that would make use of both) but in the near-term let's focus on replicating basic timetable functionality
  • 13:32 <andrewbyrd> Hi, sure.
  • 13:33 <demory> ok, let's get started then
  • 13:33 <novalis_dt> I've been working on misc issues all over the codebase. I'm waiting to hear back from Brian Ferris on the DST issue; if I don't hear from him by Monday I will pester him by email.
  • 13:34 <kpw> novalis_dt: brian's been away at a conf this week so he's probably got an email backlog
  • 13:34 <novalis_dt> Yeah, I'm willing to give him time -- the next DST issue won't be until November
  • 13:34 <demory> My week has been split between system mapping and OTPsetup work, but focusing on OTPsetup now. Have upgraded to 0.5, though there are some issues to be worked out esp. w/ larger deployments. Currently working on setting up a parallel "test" version of the workflow on AWS that will allow us to test w/o disturbing the live application
  • 13:34 <novalis_dt> Well, hopefully it will be never -- I wrote to my senators to ask them to abolish DST, and maybe this time they'll listen.
  • 13:35 <novalis_dt> Also, I'll be out Tuesday next week for some minor surgery. I hope to be back at work on Weds, but that depends on how I feel; I won't be around for the meeting regardless because I have a follow-up appointment at that time.
  • 13:36 <demory> ok, hope it goes well!
  • 13:36 <novalis_dt> Thanks.
  • 13:38 <andrewbyrd> I think the agency issues are fixed now. Those changes will be in the release, so they should be available in OTPSetup for the NYC demo.
  • 13:38 <kpw> great! so that's all related to agency ids?
  • 13:38 <demory> excellent. when are you planning to do the release?
  • 13:39 <novalis_dt> andrewbyrd, so, the upshot is that now if a feed wants to share stops with another feed, it must either have the same agency id in agency.txt, or set defaultAgencyId?
  • 13:39 <andrewbyrd> I have also been experimenting with different ways of pulling requests out of http query strings and paths. Just using the java servlet SPI, no Jersey annotations.
  • 13:39 <andrewbyrd> And combining that with a lightweight dependency injection framework instead of spring. I'm trying it out on analyst first to see how it goes.
  • 13:40 <FrankP> Sneak my update in (I'll only be on the chat til 11am) -- I fixed the UI bug (show / hide) from last week's chat...also added agency to the itinerary output. Was going to add maxTransfers, but need to change the API to accept null values, and use its default value.
  • 13:40 <andrewbyrd> novalis_dt : yes
  • 13:40 <novalis_dt> andrewbyrd, can you make a note about that on the GraphBuilder wiki page?
  • 13:41 <mattwigway> With help from novalis, I coded up a quick annotation search tool in VizGui.
  • 13:42 <novalis_dt> kpw, actually, one more question for 636(b) -- is this mainly intended for consumer facing things? Because there's actually a lot of complexity around service ids and stops that don't allow pickups and various other things that we could avoid by simply specifying a single date (or date/time) to do the search on.
  • 13:42 <kpw> mattwigway: that's awesome, thanks!
  • 13:42 <andrewbyrd> OK, will add a note about agency issues on the wiki.
  • 13:43 <kpw> novalis_dt: let's specify a date. that's important for a lot of reasons
  • 13:43 <novalis_dt> OK.
  • 13:43 <andrewbyrd> kpw: I think agencies are fine for now, and if there are any other problems lurking this new york graph should reveal them.
  • 13:43 <kpw> (default to today but specify another if desired, perhaps?)
  • 13:44 <novalis_dt> kpw, sure.
  • 13:45 <novalis_dt> FrankP, did you manage to figure out why you were getting different results on the stop linking than I was? Can I see the config you're using for the instance built with MapBuilder?
  • 13:45 <FrankP> novalis_dt ... beyond date, it might be interesting to have relative dates, e.g., next Saturday, next Sunday
  • 13:46 <mattwigway> As to the first trip/last trip discussion, SF 511 (transit.511.org) has this functionality.
  • 13:46 <novalis_dt> FrankP, Well, this is for the API
  • 13:46 <novalis_dt> FrankP, so handling that on the client side should be fine.
  • 13:46 <mattwigway> FrankP: BART also has generic Weekday, Sat and Sun schedules.
  • 13:47 <novalis_dt> (you'll notice that this is my answer to everything)
  • 13:47 <FrankP> client...
  • 13:47 <kpw> FrankP, novalis_dt, mattwigway: that's a common scenario but it gets tricky fast (esp. with holidays). i agree about not baking that into the api just yet
  • 13:48 <novalis_dt> It's not that I'm lazy. It's that I really prefer minimal APIs with layering where necessary.
  • 13:48 <FrankP> The graph yesterday didn't have mapbuilder I guess...looked today at the config, and my edits were gone. New purl=/osm graph has map builder.
  • 13:49 <FrankP> novalis_dt -- re: FrankP, did you manage to figure out why you were getting different results on the stop linking than I was? ^^
  • 13:49 <novalis_dt> FrankP, cool. So does 6668 look good?
  • 13:49 <novalis_dt> And interns, when you get the chance, can you check out the linking in /osm to see if these changes made any difference?
  • 13:50 <FrankP> Haven't found a trip that looks different (old version looks okay), but 7964 does look better.
  • 13:50 <FrankP> compare: http://maps5.trimet.org/otp/?purl=/osm&submit&fromPlace=SE 122nd @ Rhone::45.495980,-122.537800&toPlace=SE 82nd @ Foster::45.483280,-122.578780
  • 13:50 <FrankP> to
  • 13:51 <FrankP> http://maps5.trimet.org/otp/?purl=/test&submit&fromPlace=SE 122nd @ Rhone::45.495980,-122.537800&toPlace=SE 82nd @ Foster::45.483280,-122.578780
  • 13:51 <grant_h> just tested 6668 and it's snapping to the transit route now
  • 13:51 <novalis_dt> Excellent.
  • 13:53 <mele> will check more later in the day
  • 13:53 <FrankP> 6668 is origin in both new & old urls above, and it does the same thing ... destination is 7964, and it does better in the 1st (map builder) url
  • 13:54 <FrankP> Are there any worries, downsides to using mapsBuilder? Should I go with that for our production graph?
  • 13:54 <mattwigway> Does the new code try to snap to routes of the same agency as the stop? I've got a place where there's a light rail platform adjacent to a heavy rail, and the light rail stop is snapped to the heavy rail platform (this is two different agencies).
  • 13:54 <novalis_dt> There are some worries about mapbuilder -- it's very undertested.
  • 13:55 <novalis_dt> But other than that it is unlikely to break anything
  • 13:55 <novalis_dt> mattwigway, actually, it's more specific than that
  • 13:55 <novalis_dt> mattwigway, and less
  • 13:55 <novalis_dt> mattwigway, for rail stops, it prefers to snap to any platform over a non-platform
  • 13:56 <novalis_dt> mattwigway, for bus stops, it tries to snap to streets which are used by routes which serve that specific stop
  • 13:57 <kpw> novalis_dt, demory: we fixed the issue with duplicate segments, correct?
  • 13:57 <novalis_dt> kpw, yeah
  • 13:57 <novalis_dt> I guess we could have platforms in OSM specify which stopid they're for
  • 13:57 <mattwigway> I've got an issue with two parallel platforms, one for light rail and one for heavy rail. I guess there's no way to fix that though.
  • 13:57 <novalis_dt> Well, if the stops were precisely located, that could work
  • 13:58 <FrankP> Orthogonal to this, do we have any good automated test recommendations (beyond the default jUnit) to capture and compare API / trip output? We'd like to build a set of tests that run (automated) before we release a new graph. Currently, a lot of manual work here, and I'd like to automate it to run with new GTFS.zip productions...
  • 13:58 <FrankP> (BTW, have to run but will have window open...chat later)
  • 13:59 <novalis_dt> I don't off the top of my head know of one.
  • 13:59 <kpw> FrankP, we've talked about getting batch trip setting for benchmarking and graph/algo changes. folks, how are we on that front?
  • 13:59 <mattwigway> I had a quick Python script that I used to do batch trips everywhere to everywhere from a PostGIS table... let me find it.
  • 14:00 <kpw> can we add that to the CI work you've been doing andrewbyrd?
  • 14:00 <mele> us too, talk to you guys later
  • 14:00 <grant_h> talk to you next week
  • 14:00 <mattwigway> https://gist.github.com/1542816
  • 14:00 <andrewbyrd> yes, we really need better integration testing and that sort of code could be reused for trying out new graphs
  • 14:00 <novalis_dt> later, mele.
  • 14:00 <kpw> when we discussed this last i think there were questions about the data, but i thought we could use a reference data set
  • 14:01 <kpw> given gtfs/otp and query set (could be historical)
  • 14:01 <andrewbyrd> I have a little script that loads a bunch of origins and destinations categorized by center, suburbs, outlying and tries all the permutations of endpoints with ranges of walk distance, modes, etc
  • 14:02 <mattwigway> Sounds like andrew's is more sophisticated than mine.
  • 14:02 <andrewbyrd> but we need to run that sort of thing against expected output
  • 14:02 <kpw> i'd love to capture the time/resource requirements (via novalis_dt's memory monitoring code) plus outcomes for the same trip pairs. which routes changed/can/can't be completed now as a result
  • 14:02 <kpw> etc.
  • 14:03 <kpw> if the CI server could do that automatically it would be great!
  • 14:03 <andrewbyrd> and also make sure the whole batch meets certain requirements - no uncaught exceptions, realistic trip lengths etc
  • 14:03 <andrewbyrd> kpw: I planned to have that kind of tests run at every commit on the CI server
  • 14:03 <kpw> great!
  • 14:04 <kpw> can you leverage novalis_dt's resource monitoring stuff too?
  • 14:04 <kpw> just so we can log the performance impacts of changes
  • 14:04 <novalis_dt> What I did was pretty primitive, but it was probably usable at least for testing
  • 14:04 <andrewbyrd> sure, I need to familiarize myself with it. I guess I will translate these ideas into Java so we can have better access to internal information.
  • 14:04 <kpw> yeah, even just a ballpark is better than not knowing
  • 14:05 <kpw> it would be great to catch stuff that balloons memory/cpu utilization early
  • 14:05 <kpw> esp. as folks move toward productions
  • 14:05 <andrewbyrd> and while we're at it we might as well stuff the performance results into a database or at least a text file so we can compare with past results
  • 14:05 <kpw> yep!
  • 14:05 <kpw> if we can log it to the db that would be great
  • 14:06 <kpw> we can compare results
  • 14:06 <kpw> also gives folks the ability to test these impacts automatically (no need to run the test locally, just check in your code and the CI server will run it using reference data/hardware)
  • 14:07 <andrewbyrd> one record per trip, keyed on build number,origin,destination,traverseoptions with memory consumption and itinerary summary included
  • 14:07 <andrewbyrd> that way we can mine out other information later if we want to compare
  • 14:07 <kpw> perfect!
  • 14:08 <kpw> also, let's create test query sets (1000 random trips) so they can be re-run precisely
  • 14:08 <kpw> 1000 was just an example
  • 14:08 <andrewbyrd> frankp: what's your testing procedure like? could you send me a list of endpoints with names?
  • 14:08 <mattwigway> kpw: excellent, I don't have enough RAM for the current integration tests, so I never run them.
  • 14:09 <demory> i think the Trimet folks had to step off at 10
  • 14:09 <kpw> and for resource/performance impacts i think we need to have a better reference platform otherwise it's not possible to make comparisons
  • 14:10 <andrewbyrd> kpw: I was going to do a pseudo-random endpoint set with a density distribution centered on the center of the metropolitan area, and falling off farther out. by specifying the seed value we can get the same set each time.
  • 14:10 <kpw> perfect!
  • 14:10 <andrewbyrd> or different sets as needed.
  • 14:10 <kpw> as long as we can repeat the values that's great
  • 14:10 <novalis_dt> I actually think edge-to-edge times are the more important thing to look at, as those are the ones that tend to give the worst performance
  • 14:11 <andrewbyrd> demory: oh right, I will catch frank by email later.
  • 14:13 <andrewbyrd> novalis_dt: true. I was thinking in terms of measuring real life workload, where there will be drastically more trips in the center. since some optimizations work by lowering the median response time rather than the worst case, I figured getting the trip mix right would be important for estimating performance.
  • 14:13 <andrewbyrd> but of course we need to test the long trips as well.
  • 14:13 <novalis_dt> Maybe two different trip sets
  • 14:13 <andrewbyrd> if the endpoint sets are parametric, we can just do center x outlying or outlying x outlying as needed.
  • 14:14 <andrewbyrd> in different tests (realistic workload vs. try to break OTP tests)
  • 14:14 <kpw> andrewbyrd: can you record distance and time for each trip also? that will be a useful metric for improved routing
  • 14:14 <kpw> (avg trip time changes between builds, e.g.)
  • 14:15 <andrewbyrd> kpw: so will we be sticking with one city for these integration tests or several? I ask because I'm trying to figure out how we are
  • 14:16 <andrewbyrd> going to identify trips.
  • 14:16 <novalis_dt> I think we ought to just use Portland
  • 14:16 <andrewbyrd> I suppose the full tuple of SPT request fields is fine, along with a city indication.
  • 14:16 <kpw> yep: let's use a reference data set
  • 14:16 <novalis_dt> With maybe SMART and Cherriots too
  • 14:17 <kpw> can you just hash the request params to create a trip id? but for most reporting i think we'd take the aggregate for the whole query set
  • 14:17 <andrewbyrd> But it would be better if instead of lat/lon we had foreign keys for the endpoints, which would all be in a table with names.
  • 14:17 <novalis_dt> Well, the request params might change over time
  • 14:18 <kpw> ah, the date maybe. but if we have a historical data set we can always just query using the same time/date range, no?
  • 14:19 <kpw> we could always have multiple data sets too
  • 14:19 <novalis_dt> Well, I meant that we might add new params
  • 14:19 <andrewbyrd> I think it's a good idea to keep every trip result in a database and compute the aggregates from there. Disk space is cheap and we may want to do cross-run profiling on a given trip.
  • 14:19 <kpw> got it
  • 14:19 <kpw> for data, what about portland and nyc for e.g.?
  • 14:20 <novalis_dt> Sure, although we should make sure we get a good build for NYC first.
  • 14:20 <andrewbyrd> yes, one medium city and one metropolis... let's get something working on PDX, keeping multi-city option in mind when designing the schema.
  • 14:20 <kpw> great!
  • 14:21 <kpw> pdx makes sense
  • 14:21 <kpw> then nyc once we have it working
  • 14:22 <andrewbyrd> it would be great if this could communicate with OTP both via the public api and by directly calling code, so it could be reused in different test situations.
  • 14:24 <kpw> such as?
  • 14:26 <andrewbyrd> i mean just the pseudo-random request bit. integration tests / profiling (running in servlet container) vs. unit tests (no servlet, want to assert things about the result internals)
  • 14:29 <demory> Sorry I have a basic question -- where will the integration test code live? Will this be separate from the main OTP repo?
  • 14:30 <andrewbyrd> there's a module for it, maybe some of it would be pulled out into utils/common
  • 14:30 <demory> ok
  • 14:33 <novalis_dt> OK, so let's call this meeting over, and move into the next meeting via phone.
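
The test-harness ideas from the discussion above (a seeded pseudo-random endpoint set clustered on the city center, a trip id hashed from request params, and one stored record per trip per build) could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not code from the OTP repository: the function names, the `trip_result` table, the `KEY_FIELDS` list, the center coordinates, and the Gaussian spread are all hypothetical choices made for illustration.

```python
import hashlib
import json
import random
import sqlite3

# Hypothetical reference point (roughly central Portland) and spread in degrees.
CENTER = (45.52, -122.68)
SIGMA_DEG = 0.05

# Hypothetical list of fields that identify a trip. Hashing only these keeps
# ids stable if new request parameters are added later (an assumption made
# here, not a design agreed in the chat).
KEY_FIELDS = ("city", "from", "to", "date", "time", "modes", "max_walk")


def generate_endpoints(seed, count, center=CENTER, sigma=SIGMA_DEG):
    """Return `count` (lat, lon) pairs, densest near `center`."""
    rng = random.Random(seed)  # private generator: the seed fully determines the set
    return [
        (round(rng.gauss(center[0], sigma), 6),
         round(rng.gauss(center[1], sigma), 6))
        for _ in range(count)
    ]


def trip_id(params):
    """Hash the identifying request fields into a short, repeatable id."""
    key = {name: params.get(name) for name in KEY_FIELDS}
    canonical = json.dumps(key, sort_keys=True)  # stable field order before hashing
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:16]


def open_results_db(path=":memory:"):
    """Create the results table: one row per trip per build."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS trip_result (
               build INTEGER NOT NULL,
               trip_id TEXT NOT NULL,
               params TEXT NOT NULL,
               memory_bytes INTEGER,
               summary TEXT,
               PRIMARY KEY (build, trip_id)
           )"""
    )
    return db


def record_result(db, build, params, memory_bytes, summary):
    """Store (or overwrite) the result of one trip for one build."""
    db.execute(
        "INSERT OR REPLACE INTO trip_result VALUES (?, ?, ?, ?, ?)",
        (build, trip_id(params), json.dumps(params, sort_keys=True),
         memory_bytes, summary),
    )
    db.commit()
```

Calling `generate_endpoints(seed=42, count=1000)` twice yields the same 1000 points, so a query set can be re-run precisely, and a different seed gives a different set when needed. Keeping every row rather than only aggregates leaves room for cross-run comparison of a single trip; averages per build can be computed later with a `GROUP BY build` query.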