See if use of vectors can improve performance of listy code #165

Open
barrucadu opened this issue Dec 10, 2017 · 1 comment
There's quite a bit of very core code in dejafu that uses lists in a known-bad way: lots of appending, iteration, and index-based updates. The findBacktrackSteps function is a representative example:

findBacktrackSteps backtrack boundKill = go initialDepState S.empty initialThread [] . F.toList where
  -- Walk through the traces one step at a time, building up a list of
  -- new backtracking points.
  go state allThreads tid bs ((e,i):is) ((d,_,a):ts) =
    let tid'   = tidOf tid d
        state' = updateDepState state tid' a
        this = BacktrackStep
          { bcktThreadid   = tid'
          , bcktDecision   = d
          , bcktAction     = a
          , bcktRunnable   = M.fromList e
          , bcktBacktracks = M.fromList $ map (\i' -> (i', False)) i
          , bcktState      = state
          }
        bs' = doBacktrack killsEarly allThreads' e (bs++[this])
        runnable = S.fromList (M.keys $ bcktRunnable this)
        allThreads' = allThreads `S.union` runnable
        killsEarly = null ts && boundKill
    in go state' allThreads' tid' bs' is ts
  go _ _ _ bs _ _ = bs

  -- Find the prior actions dependent with this one and add
  -- backtracking points.
  doBacktrack killsEarly allThreads enabledThreads bs =
    let tagged = reverse $ zip [0..] bs
        idxs   = [ (ehead "doBacktrack.idxs" is, False, u)
                 | (u, n) <- enabledThreads
                 , v <- S.toList allThreads
                 , u /= v
                 , let is = idxs' u n v tagged
                 , not $ null is]

        idxs' u n v = go' True where
          {-# INLINE go' #-}
          go' final ((i,b):rest)
            -- Don't cross subconcurrency boundaries
            | isSubC final b = []
            -- If this is the final action in the trace and the
            -- execution was killed due to nothing being within bounds
            -- (@killsEarly == True@) assume worst-case dependency.
            | bcktThreadid b == v && (killsEarly || isDependent b) = i : go' False rest
            | otherwise = go' False rest
          go' _ [] = []

          {-# INLINE isSubC #-}
          isSubC final b = case bcktAction b of
            Stop -> not final && bcktThreadid b == initialThread
            Subconcurrency -> bcktThreadid b == initialThread
            _ -> False

          {-# INLINE isDependent #-}
          isDependent b
            -- Don't impose a dependency if the other thread will
            -- immediately block already. This is safe because a
            -- context switch will occur anyway so there's no point
            -- pre-empting the action UNLESS the pre-emption would
            -- possibly allow for a different relaxed memory stage.
            | isBlock (bcktAction b) && isBarrier (simplifyLookahead n) = False
            | otherwise = dependent' (bcktState b) (bcktThreadid b) (bcktAction b) u n
    in backtrack bs idxs

In every iteration, it computes bs ++ [this] and reverse $ zip [0..] bs (both O(n), so quadratic over the whole trace) and passes this bs list, indirectly, to pBacktrack, which computes reverse (take (i-1) bs), and to backtrackAt, which updates one or two points by index towards the end of that list. These are all operations that lists are terrible at!
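To make the asymptotics concrete, here is a minimal standalone sketch of those patterns (illustrative names, not dejafu's):

import qualified Data.Vector as V

-- Appending to the end of a list copies the whole accumulator:
-- O(n) per step, so O(n^2) over a trace of n steps.
appendStep :: [a] -> a -> [a]
appendStep bs this = bs ++ [this]

-- Rebuilding the index-tagged, reversed list is another full O(n)
-- traversal on every iteration.
taggedSteps :: [a] -> [(Int, a)]
taggedSteps bs = reverse (zip [0..] bs)

-- With a vector the same right-to-left indexed walk is a single
-- pass with no intermediate list: ifoldl' with (:) yields the pairs
-- in the same descending-index order as taggedSteps.
taggedStepsV :: V.Vector a -> [(Int, a)]
taggedStepsV = V.ifoldl' (\acc i x -> (i, x) : acc) []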

I wonder if using vectors here would give a performance boost. go could be replaced with a list fold over the function parameters to construct an initial bs, followed by a vector map-with-index over that; the list-comprehension-over-reversed-list in doBacktrack would become a vector fold-right-with-index, as would the recursion in pBacktrack; and backtrackAt would just use vector update.
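A rough sketch of the shapes those replacements could take with Data.Vector (hypothetical helper names, not the actual patch):

import Data.List (foldl')
import qualified Data.Vector as V

-- Build the accumulator once: cons into a list (O(1) per step) and
-- do a single O(n) conversion, instead of (++) on every iteration.
-- (In the real go, mkStep would also thread the dependency state
-- through, so the fold would carry that state as an accumulator.)
buildSteps :: (a -> b) -> [a] -> V.Vector b
buildSteps mkStep = V.fromList . reverse . foldl' (\acc x -> mkStep x : acc) []

-- The list-comprehension-over-reversed-list becomes an indexed fold,
-- collecting matching indices in descending order like tagged does.
indicesWhere :: (Int -> a -> Bool) -> V.Vector a -> [Int]
indicesWhere p = V.ifoldl' (\acc i x -> if p i x then i : acc else acc) []

-- backtrackAt's update-by-index becomes a bulk vector update: one
-- O(n) copy per call, rather than rebuilding a prefix of the list.
updateAt :: Int -> (a -> a) -> V.Vector a -> V.Vector a
updateAt i f v = v V.// [(i, f (v V.! i))]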


barrucadu commented Dec 10, 2017

I had a quick go at this; linked is an almost-working patch against bd64dfe. There's one test failure:

    Independent Read Independent Write:
      :SQ: [Failed]
Failed:
Expected: Right ((0,1),(0,0))
Expected: Right ((0,1),(0,1))

      :TSO: [OK]
      :PSO: [OK]
Benchmark results are not that great: less allocation and less time in the garbage collector, but maximum resident memory is basically the same, and more time is spent in the mutator:

original:

 106,023,792,976 bytes allocated in the heap
  24,972,461,280 bytes copied during GC
      17,829,800 bytes maximum residency (8026 sample(s))
         523,456 bytes maximum slop
              50 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     196429 colls,     0 par   43.323s  43.246s     0.0002s    0.0015s
  Gen  1      8026 colls,     0 par   17.711s  17.706s     0.0022s    0.0406s

  TASKS: 205044 (205040 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time  142.002s  (134.145s elapsed)
  GC      time   61.035s  ( 60.952s elapsed)
  EXIT    time    0.001s  (  0.001s elapsed)
  Total   time  203.119s  (195.098s elapsed)

  Alloc rate    746,633,705 bytes per MUT second

  Productivity  70.0% of total user, 72.8% of total elapsed

vector:

  89,709,834,032 bytes allocated in the heap
  15,715,802,120 bytes copied during GC   
      17,561,104 bytes maximum residency (3588 sample(s))
         578,704 bytes maximum slop
              45 MB total memory in use (0 MB lost due to fragmentation)
                                    
                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     171402 colls,     0 par   33.321s  33.260s     0.0002s    0.0033s
  Gen  1      3588 colls,     0 par    7.655s   7.651s     0.0021s    0.0451s
                                               
  TASKS: 222854 (222850 bound, 4 peak workers (4 total), using -N1)
                                     
  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
                                       
  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time  160.987s  (152.481s elapsed)
  GC      time   40.976s  ( 40.911s elapsed)
  EXIT    time    0.001s  (  0.001s elapsed)
  Total   time  202.046s  (193.394s elapsed)
                                   
  Alloc rate    557,247,973 bytes per MUT second
                                           
  Productivity  79.7% of total user, 83.3% of total elapsed

Measured by compiling with -rtsopts and running dejafu-tests +RTS -s.
