

## Migrating Peterson Algorithm from SC to RA Memory Order

Computer Systems

Johann Blieberger



Migration of Peterson Algorithm to Memory Order Release-Acquire

> AUTOMATION SYSTEMS GROUP

## **Peterson Algorithm**

Peterson is a simple algorithm for synchronizing two threads in a non-blocking way

Peterson is a simple algorithm for synchronizing two threads in a non-blocking wayfirst software-only solution to synchronization from 1981

- Peterson is a simple algorithm for synchronizing two threads in a non-blocking way
  first software-only solution to synchronization from 1981
- known to be correct for memory order SC

- Peterson is a simple algorithm for synchronizing two threads in a non-blocking way
- first software-only solution to synchronization from 1981
- known to be correct for memory order SC
- named after its inventor Gary L. Peterson

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | А   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | А   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | A   | 0: i1'READ                | i0r   | A   | 1: iO'READ                |
| 4 | vr    | A   | v'READ                    | vr    | A   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution: (i0, i1, v) =  $(f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0)$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | A   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution: (i0, i1, v) =  $(f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0) \xrightarrow{\vee 0} (t, f, \underline{0})$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | A   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution: (i0, i1, v) = (f, f, 0)  $\xrightarrow{i0t}$  ( $\underline{t}$ , f, 0)  $\xrightarrow{v0}$  (t, f,  $\underline{0}$ )  $\xrightarrow{i1t}$  (t,  $\underline{t}$ , 0)

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | A   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution: (i0, i1, v) =  $(f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0) \xrightarrow{v0} (t, f, \underline{0}) \xrightarrow{i1t} (t, \underline{t}, 0) \xrightarrow{v1} (t, t, \underline{1})$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | A   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution: (i0, i1, v) =  $(f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0) \xrightarrow{\vee 0} (t, f, \underline{0}) \xrightarrow{i1t} (t, \underline{t}, 0) \xrightarrow{\vee 1} (t, t, \underline{1}) \xrightarrow{i1r} (t, t, 1)$ 

|   | abbr. | R/A | Thread $P_0$              | abbr. | R/A | Thread $P_1$              |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | A   | 0: i1'READ                | i0r   | A   | 1: iO'READ                |
| 4 | vr    | A   | v'READ                    | vr    | A   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution: (i0, i1, v) =  $(f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0) \xrightarrow{\vee 0} (t, f, \underline{0}) \xrightarrow{i1t} (t, \underline{t}, 0) \xrightarrow{\vee 1} (t, t, \underline{1}) \xrightarrow{i1r} (t, t, 1) \xrightarrow{\vee r} (t, t, 1)$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | i0t   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | Α   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(i0, i1, v) = (f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0) \xrightarrow{v0} (t, f, \underline{0}) \xrightarrow{i1t} (t, \underline{t}, 0) \xrightarrow{v1} (t, t, \underline{1}) \xrightarrow{i1r} (t, t, 1) \xrightarrow{vr} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{vr} (t, 1) \xrightarrow{vr} (t, 1) \xrightarrow{vr} (t, 1) \xrightarrow{vr} (t, 1)$ 

|   | abbr. | R/A | Thread $P_0$              | abbr. | R/A | Thread $P_1$              |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | A   | 0: i1'READ                | i0r   | A   | 1: iO'READ                |
| 4 | vr    | A   | v'READ                    | vr    | A   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(i0, i1, v) = (f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0) \xrightarrow{v0} (t, f, \underline{0}) \xrightarrow{i1t} (t, \underline{t}, 0) \xrightarrow{v1} (t, t, \underline{1}) \xrightarrow{i1r} (t, t, 1) \xrightarrow{vr} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{vr} (t, t, 1) \xrightarrow{v$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | A   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(i0, i1, v) = (f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0) \xrightarrow{\vee 0} (t, f, \underline{0}) \xrightarrow{i1t} (t, \underline{t}, 0) \xrightarrow{\vee 1} (t, t, \underline{1}) \xrightarrow{i1r} (t, t, 1) \xrightarrow{\vee r} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{\vee r} (t, t, 1) \xrightarrow{i1r} (t, t, 1) \xrightarrow{\vee r} (t, t, 1)$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | A   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(i0, i1, v) = (f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0) \xrightarrow{v0} (t, f, \underline{0}) \xrightarrow{i1t} (t, \underline{t}, 0) \xrightarrow{v1} (t, t, \underline{1}) \xrightarrow{i1r} (t, t, 1) \xrightarrow{vr} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{i1r} (t, t, 1) \xrightarrow{i1$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | Α   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(i0, i1, v) = (f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0) \xrightarrow{v0} (t, f, \underline{0}) \xrightarrow{i1t} (t, \underline{t}, 0) \xrightarrow{v1} (t, t, \underline{1}) \xrightarrow{i1r} (t, t, 1) \xrightarrow{vr} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{i1r} (t, t, 1) \xrightarrow{i1r} (t, t, 1) \xrightarrow{i1r} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{vr} (t, t, 1) \xrightarrow{i1r} (t, t, 1) \xrightarrow{i1r$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | Α   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(\mathrm{i0},\mathrm{i1},\mathrm{v}) = (f,f,0) \xrightarrow{\mathrm{i0t}} (\underline{t},f,0) \xrightarrow{\mathrm{v0}} (t,f,\underline{0}) \xrightarrow{\mathrm{i1t}} (t,\underline{t},0) \xrightarrow{\mathrm{v1}} (t,t,\underline{1}) \xrightarrow{\mathrm{i1r}} (t,t,1) \xrightarrow{\mathrm{vr}} (t,t,1) \xrightarrow{\mathrm{vr}} (t,t,1) \xrightarrow{\mathrm{iff}} (t,t,1) \xrightarrow{\mathrm{iff}$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | Α   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(\mathrm{i0},\mathrm{i1},\mathrm{v}) = (f,f,0) \xrightarrow{\mathrm{i0t}} (\underline{t},f,0) \xrightarrow{\mathrm{v0}} (t,f,\underline{0}) \xrightarrow{\mathrm{i1t}} (t,\underline{t},0) \xrightarrow{\mathrm{v1}} (t,t,\underline{1}) \xrightarrow{\mathrm{i1r}} (t,t,1) \xrightarrow{\mathrm{vr}} (t,t,1) \xrightarrow{\mathrm{vr}} (t,t,1) \xrightarrow{\mathrm{if1}} (t,t,1)$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | Α   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(i0, i1, v) = (f, f, 0) \xrightarrow{i0t} (\underline{t}, f, 0) \xrightarrow{v0} (t, f, \underline{0}) \xrightarrow{i1t} (t, \underline{t}, 0) \xrightarrow{v1} (t, t, \underline{1}) \xrightarrow{i1r} (t, t, 1) \xrightarrow{vr} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{i0r} (t, t, 1) \xrightarrow{vr} (t, t, 1) \xrightarrow{i0r} (f, t, 1) \xrightarrow{vr} (f, t, 1) \xrightarrow{i0r} (f, t, 1) \xrightarrow{vr} (f, t, 1)$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | iOr   | A   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | Α   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(\mathrm{i0},\mathrm{i1},\mathrm{v}) = (f,f,0) \xrightarrow{\mathrm{i0}t} (\underline{t},f,0) \xrightarrow{\mathrm{v0}} (t,f,\underline{0}) \xrightarrow{\mathrm{i1}t} (t,\underline{t},0) \xrightarrow{\mathrm{v1}} (t,t,\underline{1}) \xrightarrow{\mathrm{i1}r} (t,t,1) \xrightarrow{\mathrm{vr}} (t,t,1) \xrightarrow{\mathrm{i0}r} (t,t,1) \xrightarrow{\mathrm{i0}r} (t,t,1) \xrightarrow{\mathrm{i0}r} (t,t,1) \xrightarrow{\mathrm{i0}r} (t,t,1) \xrightarrow{\mathrm{i1}r} (t,t,1) \xrightarrow{\mathrm{i1}$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | i0r   | А   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | А   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(\mathrm{i0},\mathrm{i1},\mathrm{v}) = (f,f,0) \xrightarrow{\mathrm{i0t}} (\underline{t},f,0) \xrightarrow{\mathrm{v0}} (t,f,\underline{0}) \xrightarrow{\mathrm{i1t}} (t,\underline{t},0) \xrightarrow{\mathrm{v1}} (t,t,\underline{1}) \xrightarrow{\mathrm{i1r}} (t,t,1) \xrightarrow{\mathrm{vr}} (t,t,1) \xrightarrow{\mathrm{vr}} (t,t,1) \xrightarrow{\mathrm{i0t}} (t,t,1) \xrightarrow{\mathrm{i0r}} (t,t,1) \xrightarrow{\mathrm{vr}} (t,t,1) \xrightarrow{\mathrm{i1f}} (t,t,1) \xrightarrow{\mathrm{i1f}}$ 

|   | abbr. | R/A | Thread P <sub>0</sub>     | abbr. | R/A | Thread P <sub>1</sub>     |
|---|-------|-----|---------------------------|-------|-----|---------------------------|
| 1 | iOt   | R   | i0 := true                | i1t   | R   | i1 := true                |
| 2 | v0    | R   | v := 0                    | v1    | R   | v := 1                    |
| 3 | i1r   | А   | 0: i1'READ                | i0r   | А   | 1: iO'READ                |
| 4 | vr    | А   | v'READ                    | vr    | А   | v'READ                    |
| 5 | if0   |     | if i1 and v=0 then goto 0 | if1   |     | if iO and v=1 then goto 1 |
| 6 | s0    | ?   |                           | s1    | ?   |                           |
| 7 | i0f   | R   | iO := false               | i1f   | R   | i1 := false               |

Possible execution:

 $(\mathrm{i0},\mathrm{i1},\mathrm{v}) = (f,f,0) \xrightarrow{\mathrm{i0}t} (\underline{t},f,0) \xrightarrow{\mathrm{v0}} (t,f,\underline{0}) \xrightarrow{\mathrm{i1}t} (t,\underline{t},0) \xrightarrow{\mathrm{v1}} (t,t,\underline{1}) \xrightarrow{\mathrm{i1}r} (t,t,1) \xrightarrow{\mathrm{vr}} (t,t,1) \xrightarrow{\mathrm{i0}t} (t,t,1) \xrightarrow{\mathrm{i0}r} (t,t,1) \xrightarrow{\mathrm{i1}r} (t,t,1) \xrightarrow{\mathrm{i1}$ 

## Interleavings Graph with SC Memory Order



Abbildung: Interleavings Graph - Peterson with SC memory order

• Nodes and edges which cannot be reached in the interleavings graph have been deleted.

- Nodes and edges which cannot be reached in the interleavings graph have been deleted.
- Note that no reordering of statements can occur in a sequentially consistent program.

- Nodes and edges which cannot be reached in the interleavings graph have been deleted.
- Note that no reordering of statements can occur in a sequentially consistent program.
- Because of the structure of the graph, the critical sections s0 and s1 cannot execute at the same time.

- Nodes and edges which cannot be reached in the interleavings graph have been deleted.
- Note that no reordering of statements can occur in a sequentially consistent program.
- Because of the structure of the graph, the critical sections s0 and s1 cannot execute at the same time.
- For this reason, *mutual exclusion* is provided.

- Nodes and edges which cannot be reached in the interleavings graph have been deleted.
- Note that no reordering of statements can occur in a sequentially consistent program.
- Because of the structure of the graph, the critical sections s0 and s1 cannot execute at the same time.
- For this reason, *mutual exclusion* is provided.
- The critical point is the "hole" in the graph near the lower right corner. It ensures correct synchronization.

\_\_\_\_

■ To facilitate understanding, we notice that the nine nodes in the middle can carry both value triples (*t*, *t*, 0) and (*t*, *t*, 1).

- To facilitate understanding, we notice that the nine nodes in the middle can carry both value triples (t, t, 0) and (t, t, 1).
- This means, that there definitely is a *data race* at those nine nodes.

- To facilitate understanding, we notice that the nine nodes in the middle can carry both value triples (*t*, *t*, 0) and (*t*, *t*, 1).
- This means, that there definitely is a *data race* at those nine nodes.
- However, the race does no harm.

- To facilitate understanding, we notice that the nine nodes in the middle can carry both value triples (*t*, *t*, 0) and (*t*, *t*, 1).
- This means, that there definitely is a *data race* at those nine nodes.
- However, the race does no harm.
- Red edges are conditional edges; the corresponding condition [cond] is given as an edge label. If a statement is executed along a conditional edge, the label reads [cond] : stmt.

## Migration of Peterson Algorithm to Memory Order Release-Acquire

 $\overline{x}$  denotes that the effect of statement x ist visible to the "home" core, <u>x</u> denotes that the effect of x is visible to all the other cores.

 $\overline{x}$  denotes that the effect of statement x ist visible to the "home" core, <u>x</u> denotes that the effect of x is visible to all the other cores.

Dependence analysis for thread  $P_0$  results in the constraints

 $\underline{i1r} < \overline{if0}$  and  $\underline{vr} < \overline{if0}$ .

 $\overline{x}$  denotes that the effect of statement x ist visible to the "home" core, <u>x</u> denotes that the effect of x is visible to all the other cores.

Dependence analysis for thread  $P_0$  results in the constraints

 $\underline{i1r} < \overline{if0}$  and  $\underline{vr} < \overline{if0}$ .

In addition, memory fences are derived from release and acquire operations. These are

 $\underline{i0t} < \overline{v0}$ ,  $\underline{if0} < \overline{s0}$ , and  $\underline{i0t} < \overline{i0f}$ .

 $\overline{x}$  denotes that the effect of statement x ist visible to the "home" core,  $\underline{x}$  denotes that the effect of x is visible to all the other cores.

Dependence analysis for thread  $P_0$  results in the constraints

 $\underline{i1r} < \overline{if0}$  and  $\underline{vr} < \overline{if0}$ .

In addition, memory fences are derived from release and acquire operations. These are

 $\frac{\underline{i0t}}{\underline{if0}} < \overline{v0},$  $\frac{\underline{if0}}{\underline{i0t}} < \overline{s0}, \text{ and }$  $\frac{\underline{i0t}}{\underline{i0f}}.$ 

Similar dependencies hold for thread  $P_1$ . In the following we concentrate on thread  $P_0$ . The arguments are virtually the same for thread  $P_1$ .



Abbildung: Peterson with RA Memory order, 1<sup>st</sup> Try (plainly wrong)

■ In this graph we immediately see that *ir*1 can precede <u>i0t</u>.

- In this graph we immediately see that *ir*1 can precede <u>*i*0t</u>.
- A more thorough analysis shows that this may lead to both threads entering the critical section at line 6.

- In this graph we immediately see that *ir*1 can precede <u>i0t</u>.
- A more thorough analysis shows that this may lead to both threads entering the critical section at line 6.
- Thus we need additional memory fences in order to get the algorithm right with RA memory order.

**To** overcome this problem we introduce an additional constraint, namely  $\underline{i0t} < ir1$ .

To overcome this problem we introduce an additional constraint, namely <u>i0t</u> < ir1.</li>
The matrix for this constraint reads

$$\begin{pmatrix} \underline{i0t} & i1r \\ . & i1r \end{pmatrix}$$

because of the loop containing an arbitrary number of i1r statements.



Abbildung: Peterson with RA Memory order, 2<sup>nd</sup> Try (still wrong)

However, the path  $i0t \rightarrow i0t \rightarrow i1t \rightarrow i1t \rightarrow v1 \rightarrow v1 \rightarrow i1r \rightarrow vr \rightarrow v0 \rightarrow v0 \rightarrow i0r \rightarrow vr \rightarrow s0 \rightarrow s1$ in the Kronecker sum of the graph above and its  $P_1$  variant shows that  $P_0$  and  $P_1$  can enter the critical section at the same time.

Thus, constraint  $\underline{i0t} < ir1$  is too weak to ensure correct synchronization.

Our next try is to prohibit vr from preceding *i*0t,

The additional constraint is  $\underline{v0} < i1r$ .

Its matrix reads

$$\begin{pmatrix} \underline{v0} & i1r \\ . & i1r \end{pmatrix}$$

because of the loop containing an arbitrary number of i1r statements.



Abbildung: Peterson with RA Memory order, 3rd Try (correct, but maybe inefficient)

• This version of the algorithm is correct.

- This version of the algorithm is correct.
- This can be proven by applying Kronecker sum to the graph above and its P<sub>1</sub> variant and, further on, removing nodes and edges which cannot be reached in the resulting graph.

- This version of the algorithm is correct.
- This can be proven by applying Kronecker sum to the graph above and its P<sub>1</sub> variant and, further on, removing nodes and edges which cannot be reached in the resulting graph.
- On the other hand, the linear graph does not allow for any instruction reordering.

- This version of the algorithm is correct.
- This can be proven by applying Kronecker sum to the graph above and its P<sub>1</sub> variant and, further on, removing nodes and edges which cannot be reached in the resulting graph.
- On the other hand, the linear graph does not allow for any instruction reordering.
- A short reflection shows that this is too restrictive.

- This version of the algorithm is correct.
- This can be proven by applying Kronecker sum to the graph above and its P<sub>1</sub> variant and, further on, removing nodes and edges which cannot be reached in the resulting graph.
- On the other hand, the linear graph does not allow for any instruction reordering.
- A short reflection shows that this is too restrictive.
- In detail, the constraint  $\underline{v0} < i1r$  makes the constraint  $\underline{i0t} < \overline{v0}$  dispensable.



Abbildung: Peterson with RA Memory order with Relaxed instead of Release Memory Fence (still correct and maybe more efficient)

 Migrating from sequentially consistent memory order to release-acquire memory order ist not at all *straight-forward*.

- Migrating from sequentially consistent memory order to release-acquire memory order ist not at all *straight-forward*.
- It requires deep insights into the code of the algorithm and into the memory models.

- Migrating from sequentially consistent memory order to release-acquire memory order ist not at all *straight-forward*.
- It requires deep insights into the code of the algorithm and into the memory models.
- *Relaxing* is useful to gain performance.

Bartosz Milewski, *Who ordered memory fences on an x86?*, https: //bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/

Bartosz Milewski, *Who ordered sequential consistency?*, https://bartoszmilewski.com/2008/11/11/who-ordered-sequential-consistency/

Bartosz Milewski, *C++ atomics and memory ordering*, https://bartoszmilewski.com/2008/12/01/c-atomics-and-memory-ordering/