1

is my code

import pandas as pd

columns1 = ['Student ID', 'Course ID', 'Marks']
data1 = [(1, 10, 100), (2, 400, 200), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300), (3, 30, 300)]
df1 = pd.DataFrame(data1, columns=columns1)
Student ID Course ID Marks
1 10 100
2 400 200
3 30 300
3 30 300
3 30 300
3 30 300
3 30 300
3 30 300
df1['s']  = np.where((df1['Course ID']  > df1['Marks'])  == True, df1['Student ID'],  df1['s'].shift(1)) 
df1
Student ID Course ID Marks s
1 10 100 NaN
2 400 200 2
3 30 300 2
3 30 300 NaN
3 30 300 NaN
3 30 300 NaN
3 30 300 NaN
3 30 300 NaN

As you can see, only the information of two rows has changed and the rest are null. This is the result I expect because after column 2 condition "df1['Course ID'] > df1['Marks']" is true

Student ID Course ID Marks s
1 10 100 NaN
2 400 200 2
3 30 300 2
3 30 300 2
3 30 300 2
3 30 300 2
3 30 300 2
3 30 300 2

Thank you for your help

3
  • Hi, welcome to SO. With df1['s'].shift(1) you are shifting a column ('s') that does not yet exist, so there is no way that the second image represents the result of running this exact code. You would receive an error. Please update your post with the expected output (as text, not as an image), and explain what logic you are trying to implement. Commented May 19, 2024 at 10:21
  • It's hard to know what you want, and your code doesn't reproduce your output. Commented May 19, 2024 at 10:24
  • Looks like you want df1['s'] = df1['Student ID'].where(df1['Course ID'] > df1['Marks']).ffill(). Chain .astype('Int64') if you want nullable integers. Commented May 19, 2024 at 10:43

1 Answer 1

0

If you want to assign the Student ID for rows matching the df1['Course ID'] > df1['Marks'] condition, and for other rows take the previous value, use ffill:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'] > df1['Marks'])
            .ffill()
            .convert_dtypes() # optional
           )

Output (with a slightly different input):

   Student ID  Course ID  Marks     s
0           1         10    100  <NA>
1           2        400    200     2
2           3         30    300     2
3           3         30    300     2
4           3        400    300     3
5           3         30    300     3
6           3         30    300     3
7           3         30    300     3

If you only want to apply this logic per Student ID, which might make more sense to avoid "leaking" values from one student to another rather use groupby.ffill:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'] > df1['Marks'])
            .groupby(df1['Student ID']).ffill()
            .convert_dtypes() # optional
           )

Or:

df1['s'] = (df1['Student ID']
            .where(df1['Course ID'].gt(df1['Marks'])
                   .groupby(df1['Student ID']).cummax())
            .convert_dtypes() # optional
           )

Output:

   Student ID  Course ID  Marks     s
0           1         10    100  <NA>
1           2        400    200     2
2           3         30    300  <NA>
3           3         30    300  <NA>
4           3        400    300     3
5           3         30    300     3
6           3         30    300     3
7           3         30    300     3
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.