Questions tagged [pandas]

Pandas is a Python library for Panel Data manipulation and analysis, e.g. multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science results, econometrics, or finance. IMPORTANT: When asking a question with this tag, please tag your questions: [...

0
votes
0 answers
4 views

How to get the index value in a dataframe by comparing date with a datetime object in that dataframe?

I have a dataframe like the following. I would like to get the index value by checking the date. For example if the date is 2018-04-05, I need to get the index value as 3. Can someone let me know how ...
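A minimal sketch of one way to look up an index label by date. The frame below is hypothetical (the column name `date` and the sample rows are assumptions, since the question's data is not shown); it is arranged so that 2018-04-05 sits at index 3, as in the example.

```python
import pandas as pd

# Hypothetical data: the "date" column name and values are assumptions.
df = pd.DataFrame(
    {"date": pd.to_datetime(["2018-04-02", "2018-04-03", "2018-04-04", "2018-04-05"])}
)

# Boolean mask on the datetime column, then read the matching index labels.
matches = df.index[df["date"] == pd.Timestamp("2018-04-05")]

# Take the first match, or None when the date is absent.
idx = matches[0] if len(matches) else None
print(idx)
```

Comparing against a `pd.Timestamp` (or a `datetime` object) avoids relying on string parsing inside the comparison.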
1
vote
0 answers
15 views

ValueError when trying to create a DF out of two lists with the same length [duplicate]

I have 2 different lists of lists A and B. print(len(A)) 288 print(len(B)) 288 Making them flat: flat_list = [item for sublist in A for item in sublist] flat_list2 = [item for sublist in B for item ...
0
votes
1 answer
35 views

What datetime format is this and how do I parse it?

I have some data that I'm pulling from an API and the date is formatted like this: '1522454400000'. I'm not sure how to parse it, but this is what I have tried (unsuccessfully): df = DataFrame(test) df....
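A 13-digit value like this is commonly a Unix timestamp in milliseconds. Assuming that reading is right, a sketch of the conversion could look like the following; the column name `date` is a placeholder, not taken from the question.

```python
import pandas as pd

# Hypothetical frame: the column name "date" is an assumption.
df = pd.DataFrame({"date": ["1522454400000"]})

# Cast the digit strings to integers first, then interpret them as
# milliseconds since the Unix epoch (1970-01-01 00:00:00 UTC).
df["parsed"] = pd.to_datetime(df["date"].astype("int64"), unit="ms")

print(df["parsed"].iloc[0])
```

Casting to an integer type before calling `pd.to_datetime` keeps the `unit="ms"` interpretation explicit instead of depending on how numeric strings are parsed.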
0
votes
2 answers
21 views

Excel Rows to sentences Python

Let's say I have an Excel file with 5 rows and 2 columns.
apples color
honeycrsp red
gala red
goldendel orange
fuji red
grannys green
I want to place each of the rows ...
0
votes
1 answer
20 views

How to return different types of objects when overloading the sum function in Python?

I have a class Data and I want to overload the __add__ function and get different types of objects based on the types of the objects I sum. Toy example code The Data class is as follows: class Data(...
0
votes
0 answers
16 views

Referring to a column by index in Pandas Jupyter Notebook

I am using python 2.7.13 and pandas in Jupyter notebook. I have the following data https://drive.google.com/file/d/1pko9oRmCllAxipZoa3aoztGZfPAD2iwj/view?usp=sharing which is available on the ...
1
vote
1 answer
29 views

How to get the minimum increase in a table in Pandas?

I'm trying to get the minimum increase between rows in a column in my table. my attempt so far import pandas as pd df = pd.DataFrame({'A': [0, 100, 50, 100], 'B': [5, 2, 2, 0], ...
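One possible reading of "minimum increase" is the smallest strictly positive row-to-row difference in each column. Under that assumption, and using only the two columns visible in the excerpt, a sketch might be:

```python
import pandas as pd

# Only the columns visible in the excerpt; the original frame has more.
df = pd.DataFrame({"A": [0, 100, 50, 100], "B": [5, 2, 2, 0]})

diffs = df.diff()                   # row-to-row change per column
increases = diffs.where(diffs > 0)  # keep strictly positive changes, NaN elsewhere
min_increase = increases.min()      # NaN for a column that never increases

print(min_increase)
```

Column `B` never increases in this sample, so its result is `NaN` rather than a number.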
0
votes
1 answer
24 views

Pandas Counting Character Occurrences

Let's say I have a dataframe that looks like this: df2 = pd.DataFrame(['2018/10/02, 10/2', '02/20/18', '10-31/2018', '1111-0-1000000', '2018/10/11/2019/9999', '10-2, 11/2018/01', '10/2'], columns=['A'...
0
votes
0 answers
18 views

df.replace produces “Passing list-likes to .loc or []” warning

I see lots of people getting this error but none for the same reason I am (or at least, doesn't seem to be the same reason!) I'm trying to replace the string '--' in an imported file with a 0. Simple ...
0
votes
0 answers
27 views

Pandas - Faster way to split by last \ and use part of string in new column

I created a while loop which separates the file path from the file/exe column of a pandas dataframe and puts the file path into a new column. #Count rows rows = len(DF1) #While loop to grab file path ...
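A vectorized alternative to a row-by-row loop is to split each string once from the right. This is a sketch only: the column names `file` and `path` and the sample paths are assumptions, since the question's data is not shown.

```python
import pandas as pd

# Hypothetical data; "file" and "path" are assumed column names.
DF1 = pd.DataFrame({"file": [r"C:\Windows\System32\cmd.exe", r"C:\Temp\a.txt"]})

# rsplit with n=1 splits once, starting from the right, so everything
# before the last backslash ends up in the first piece.
DF1["path"] = DF1["file"].str.rsplit("\\", n=1).str[0]

print(DF1)
```

The second piece (`.str[1]`) would hold the file name if that is needed as well.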
0
votes
2 answers
26 views

Value_counts on multiple columns with groupby

I need some help with Pandas. I have the following dataframe: df = pd.DataFrame({'1Country': ['FR', 'FR', 'GER','GER','IT','IT', 'FR','GER','IT'], '2City': ['Paris', 'Paris', 'Berlin', '...
0
votes
1 answer
26 views

Arguments of Functions within Functions [on hold]

This is a broad question rather than a specific problem, but I'm finding that I'm writing functions that use other functions that I've previously written, and I keep having to pass the previous ...
-1
votes
1 answer
32 views

Keep order after melt in pandas

input Data: ╔════╦══════╦══════╦══════╦══════╦══════╗ ║ ID ║ q104 ║ q204 ║ q304 ║ q404 ║ q105 ║ ╠════╬══════╬══════╬══════╬══════╬══════╣ ║ 1 ║ 12 ║ 43 ║ 23 ║ 22 ║ 42 ║ ║ 2 ║ 23 ║ 56 ║...
1
vote
1 answer
38 views

Split multiple times?

So I'm currently transferring a txt file into a csv. It's mostly cleaned up, but even after splitting there are still empty columns between some of my data. Below is my messy CSV file And here is ...
2
votes
3 answers
20 views

How to set column names with DataFrame.T

I have a data frame that I learned that I can "flip" with df.T but I am wondering how to add the new column names at the same time that I transpose the data frame. My data is like this: dict = {"a":[...
-1
votes
0 answers
16 views

Pandas Colored Dataframe not Appearing [duplicate]

I'm fairly new to Python, and I'm running into a few issues with creating a color-coded pandas dataframe. I have my dataframe created and a color coding definition and I run the following code: ...
1
vote
2 answers
58 views

How to get average of increasing values using Pandas?

I'm trying to figure out the average of increasing values in my table per column. my table
A | B | C
----------------
0 | 5 | 10
100 | 2 | 20
50 | 2 | 30
100 | 0 | 40
function I'...
0
votes
1 answer
19 views

Generate object with data from csv efficiently in python

I have a .csv file with node information (including node_id, x, y), and I am trying to generate an object for each record in the .csv file. Currently I'm using the apply method, but it takes almost the same running time compared ...
0
votes
2 answers
40 views

Python Pandas compare values in multiple columns for partial duplicates and drop record

I need to create a function/expression that compares multiple columns ('Cust ID Count', 'Revenue' and possibly 'Family Name') for a record match and then keeps only the first record based on ascending ...
0
votes
2 answers
25 views

Pandas Dataframe Yahoo Finance Checking if Volume Meets Criteria

The program below imports thousands of stock tickers from a .CSV file to a list and passes the tickers as a parameter to a function which pulls the 'Adjusted Close' column of that particular stock and ...
0
votes
1 answer
20 views

Python - running into x_test y_test fit errors

I have built a neural network and it worked fine with a small dataset of around 300,000 rows with 2 categorical variables and 1 independent variable, but was running into memory errors when I ...
-1
votes
4 answers
43 views

How to check if elements in one array exist in another array if so print the count using Python

I have two arrays, A=[1,2,3,4,6,5,5,5,8,9,7,7,7] and B=[1,5,7]. If elements of B are in A, then print the number of occurrences. Output: 1:1 5:3 7:3
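A short sketch of one way to produce that output with the standard library, using the two lists from the question. The name `result` is introduced here for illustration.

```python
from collections import Counter

A = [1, 2, 3, 4, 6, 5, 5, 5, 8, 9, 7, 7, 7]
B = [1, 5, 7]

# Count every element of A in a single pass.
counts = Counter(A)

# Keep only the elements of B that actually occur in A.
result = {b: counts[b] for b in B if counts[b] > 0}

for value, n in result.items():
    print(f"{value}:{n}")
```

Counting once with `Counter` avoids calling `A.count(b)` for each element of B, which would rescan A every time.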
1
vote
1 answer
46 views

Dropping rows in Python using != operator is not working

I want to drop rows in my dataset using: totes = df3.loc[(df3['Reporting Date'] != '18/08/2017') & (df3['Business Line'] != 'Bondy')] However it is not what I expect; I know that the number of ...
-1
votes
0 answers
31 views

over 170G RAM usage - Pandas crosstab - purchases data

I have transaction data in a CSV with column headers userId, timestamp, event_data, itemId and 171530 rows. I want to transform it into the form (just an illustration): In the image, ...
0
votes
2 answers
17 views

How to replace the values in a dataframe column based on another dataframe condition

I have two dataframes, XXX and override. XXX = pd.DataFrame({'A':['One', 'Two', 'Three'], 'B': [6,4,3], 'C': ['red','green','blue']}) override = pd.DataFrame({'A':['One','Two'], 'C': ['apple','pie']})...
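The excerpt cuts off before the desired output, so this sketch assumes the goal is to replace `C` in `XXX` with the value from `override` wherever the `A` keys match, leaving other rows unchanged. The name `lookup` is introduced here for illustration.

```python
import pandas as pd

XXX = pd.DataFrame(
    {"A": ["One", "Two", "Three"], "B": [6, 4, 3], "C": ["red", "green", "blue"]}
)
override = pd.DataFrame({"A": ["One", "Two"], "C": ["apple", "pie"]})

# Build a key -> replacement Series indexed by column A.
lookup = override.set_index("A")["C"]

# Map each key to its override; keys without one become NaN and
# fall back to the original value.
XXX["C"] = XXX["A"].map(lookup).fillna(XXX["C"])

print(XXX)
```

This relies on the values of `A` being unique in `override`; duplicate keys would make the lookup ambiguous.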
1
vote
2 answers
30 views

How do I calculate moving average with customized weight in pandas?

I have a dataframe that contains two columns, a: [1,2,3,4,5]; b: [1,0.4,0.3,0.5,0.2]. How can I make a column c such that: c[0] = 1 c[i] = c[i-1]*b[i]+a[i]*(1-b[i]) so that c:[1,1.6,2.58,3.29,4....
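Because each value of c depends on the previous one, a plain loop is the most direct way to express the recurrence. The following is a minimal sketch using the values from the question, not necessarily the fastest approach for large frames.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [1, 0.4, 0.3, 0.5, 0.2]})

a = df["a"].tolist()
b = df["b"].tolist()

# c[0] = 1, then c[i] = c[i-1] * b[i] + a[i] * (1 - b[i])
c = [1.0]
for i in range(1, len(a)):
    c.append(c[i - 1] * b[i] + a[i] * (1 - b[i]))

df["c"] = c
print(df)
```

Working through the recurrence by hand gives 1, 1.6, 2.58, 3.29 and 4.658, which agrees with the visible part of the expected output.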
0
votes
2 answers
14 views

Python Pandas: Create New Column With Calculations Based on Categorical Values in A Different Column

I have the following sample data frame:
id category time
43 S 8
22 I 10
15 T 350
18 L 46
I want to apply the following logic: 1)...
0
votes
0 answers
10 views

Pandas with RegExp Producing Leading and Trailing NAN columns

I have some simple data in a file that I'm reading in with pandas: 2018:08:23:07:35:22:INFO:__main__:Info logger message There are no beginning or trailing tabs, spaces, etc. in the file. I read ...
1
vote
1 answer
12 views

For loop for dropping a string pattern from a column name

I am attempting to drop '_Adj' from a column name, in a 'df_merged' data frame if (1) a column name contains 'eTIV' or 'eTIV1'. for col in df_merged.columns: if 'eTIV1' in col or 'eTIV' in col: ...
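Assigning to `col` inside a `for` loop does not change the frame's column labels, so a rename is one way to get the effect described. This is a sketch with made-up column names; note that any name containing 'eTIV1' also contains 'eTIV', so a single check covers both.

```python
import pandas as pd

# Hypothetical column names for illustration only.
df_merged = pd.DataFrame(columns=["eTIV_Adj", "eTIV1_Adj", "Age_Adj"])

# Strip "_Adj" only from columns whose name contains "eTIV".
df_merged = df_merged.rename(
    columns=lambda col: col.replace("_Adj", "") if "eTIV" in col else col
)

print(list(df_merged.columns))
```

Columns that do not mention 'eTIV', such as the assumed `Age_Adj`, keep their original names.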
0
votes
1 answer
28 views

Reading all excel files from a directory instead of listing them individually

I made a program to merge excel files based on listing their specific file names [4] but if I want to merge all files listed in a particular directory (say a folder called test on my desktop) how would I ...
0
votes
3 answers
23 views

Reading a file with pandas and use correlation coefficients on two columns

I have a file like the following, with no header:
0.000000 0.330001 0.280120
1.000000 0.355590 0.298581
2.000000 0.305945 0.280231
I want to read this file using pandas dataframe and want to perform ...
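A sketch of reading whitespace-separated, headerless data and computing a Pearson correlation. The three rows are the ones shown in the question, read from an in-memory string instead of a file; which two columns to correlate is an assumption, since the excerpt is cut off.

```python
import io

import pandas as pd

raw = """0.000000 0.330001 0.280120
1.000000 0.355590 0.298581
2.000000 0.305945 0.280231
"""

# header=None gives integer column labels 0, 1, 2;
# sep=r"\s+" splits on any run of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

# Assumed choice: correlate the second and third columns.
r = df[1].corr(df[2])
print(r)
```

For a real file, a path can be passed to `pd.read_csv` in place of the `io.StringIO` object.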
0
votes
0 answers
29 views

Iterate over dataframe to optimize project management

I have a pandas dataframe that contains the existing relationships between three ids: manager_id, employee_id, project_id. By changing which managers manage which employees, I'd need to find the ...
0
votes
0 answers
12 views

Pivoting / Unstacking Text fields in Pandas [duplicate]

Struggling with an unstacking problem. What I am trying to do is similar to the unstack function in jmp, while keeping all of my columns, but I want to do it in Python instead of jsl. I have a table ...
0
votes
1 answer
21 views

Math on rows in column of pandas dataframe

I am trying to find code that will allow me to subtract the value in the last row of a column from the value in the second-to-last row of the same column. Here's what I have tried. df_stock2['...
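Positional indexing with `.iloc` is one straightforward way to reach the last two rows. In this sketch the column name `Close` and the sample values are assumptions, because the question's column name is cut off.

```python
import pandas as pd

# Hypothetical data; the column name "Close" is an assumption.
df_stock2 = pd.DataFrame({"Close": [10.0, 12.5, 11.0]})

second_to_last = df_stock2["Close"].iloc[-2]
last = df_stock2["Close"].iloc[-1]

# Second-to-last value minus the last value, as described in the question.
difference = second_to_last - last
print(difference)
```

`.iloc` counts by position, so this works regardless of what the index labels are.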
0
votes
1 answer
27 views

Legend Overlapping in graph plotting area

I have a dataframe as below (obtained after a lot of preprocessing). Please find the dataframe: d = {'token': {361: '180816_031', 119: '180816_031', 101: '180816_031', 135: '180816_031', 292: '180816_031', ...
0
votes
1 answer
32 views

Pandas find columns with unique values

I have two databases (each with 1000's of tables) which are supposed to reflect the same data but they come from two different sources. I compared two tables to see what the differences were, but to ...
1
vote
3 answers
34 views

Pandas - Extract Text from Rows

Let's say I have a dataframe that looks like this: df2 = pd.DataFrame(['Apple, 10/01/2016, 31/10/18, david/kate', 'orange', 'pear', 'Apple', '10/01/2016', '02/20/2017'], columns=['A']) >>> ...
0
votes
0 answers
4 views

What other data profiling libraries are there besides pandas_profiling?

I am working on data profiling for tables in databases like Redshift or Snowflake. Are there any other data profiling Python libraries besides pandas_profiling? Thanks!
0
votes
3 answers
30 views

Normalizing rows of pandas dataframe

I need to normalize the rows of a dataframe that contains some rows populated entirely with zeros. For example: df= pd.DataFrame({"ID": ['1', '2', '3', '4'], "A": [1, 0, 10, 0], "B": [4, 0, 30, 0]}) ID A B 1 ...
0
votes
1 answer
19 views

Pandas Datetime AVERAGE

DataFrame where Date is datetime: Column | Date :-----------|----------------------: A | 2018-08-05 17:06:01 A | 2018-08-05 17:06:02 A | ...
0
votes
0 answers
17 views

Set an attribute as default for every object of a class

In my script, each time a certain type of object is called I have to repeat a certain attribute: if my object df is a dataframe, each time I'm calling df in my script I have to write df.style....
0
votes
0 answers
12 views

Complicated Groupby average

I am having trouble doing a groupby mean on this data. I am using this groupby. What I am trying to do is find a vectorized way to average o1 and o2 or the last two columns of the data by the ...
0
votes
0 answers
20 views

Pythonic way to write functions for pandas dataframe manipulation

I am doing some data analysis in python using pandas. In the analysis, I am writing a lot of functions that look something like this import pandas as pd def my_func(data): """ A function to ...
0
votes
2 answers
19 views

Read pandas column with number values and missing data as string

I have an Id column in my data frame like this: a = pandas.DataFrame([12673, 44, 847]) This data has some missing values. If I set keep_default_na = True, then the missing value is filled by NaN, and ...
2
votes
1 answer
37 views

Pandas query multiindex dataframe based on another single index dataframe

I have two dataframes: Data & Positions. Data has multiindex: 'Date' and 'Symbol'. Positions has a single index: 'Date' (and has a column with the label 'Symbol'). Both 'Date'-s are ...
0
votes
1 answer
20 views

pandas - fill in empty row values with other row values conditionally

I have a table that looks like this (the ratio column was merged from another table based on the codename and date): date codename ratio 2018-01-01 A .5 2018-02-01 A ...
0
votes
4 answers
29 views

pandas aggregate by functions

I have data like below:
id movie details value
5 cane11 good 6
5 wind2 ok 30.3
5 wind1 ok 18
5 cane2 good 2
5 cane12 ok 4
5 cane14 good 7
5 wind2 ok 2
I want ...
1
vote
2 answers
20 views

Filter dataframe by two columns in Pandas

I have a dataframe A that contains hourly weather data for each city. City Hour Temperature A 1 30 A 2 32 ... B 1 39 B 2 40 I have another dataframe B, which ...
0
votes
0 answers
25 views

put values in single columns after pd.DataFrame.from_dict [duplicate]

I have a dataframe created with pd.DataFrame.from_dict the result looks something like this: a b 0 [0.042167, 2.913] [0.042168, 0.245] 1 [0.042164, 1.739]...
-2
votes
0 answers
41 views

Python for loop writing to dataframe

I'm pretty new to coding and I have made a for loop that iterates through rows in a 40k row counting dataframe. In the for loop it connects to an API and gets data which it then puts in a dataframe. ...