(-it in your command line should actually be: -d it. But that is not the main issue.)
The main issue is that your python environment is not set up properly. I am not an expert on that, but as far as I know, urllib3 and six need to be installed with the same package path. You can check with pip list -v, if they are (which they are probably not in your case). Maybe someone else here can help how to set it straight.
(Actually the instructions in README about using pip3 install -r requirements.txt might not be compatible with this issue between urllib3 and six. Maybe some python experts here can help to provide a more robust way to ensure a correct python environment ).
I have updated requirements.txt so that requests now needs to be 2.32.3. (Not sure if this actually fixes anything, but I had some strange error messages about incompatibility).
I get a running virtual python environment with the following commands on Linux (and after having installed venv):
cd [to the directory of portfolio-classifier.py]
python -m venv ppc-env # ppc-env is the name of the new virtual environment
source ppc-env/bin/activate
pip3 install -r requirements.txt
python portfolio-classifier.py test/multifaktortest.xml # or any other use of the script
[...]
deactivate # to exit the virtual environment
I guess it is slightly different on Windows (especially executing the activate command), but maybe it helps nevertheless.
After that, the following commands are sufficient to run the script:
cd [to the directory of portfolio-classifier.py]
source ppc-env/bin/activate
python portfolio-classifier.py test/multifaktortest.xml # or any other use of the script
[...]
deactivate # to exit the virtual environment
Another note here: I recently upgraded to Ubuntu 24.04.1 LTS. This led to Python environment being managed by debian and I ran into same issues with six and urllib3. If I remember correctly, I resolved it in the following way:
Maybe a general caveat here: The taxonomies for ETFs and other funds are assigned according to their current value. Especially if there is a mix of classifications (different sectors/regions/countries), the individual numbers will typically change over time. However, PP will only have the current values and does not “remember” the previous ones. So any graphs that show e.g. performance per classification over time might be pretty misleading.
Theoretical example: If your portfolio only contains a single ETF which contains two shares, one in sector A and the other in sector B, then any performance graph of your portfolio will show the same performance for sector A and for sector B while in reality the share in sector A might haven fallen dramatically while the share in sector B might have soared (changing the percentage values for sectors A and B over time in the ETF).
So the classification is good to assess the current distribution within your portfolio, but don’t read too much into it when you you look at time series of classifications for portfolios with funds where classifications are changing over time.
I tried the fizban and the Alfons version, but both give me the following error message:
Traceback (most recent call last):
File "/Users/<UserName>/Downloads/pp-portfolio-classifier-main/portfolio-classifier.py", line 732, in <module>
pp_file.add_taxonomy(taxonomy)
File "/Users/<UserName>/Downloads/pp-portfolio-classifier-main/portfolio-classifier.py", line 590, in add_taxonomy
securities = self.get_securities()
File "/Users/<UserName>/Downloads/pp-portfolio-classifier-main/portfolio-classifier.py", line 692, in get_securities
security_h = security.load_holdings()
File "/Users/<UserName>/Downloads/pp-portfolio-classifier-main/portfolio-classifier.py", line 351, in load_holdings
self.holdings.load(isin = self.ISIN, secid = self.secid)
File "/Users/<UserName>/Downloads/pp-portfolio-classifier-main/portfolio-classifier.py", line 419, in load
bearer_token, secid = self.get_bearer_token(secid, domain)
File "/Users/<UserName>/Downloads/pp-portfolio-classifier-main/portfolio-classifier.py", line 396, in get_bearer_token
resultstringtoken = re.findall(token_regex, response.text)[0]
IndexError: list index out of range
It seems there has been a change in the morningstar token or something comparable, but I was not able to fix the problem.
I’ve executed the script five times in a row. The first four gave me errors, the fifth try worked. Very strange behavior. I really do not know, what was happening in the background. Nevertheless, now everything works fine.
Btw, thanks for adding the functionality “auto-classification of stocks”. That was what I was looking for the last two years!
I have created a temporary branch to prepare the switch to a better documented Morningstar API. I already have a version which works well and just requires a bit of code clean-up as there is now a lot of unneeded legacy in the code. But I think it is ready for broader testing.
Would some of you be so kind to try out the new temporary branch and let me know, if there are issues?
The most attractive new feature is flexible retrieval of top holdings. You can decide, if you want 0, 10, 25, 100, 1000 or 3200 holdings per fund/etf. (I recommend not to use values above 100 for top holdings retrieval, if you plan to use the xml in PP in the longer run. The GUI of PP easily gets overloaded with the large numbers and you might not be able to edit the securities anymore. But of course, it doesn’t harm to open a throw-away copy of an xml file with thousands of holdings; and pp_data_fetched.csv will also have them all.).
Please note that I changed scaling from long equity to net equity. So the numbers might slightly deviate from the results of the main branch. Also the different API might provide slightly different values. (BTW: If anybody thinks that long equity is more suitable, please let me know. I am not sure, if I fully understand the implications of this change).
Here is the (temporary) location on GitHub:
I plan to merge this back to my main branch when it is ready.
Cheers.
P.S.: Up to now, the updates only affect funds/etfs. Retrieval for stocks remains unchanged.
Hey, never used it but just gaved it a try and it directly worked fine! Import was smothly running, results looks great! Not that much time actually, but I will definetly give it a try in the next days.
I think I found a bug in the new version; one of my stocks gives an error when I run the script with the -stocks option
@ Retrieving data for stock BMG210901242 (0P0001NVII) using x-ray (de)
[China Water Affairs Group Ltd]
Traceback (most recent call last):
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 1339, in <module>
pp_file.add_taxonomy(taxonomy)
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 1023, in add_taxonomy
securities = self.get_securities()
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 1265, in get_securities
security_h = security.load_holdings()
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 658, in load_holdings
self.holdings.load(isin = self.ISIN, secid = self.secid, name = self.name, isRetired = self.isRetired)
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 932, in load
categories = [taxonomy['map3'][key] for key in categories]
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 932, in <listcomp>
categories = [taxonomy['map3'][key] for key in categories]
KeyError: 'Basic Materials'
I could also reproduce the error in a test file with only the offending stock and Vanguard FTSE All-World. I can provide the file if it helps finding the bug.
Thanks a lot. I didn’t expect impact on stocks. I will check.
UPDATE: This is a general issue with stock retrieval also on the main branch. The problem is that for this stock, the Morningstar page doesn’t provide Stock Style information which confuses the scraping. I will try to fix it on the main branch, but I hope to bring stocks also to the new API which would then be more robust than the current solution.
Another weird behaviour I noticed for the Region classification: All my US Stocks (for example Apple, Microsoft, or even Lockheed Martin) have been classified as “Asia Emerging” instead of the expected “North America”. For ETFs the classification seems correct, the Country classification also seems fine both for ETFs and individual stocks.
Is that an error in the API you’re using?
Oops. It seems that the fix which I provided yesterday for the empty report on BMG210901242 is faulty. Before that, the mapping was correct.
In any case: Retrieval for stocks is not yet on the new API. On the new API, it will be much more robust. So I am not sure, how much I should still invest in the web page scraping solution.
UPDATE: I reverted a few commits. Now it be fine again for normal stocks. I am not perfectly happy with the solution for empty reports as in case of BMG210901242, but they are seldom and now at least they don’t abort the script anymore.
@conbriozy and all: Stocks are now also on the new API (on the temporary branch). Code is much cleaner and even BMG210901242 is now fully retrieved. Looks really good. Please test (with option -stocks).
@Alfons1Qvor12 Thank you for the updated version, unfortunately I still get an Error, apparently when trying to retrieve data for the Solana cryptocurrency. Although I don’t quite understand why the script is even trying to get data for Solana, since it only has a Symbol (SOL) but no ISIN:
[Solana]:
Traceback (most recent call last):
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 1401, in <module>
pp_file.add_taxonomy(taxonomy)
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 1086, in add_taxonomy
securities = self.get_securities()
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 1328, in get_securities
security_h = security.load_holdings()
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 791, in load_holdings
self.holdings.load(isin = self.ISIN, name = self.name, isRetired = self.isRetired)
File "/Users/marius/Documents/Entwicklung/git/pp-portfolio-classifier/portfolio-classifier.py", line 862, in load
url = 'https://www.emea-api.morningstar.com/ecint/v1/securities/' + isin
TypeError: can only concatenate str (not "NoneType") to str
@conbriozy Thanks a lot for finding this. I fixed it.
This was really a tricky one. I don’t know why, but xml parsing in get_security took care of empty isin in the past. But now, this is not always the case anymore (in both main and branch). I fixed it only in branch as the impact in main is minor.
(.find(‘isin’) provides a result, but it does not have .text. Either the PP xml changed or some library for parsing or finding changed, or whatever …)
UPDATE: I found the reason. It is in PP. Empty isin usually results in an xml that does not contain <isin>. However, in some cases when ISIN is somehow touched, PP creates an xml with<isin></isin> which confuses the original python parsing.
I try to use it to classify stocks I have. I pulled main branch, but when I try to run it with -stocks option it fails: portfolio-classifier.py: error: unrecognized arguments: -stocks.
Shall I use different branch?