DatenanalysePower BIPython

More power in Power BI with Python — Part II

– Using Python scripts in Power BI queries –  processing JSON with Python – querying data from the web

Why using Python in queries?

a picture is worth a thousand words — query options in Power BI

The Mission — Fund data from the web

The data then appear in a modal window. It does not matter here at all to understand the meaning of this data. It is simply a matter of percentages that the fund must calculate and publish on each trading day.

Data from web — Power BI fails… (I)

The cause of this failure can be found easily with the help of the Chrome browser tools. See recording below.

JSON data — Power BI fails … (II)
…or it could hardly be more complicated

Now that we are already examining the source code of the page, we can continue and verify our theory about the failure of Power BI above.

This gives us at least two options. Either we simply download the JSON file and then load it into Power BI …

… or we let Power BI retrieve the data for us. As we can see below, this also works!

The result aaData:List is exactly what we saw in the browser console: an object key called “aaData” and an array/list as value …

So how is the JSON file structured?

{
   "aaData":[
      [
         "0,0000",
         "0,4564",
         "17,3647",
         "0,45646",
         {
            "display":"29.05.2017",
            "raw":20171229
         }
      ]
      ...,
      ...,
      ...,
   ]
}
0,0000    0,4564       17,3647       0,45646     29.05.2017
...
...
...

What does not help much …

… but allows us to drill down …

… but this also only shows us a list of lists that we can convert into a table …

By clicking on “Extract values …” we unfortunately run directly into …

…an error.

The result is not an error this time, but each value is now in separate row (in one column) and our date object is kept as a record … We could hardly be further away from our target structure …

We would then extract the rest of the list in the following way.

We could concatenate this new list into a string …

Query with Python

Now a window appears and we can enter our code.

Here the code as text.

1  import pandas as pd
2
3  path = "D:\\downloads\\Performance.json"
4
5  df1 = pd.DataFrame([[*(value[0:4])] + [value[4]["display"]] for value in pd.read_json(path).aaData])
6
7  df1.columns = ["Zwischengewinn", "Aktiengewinn_EStG", "Aktiengewinn_KStG", "Immobiliengewinn", "Datum"]

Everything is wonderfully organized, with headings, just beautiful, right? Just some lines of Python code.

Excursus: Occasional problems

Well, somtimes problems can arise in several places here.

What sometimes leads to strange error messages are errors while reading the JSON file with pd.json_read(). If such errors occur, the following points should be checked:

  • Is the file path correct?
  • Is the JSON in that file syntactically valid? Check it via jsonlint.com.
  • If you still get strange error messages, it is possible that the file did not “arrive” correctly during the download. The only thing that helps in such situation is to copy the content of the file and paste it into a new, empty JSON file.

Another mistake can be made with dates. Here we can see that dates in March were unfortunately not recognized as date.

1  import pandas as pd
2
3  path = "D:\\downloads\\Performance.json"
4
5  df1 = pd.DataFrame([[*(value[0:4])] + [value[4]["display"]] for value in
6  pd.read_json(path, encoding='utf8').aaData])
7
8  df1.columns = ["Zwischengewinn", "Aktiengewinn_EStG", "Aktiengewinn_KStG",
9  "Immobiliengewinn", "Datum"]

I love it when a plan comes together.
-Movie quote for connoisseurs

Queries from the web directly with Python

We are replacing our previous code as follows.

Data from web

And below is the Version to copy and paste the code…

 1  import pandas as pd
 2  import requests
 3  
 4  r = requests.get(
 5      ("https://www.ishares.com/de/privatanleger/"
 6       "de/produkte/258701/ishares-euro-stoxx-50-exfinancials-ucits-etf/"
 7       "1478358465945.ajax?tab=tax-figures&fileType=json")
 8  )
 9   r.encoding = 'utf-8-sig'
10  
11  df1 = pd.DataFrame([[*(value[0:4])] + [value[4]["display"]] for x in
12  r.json().aaData])

That’s all!

The rest stays the same.

Load more funds and make it less hacky

https://www.ishares.com/de/privatanleger/de/produkte/
251861/ishares-msci-europe-ucits-etf-acc-fund/
List = [
     ("Fondsname 1", "/xxxxx/fonds-bezeichnung"),
     ("Fondsname 2", "/xxxxx/fonds-bezeichnung"),
     ...
] 
 
import requests
import pandas as pd

url_start = "https://www.ishares.com/de/privatanleger/de/produkte/"
url_end = "/1478358465945.ajax?tab=tax-figures&fileType=json"
funds = [
    ("iShares EURO STOXX 50 ex-Financials UCITS ETF", "258701/ishares-euro-stoxx-50-exfinancials-ucits-etf"),
    ("iShares Core DAX® UCITS ETF (DE)", "251464/ishares-dax-ucits-etf-de-fund"),
    ("iShares Core MSCI Europe UCITS ETF EUR (Acc)", "251861/ishares-msci-europe-ucits-etf-acc-fund")
]


def fullUrl(fondsSpecificURLPart):
    return url_start + fondsSpecificURLPart + url_end


dataframe = pd.DataFrame(columns=["Zwischengewinn", "Aktiengewinn_EStG",
                                  "Aktiengewinn_KStG", "Immobiliengewinn", "Datum", "Fondsname"]
                         )

for fund in funds:
    fondsName = fund[0]
    response = requests.get(fullUrl(fund[1]))
    response.encoding = 'utf-8-sig'
    dataframe_tmp = pd.DataFrame([[*(x[0:4])] + [x[4]["display"], fondsName] for x in response.json()["aaData"]])
    dataframe_tmp.columns = ["Zwischengewinn", "Aktiengewinn_EStG",
                             "Aktiengewinn_KStG", "Immobiliengewinn", "Datum", "Fondsname"]
    dataframe = dataframe.append(dataframe_tmp)

Perfect! Loaded over 3,500 lines from the web with a script just a few lines long.

Final thoughts

But what I think has become clear: for what and how we can use Python as a query.

Leave a Reply

Your email address will not be published. Required fields are marked *