The results show that there is a tradeoff between memory and speed:

- 100: This is the default page_size, so the results are similar to our previous benchmark.
- 1000: The timing here is about 40% faster, and the memory is low.
- 10000: The timing is not much faster than with a page size of 1000, but the memory is significantly higher.

In this case, it seems that the sweet spot is a page size of 1000.

The gems in psycopg's documentation do not end with execute_batch. While strolling through the documentation, another function called execute_values caught my eye:

> Execute a statement using VALUES with a sequence of parameters.

The function execute_values works by generating a huge VALUES list for the query.

```
>>> insert_execute_values_iterator(connection, iter(beers), page_size=1)
insert_execute_values_iterator(page_size=1)      Time 127.4   Memory 0.0
>>> insert_execute_values_iterator(connection, iter(beers), page_size=100)
insert_execute_values_iterator(page_size=100)    Time 3.677   Memory 0.0
>>> insert_execute_values_iterator(connection, iter(beers), page_size=1000)
insert_execute_values_iterator(page_size=1000)   Time 1.468   Memory 0.0
>>> insert_execute_values_iterator(connection, iter(beers), page_size=10000)
insert_execute_values_iterator(page_size=10000)  Time 1.503   Memory 2.25
```

Just like execute_batch, we see a tradeoff between memory and speed. Here as well, the sweet spot is around a page size of 1000. However, using execute_values we got results ~20% faster compared to the same page size using execute_batch.

The official documentation for PostgreSQL features an entire section on Populating a Database. According to the documentation, the best way to load data into a database is using the copy command. To use copy from Python, psycopg provides a special function called copy_from. Let's see if we can transform our data into CSV, and load it into the database using copy_from:

```python
import io
from typing import Any, Dict, Iterator, Optional

def clean_csv_value(value: Optional[Any]) -> str:
    if value is None:
        return r'\N'
    return str(value).replace('\n', '\\n')

def copy_stringio(connection, beers: Iterator[Dict[str, Any]]) -> None:
    with connection.cursor() as cursor:
        # create_staging_table and parse_first_brewed are defined earlier.
        create_staging_table(cursor)
        csv_file_like_object = io.StringIO()
        for beer in beers:
            csv_file_like_object.write('|'.join(map(clean_csv_value, (
                beer['id'],
                beer['name'],
                beer['tagline'],
                parse_first_brewed(beer['first_brewed']),
                beer['abv'],
                beer['ibu'],
                beer['target_fg'],
                beer['target_og'],
                beer['ebc'],
                beer['srm'],
                beer['ph'],
                beer['attenuation_level'],
                beer['brewers_tips'],
                beer['contributed_by'],
                beer['volume']['value'],
            ))) + '\n')
        csv_file_like_object.seek(0)
        cursor.copy_from(csv_file_like_object, 'staging_beers', sep='|')
```

Let's break it down:

- clean_csv_value: Transforms a single value:
  - Escape new lines: some of the text fields include newlines, so we escape \n -> \\n.
  - None values are transformed to \N, the default string COPY uses to mark NULL.
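Both execute_batch and execute_values split the parameter sequence into pages of at most page_size rows and send one statement per page, which is where the memory/speed tradeoff above comes from. Here is a minimal sketch of that paging logic in plain Python; `paginate` is a hypothetical stand-in for illustration, not psycopg's actual implementation:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar('T')

def paginate(rows: Iterable[T], page_size: int) -> Iterator[List[T]]:
    """Yield successive pages of at most page_size rows.

    A larger page means fewer statements sent to the server (faster),
    but more rows buffered in memory at once -- the tradeoff seen in
    the benchmarks above.
    """
    it = iter(rows)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page

pages = list(paginate(range(10), page_size=4))
print(pages)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With page_size=1 every row is its own statement (the slow 127-second case); with page_size=10000 far fewer statements are sent, but whole 10000-row pages are materialized in memory.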
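To make the escaping concrete, here is clean_csv_value applied to a small made-up record (the sample values are hypothetical, for illustration only):

```python
from typing import Any, Optional

def clean_csv_value(value: Optional[Any]) -> str:
    # None becomes \N, the default NULL marker for PostgreSQL COPY;
    # literal newlines are escaped so each record stays on one line.
    if value is None:
        return r'\N'
    return str(value).replace('\n', '\\n')

# Hypothetical record: id, name, brewers_tips (with a newline), and a NULL.
row = '|'.join(map(clean_csv_value, (1, 'Punk IPA', 'tips:\nserve cold', None)))
print(row)  # 1|Punk IPA|tips:\nserve cold|\N
```

Because the embedded newline is escaped and NULL has an unambiguous marker, COPY can parse each line of the StringIO buffer as exactly one row.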